ray/doc/source/ray-air/examples
Clark Zinzow df124d0ad5
[AIR - Datasets] Hide tensor extension from UDFs. (#27019)
We previously added automatic tensor extension casting on Datasets transformation outputs so that users would not have to worry about tensor column casting; however, the current behavior creates several issues:

1. Not all tensors are supported, which means that we’ll need to have an opaque object dtype (i.e. ndarray of ndarray pointers) fallback for the Pandas-only case. Known unsupported tensor use cases:
a. Heterogeneous-shaped (i.e. ragged) tensors
b. Struct arrays
2. UDFs will expect a NumPy column and won’t know what to do with our TensorArray type. E.g., torchvision transforms don’t respect the array protocol (which they should), and instead only support Torch tensors and NumPy ndarrays; passing a TensorArray column or a TensorArrayElement (a single item in the TensorArray column) fails.
3. Implicit casting with object dtype fallback on UDF outputs can make the input type to downstream UDFs nondeterministic, where the user won’t know if they’ll get a TensorArray column or an object dtype column.
4. The tensor extension cast fallback warning spams the logs.
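
The ragged-tensor limitation and the resulting column-type ambiguity can be illustrated with plain NumPy. This is a minimal sketch rather than Ray code: the helper `to_column` is a hypothetical name introduced here, and it only mimics the idea of "stack if possible, otherwise fall back to an opaque object dtype".

```python
import numpy as np


def to_column(tensors):
    """Hypothetical helper: stack same-shaped tensors, else fall back to object dtype."""
    shapes = {t.shape for t in tensors}
    if len(shapes) == 1:
        # Homogeneous shapes: one contiguous, strongly typed ndarray.
        return np.stack(tensors)
    # Ragged shapes: a 1-D array of pointers to independent ndarrays.
    col = np.empty(len(tensors), dtype=object)
    for i, t in enumerate(tensors):
        col[i] = t
    return col


uniform = to_column([np.zeros((2, 2)), np.ones((2, 2))])
ragged = to_column([np.zeros((2, 2)), np.ones((3, 2))])

print(uniform.dtype, uniform.shape)
print(ragged.dtype, ragged.shape)
```

A downstream function that receives either result cannot tell in advance which of the two column types it will get, which is the nondeterminism described above.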

This PR:

1. Adds automatic casting of tensor extension columns to NumPy ndarray columns for Datasets UDF inputs, meaning the UDFs will never have to see tensor extensions and that the UDF input column types will be consistent and deterministic; this fixes both (2) and (3).
2. No longer implicitly falls back to an opaque object dtype when TensorArray casting fails (e.g. for ragged tensors), and instead raises an error; this fixes (4) but removes our support for (1).
3. Adds a global enable_tensor_extension_casting config flag, which is True by default, that controls whether we perform this automatic casting. Turning off the implicit casting provides a path for (1), where the tensor extension can be avoided if working with ragged tensors in Pandas land. Turning off this flag also allows users to explicitly control their tensor extension casting if they want to work with it in their UDFs in order to reap the benefits of fewer data copies, more efficient slicing, stronger column typing, etc.
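
The intended semantics of the strict cast and the opt-out flag can be sketched as follows. This is an illustrative NumPy-only model under stated assumptions, not the PR's implementation: `cast_udf_output` and its `enable_casting` parameter are hypothetical stand-ins for the behavior controlled by enable_tensor_extension_casting.

```python
import numpy as np


def cast_udf_output(tensors, enable_casting=True):
    """Hypothetical model of the output cast; not Ray's implementation."""
    if not enable_casting:
        # Casting disabled: hand the values back untouched so the
        # caller stays in full control of the column type.
        return tensors
    shapes = {np.shape(t) for t in tensors}
    if len(shapes) != 1:
        # No silent object-dtype fallback: fail loudly instead.
        raise ValueError(f"cannot cast ragged tensors with shapes {sorted(shapes)}")
    return np.stack(tensors)


ragged = [np.zeros((2, 2)), np.zeros((3, 2))]

try:
    cast_udf_output(ragged)
    raised = False
except ValueError as err:
    raised = True
    print("cast failed:", err)

# With casting turned off, the ragged values pass through unchanged.
passthrough = cast_udf_output(ragged, enable_casting=False)
stacked = cast_udf_output([np.zeros((2, 2)), np.ones((2, 2))])
```

Raising on failure keeps column types predictable for downstream code, at the cost of requiring an explicit opt-out for ragged data.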
2022-07-28 10:37:45 -07:00
..
analyze_tuning_results.ipynb [docs] Improve AIR table of contents titles (#26858) 2022-07-22 17:17:49 -07:00
BUILD [AIR] Enable other notebooks previously marked with # REGRESSION (#26896) 2022-07-25 13:40:21 -07:00
convert_existing_pytorch_code_to_ray_air.ipynb [AIR - Datasets] Hide tensor extension from UDFs. (#27019) 2022-07-28 10:37:45 -07:00
feast_example.ipynb [air] Allow users to use instances of ScalingConfig (#25712) 2022-07-18 15:46:58 -07:00
huggingface_text_classification.ipynb [air/train] Rename BaseWorkerMixin, only log info torch loop for rank 0 (#27098) 2022-07-27 20:11:59 +01:00
index.rst [Docs] Small fix to AIR examples descriptions (#26227) 2022-07-05 17:16:56 -07:00
lightgbm_example.ipynb [air][data] move train_test_split to ray.data.Dataset (#27065) 2022-07-27 09:53:37 -07:00
pytorch_tabular_batch_prediction.py [AIR] Add framework-specific checkpoints (#26777) 2022-07-20 19:33:27 -07:00
pytorch_tabular_starter.py [air][data] move train_test_split to ray.data.Dataset (#27065) 2022-07-27 09:53:37 -07:00
rl_offline_example.ipynb [air] update offline/online rl example and enable them. (#26786) 2022-07-20 14:06:03 -07:00
rl_online_example.ipynb [air] update offline/online rl example and enable them. (#26786) 2022-07-20 14:06:03 -07:00
rl_serving_example.ipynb [air] Allow users to use instances of ScalingConfig (#25712) 2022-07-18 15:46:58 -07:00
serving_guide.ipynb [docs] Improve AIR table of contents titles (#26858) 2022-07-22 17:17:49 -07:00
sklearn_example.ipynb [air][data] move train_test_split to ray.data.Dataset (#27065) 2022-07-27 09:53:37 -07:00
tf_tabular_batch_prediction.py [AIR] Add framework-specific checkpoints (#26777) 2022-07-20 19:33:27 -07:00
tf_tabular_starter.py [air][data] move train_test_split to ray.data.Dataset (#27065) 2022-07-27 09:53:37 -07:00
tfx_tabular_train_to_serve.ipynb [AIR] Rename limit parameter as max_categories (#26977) 2022-07-26 10:10:40 -07:00
torch_image_batch_pretrained.py [AIR - Datasets] Hide tensor extension from UDFs. (#27019) 2022-07-28 10:37:45 -07:00
torch_image_example.ipynb [air] Allow users to use instances of ScalingConfig (#25712) 2022-07-18 15:46:58 -07:00
torch_incremental_learning.ipynb [AIR - Datasets] Hide tensor extension from UDFs. (#27019) 2022-07-28 10:37:45 -07:00
upload_to_comet_ml.ipynb [air] Allow users to use instances of ScalingConfig (#25712) 2022-07-18 15:46:58 -07:00
upload_to_wandb.ipynb [air] Allow users to use instances of ScalingConfig (#25712) 2022-07-18 15:46:58 -07:00
xgboost_batch_prediction.py [air][data] move train_test_split to ray.data.Dataset (#27065) 2022-07-27 09:53:37 -07:00
xgboost_example.ipynb [air][data] move train_test_split to ray.data.Dataset (#27065) 2022-07-27 09:53:37 -07:00
xgboost_starter.py [air][data] move train_test_split to ray.data.Dataset (#27065) 2022-07-27 09:53:37 -07:00