hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-08 19:41:38 -05:00

Author	SHA1	Message	Date
Clark Zinzow	df124d0ad5	[AIR - Datasets] Hide tensor extension from UDFs. (#27019) We previously added automatic tensor extension casting on Datasets transformation outputs so that users would not have to worry about tensor column casting; however, the current state creates several issues:
1. Not all tensors are supported, which means that we’ll need to have an opaque object dtype (i.e. ndarray of ndarray pointers) fallback for the Pandas-only case. Known unsupported tensor use cases:
   a. Heterogeneous-shaped (i.e. ragged) tensors
   b. Struct arrays
2. UDFs will expect a NumPy column and won’t know what to do with our TensorArray type. E.g., torchvision transforms don’t respect the array protocol (which they should), and instead only support Torch tensors and NumPy ndarrays; passing a TensorArray column or a TensorArrayElement (a single item in the TensorArray column) fails.
3. Implicit casting with object dtype fallback on UDF outputs can make the input type to downstream UDFs nondeterministic, where the user won’t know if they’ll get a TensorArray column or an object dtype column.
4. The tensor extension cast fallback warning spams the logs.
This PR:
1. Adds automatic casting of tensor extension columns to NumPy ndarray columns for Datasets UDF inputs, meaning the UDFs will never have to see tensor extensions and that the UDF input column types will be consistent and deterministic; this fixes both (2) and (3).
2. No longer implicitly falls back to an opaque object dtype when TensorArray casting fails (e.g. for ragged tensors), and instead raises an error; this fixes (4) but removes our support for (1).
3. Adds a global enable_tensor_extension_casting config flag, which is True by default, that controls whether we perform this automatic casting. Turning off the implicit casting provides a path for (1), where the tensor extension can be avoided if working with ragged tensors in Pandas land.
Turning off this flag also allows the user to explicitly control their tensor extension casting, if they want to work with it in their UDFs in order to reap the benefits of fewer data copies, more efficient slicing, stronger column typing, etc.	2022-07-28 10:37:45 -07:00
Kai Fricke	3924a4b7cc	[air/train] Rename BaseWorkerMixin, only log info torch loop for rank 0 (#27098) This PR:
- only prints train_loop info strings (e.g. `train_loop_utils.py:298 -- Moving model to device: cpu`) for rank 0 workers for torch
- renames `BaseWorkerMixin` to `RayTrainWorker` as the name comes up often in output and is more meaningful
Signed-off-by: Kai Fricke <kai@anyscale.com>	2022-07-27 20:11:59 +01:00
Jiao	5315f1e643	[AIR] Enable other notebooks previously marked with # REGRESSION (#26896) Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-07-25 13:40:21 -07:00
matthewdeng	df638b3f0f	[Datasets] Automatically cast tensor columns when building Pandas blocks. (#26924) This PR just applies the changes from the following PRs:
- [Datasets] Automatically cast tensor columns when building Pandas blocks. #26684 (reverted by Revert "[Datasets] Automatically cast tensor columns when building Pandas blocks." #26921)
- [AIR - Datasets] Fix TensorDtype construction from string and fix example. #26904
This fixes the test failures introduced in the originally reverted PRs.	2022-07-25 12:12:10 -07:00
Jiao	db027d86af	[P0][AIR] Fix train to serve notebooks (#26821) Co-authored-by: Simon Mo <simon.mo@hey.com>	2022-07-21 18:04:13 -07:00
Sumanth Ratna	759966781f	[air] Allow users to use instances of `ScalingConfig` (#25712) Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>	2022-07-18 15:46:58 -07:00
Amog Kamsetty	6595bd6e2d	[AIR] Introduce better scoring API for `BatchPredictor` (#26451) Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com> As discussed offline, allow configurability for feature columns and keep columns in BatchPredictor for better scoring UX on test datasets.	2022-07-14 11:26:12 -07:00
Antoni Baum	ea94cda1f3	[AIR] Replace `train.` with `session.` (#26303) This PR replaces legacy API calls to `train.` with AIR `session.` in Train code, examples and docs. Depends on https://github.com/ray-project/ray/pull/25735	2022-07-07 16:29:04 -07:00
Simon Mo	88a219c7f2	Revert "Revert "[AIR][Serve] Rename ModelWrapperDeployment -> PredictorDeployment"" (#26231)	2022-07-05 13:26:49 -07:00
Stephanie Wang	c9be251b7a	Revert "[AIR][Serve] Rename ModelWrapperDeployment -> PredictorDeployment (#25962)" (#26176) This reverts commit `68692b3464`.	2022-06-28 17:07:07 -07:00
Simon Mo	68692b3464	[AIR][Serve] Rename ModelWrapperDeployment -> PredictorDeployment (#25962)	2022-06-28 10:26:10 -07:00
Antoni Baum	91dd360f9d	[AIR/train] Move predictors to `ray.train` (#25769)	2022-06-15 17:02:15 -07:00
Antoni Baum	5e9a8eb5f6	[AIR/data] Move preprocessors to `ray.data` (#25599) Moves ray.air.Preprocessor and ray.air.preprocessors to ray.data to converge on the agreed upon package structure discussed internally.	2022-06-13 12:57:59 -07:00
Amog Kamsetty	1316a2d05e	[AIR/Train] Move `ray.air.train` to `ray.train` (#25570)	2022-06-08 21:34:18 -07:00
Kai Fricke	4b9a89ad90	[air] Move python/ray/ml to python/ray/air (#25449) The package "ml" should be renamed to "air". Main question: Keep a `ml.py` with `from ray.air import *` for some level of backwards compatibility? I'd go for no to force people to use the new structure.	2022-06-03 21:53:44 +01:00
Amog Kamsetty	e8440cf52b	[AIR] Incremental Learning Example (#24420) Example for domain incremental learning on Permuted MNIST Dataset with naive strategy	2022-05-26 12:28:28 -07:00
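
The behavior described in commit df124d0ad5 (cast tensor columns to NumPy ndarrays before a UDF sees them, and raise on ragged input rather than fall back to an object dtype) can be illustrated with a small NumPy-only sketch. This is not Ray's implementation: `column_to_ndarray`, `prepare_udf_input`, and the `enable_casting` parameter are hypothetical names used only for illustration, with `enable_casting` standing in for the `enable_tensor_extension_casting` flag named in the commit message.

```python
import numpy as np


def column_to_ndarray(column):
    """Cast a column of equal-shaped tensors to one contiguous ndarray.

    Hypothetical helper: raises ValueError for ragged (heterogeneous-shaped)
    input instead of silently producing an object-dtype array.
    """
    elements = [np.asarray(e) for e in column]
    shapes = {e.shape for e in elements}
    if len(shapes) > 1:
        raise ValueError(f"ragged tensor column, shapes: {sorted(shapes)}")
    # np.stack adds a leading "row" axis: n tensors of shape S -> (n, *S).
    return np.stack(elements)


def prepare_udf_input(column, enable_casting=True):
    """Return what a UDF would receive for this column.

    `enable_casting` is an assumed stand-in for the config flag: when False,
    the column is passed through untouched so the caller controls casting.
    """
    if not enable_casting:
        return column
    return column_to_ndarray(column)


uniform = [np.zeros((2, 2)), np.ones((2, 2))]
batch = prepare_udf_input(uniform)
print(type(batch).__name__, batch.shape)

ragged = [np.zeros((2, 2)), np.zeros((3,))]
try:
    prepare_udf_input(ragged)
except ValueError as err:
    print("casting failed:", err)
```

With casting disabled (`enable_casting=False`), the ragged column is returned as-is, which mirrors the path the commit message describes for working with ragged tensors in Pandas land.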

16 commits