Commit graph

32 commits

Author SHA1 Message Date
matthewdeng
113c4d7fab
[air][data] move train_test_split to ray.data.Dataset () 2022-07-27 09:53:37 -07:00
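A minimal sketch of the relocated API after this change, assuming a Ray 2.0-era install and a toy dataset:

```
import ray

ds = ray.data.range(100)
# train_test_split now lives on ray.data.Dataset; test_size is the
# fraction of rows held out for the test split.
train_ds, test_ds = ds.train_test_split(test_size=0.25)
print(train_ds.count(), test_ds.count())  # 75 25
```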
Balaji Veeramani
8bc836d9fb
[AIR] Remove CustomStatefulPreprocessor () 2022-07-26 10:10:57 -07:00
Eric Liang
008eecfbff
[docs] Update the AIR data ingest guide () 2022-07-24 09:59:29 -07:00
Clark Zinzow
a29baf93c8
[Datasets] Add .iter_torch_batches() and .iter_tf_batches() APIs. ()
This PR adds .iter_torch_batches() and .iter_tf_batches() convenience APIs, which take care of ML framework tensor conversion, avoid the narrow-tensor waste of the raw .iter_batches() call ("numpy" format), and unify batch formats around two options: a single tensor for simple/pure-tensor/single-column datasets, and a dictionary of tensors for multi-column datasets (see the sketch below).
2022-07-22 10:09:36 -07:00
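A hedged sketch of the new iteration API on a single-column dataset, assuming the batch-format behavior described in the message above:

```
import ray

# A pure-tensor (single-column) dataset: per the description above,
# each batch arrives as one framework tensor rather than a dict.
ds = ray.data.range_tensor(8, shape=(2,))
for batch in ds.iter_torch_batches(batch_size=4):
    print(batch.shape)  # e.g. torch.Size([4, 2])
```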
Balaji Veeramani
ac1d21027d
[AIR] Add framework-specific checkpoints () 2022-07-20 19:33:27 -07:00
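A hedged sketch of one such framework-specific checkpoint, assuming the Ray 2.0-era TorchCheckpoint API:

```
import torch
from ray.train.torch import TorchCheckpoint

model = torch.nn.Linear(1, 1)
# Round-trip a model through a framework-specific checkpoint.
checkpoint = TorchCheckpoint.from_model(model)
loaded_model = checkpoint.get_model()
```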
Sumanth Ratna
759966781f
[air] Allow users to use instances of ScalingConfig ()
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2022-07-18 15:46:58 -07:00
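A hedged sketch of passing a ScalingConfig instance directly (previously scaling was specified as a plain dict); the no-op train loop is a placeholder:

```
from ray.air.config import ScalingConfig
from ray.train.torch import TorchTrainer

def train_loop_per_worker():
    pass  # placeholder training loop

trainer = TorchTrainer(
    train_loop_per_worker=train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
)
```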
Clark Zinzow
864af14f41
[Datasets] [Local Shuffle - 1/N] Add local shuffling option. ()
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
Co-authored-by: Matthew Deng <matt@anyscale.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-07-17 16:21:14 -07:00
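A hedged sketch of the local shuffling option, assuming it surfaces on iter_batches as local_shuffle_buffer_size (as in later Ray releases):

```
import ray

ds = ray.data.range(1000)
# Buffer-based shuffle during iteration: much cheaper than a global
# all-to-all random_shuffle(), at the cost of weaker randomness.
for batch in ds.iter_batches(batch_size=32, local_shuffle_buffer_size=256):
    pass
```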
Amog Kamsetty
3a345a470c
[AIR/Docs] Add Predictor Docs () 2022-07-16 21:14:21 -07:00
Richard Liaw
799311b2f7
[air/docs] update examples to remove pandas again () 2022-07-16 08:40:44 -07:00
Richard Liaw
92efc85b3b
[air/docs] checkpoints () 2022-07-11 20:40:23 -07:00
Richard Liaw
1abe908c22
[air/docs] improve consistency of getting started () 2022-07-11 20:16:37 -07:00
Antoni Baum
ea94cda1f3
[AIR] Replace train. with session. ()
This PR replaces legacy `train.` API calls with the AIR `session.` API in Train code, examples, and docs (see the sketch below).

Depends on https://github.com/ray-project/ray/pull/25735
2022-07-07 16:29:04 -07:00
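A hedged before/after sketch of the reporting call this migration targets:

```
from ray.air import session

def train_loop_per_worker(config):
    for epoch in range(config["epochs"]):
        # Previously: train.report(...); now the AIR session API.
        session.report({"epoch": epoch, "loss": 1.0 / (epoch + 1)})
```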
Simon Mo
88a219c7f2
Revert "Revert "[AIR][Serve] Rename ModelWrapperDeployment -> PredictorDeployment"" () 2022-07-05 13:26:49 -07:00
matthewdeng
4a21dc31ae
[air] update DummyTrainer to handle DatasetPipelines ()
1. Update `DummyTrainer` to take `num_epochs` instead of `runtime_seconds`.
    1. Ray Train expects an equal number of calls to `train.report()` across workers. Different workers may run at different speeds and terminate after different numbers of epochs, which causes an error.
2. Add `generate_epochs` to support `DatasetPipeline` when `use_stream_api` is True (see the epoch-pipeline sketch below).
3. Update the `__main__` code to support testing different configurations.
2022-06-29 09:32:57 -07:00
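Not the DummyTrainer internals, but a hedged sketch of the DatasetPipeline epoch pattern the change builds on:

```
import ray

ds = ray.data.range(100)
# repeat(n) turns a Dataset into a DatasetPipeline that replays the data
# n times; iter_epochs() then yields one sub-pipeline per epoch.
pipe = ds.repeat(2)
for epoch_ds in pipe.iter_epochs():
    for batch in epoch_ds.iter_batches(batch_size=10):
        pass
```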
Stephanie Wang
c9be251b7a
Revert "[AIR][Serve] Rename ModelWrapperDeployment -> PredictorDeployment ()" ()
This reverts commit 68692b3464.
2022-06-28 17:07:07 -07:00
Simon Mo
68692b3464
[AIR][Serve] Rename ModelWrapperDeployment -> PredictorDeployment () 2022-06-28 10:26:10 -07:00
Antoni Baum
0ec198acc2
[AIR] Remove unnecessary pandas from examples ()
Removes unnecessary pandas usage from AIR examples, helping ensure users do not pick up bad practices.
2022-06-24 14:38:23 -07:00
Antoni Baum
91dd360f9d
[AIR/train] Move predictors to ray.train () 2022-06-15 17:02:15 -07:00
xwjiang2010
88d824d067
[air] remove fully_executed from Tune. () 2022-06-14 22:32:48 -07:00
Eric Liang
ff2cfbe351
[air] Add streaming BatchPredictor support () 2022-06-13 15:22:36 -07:00
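A hedged sketch of pipelined (streaming) batch prediction; `checkpoint`, `MyPredictor`, and `ds` are hypothetical placeholders for a trained checkpoint, its matching Predictor class, and an input Dataset:

```
from ray.train.batch_predictor import BatchPredictor

# checkpoint / MyPredictor / ds are placeholders, not real objects here.
batch_predictor = BatchPredictor.from_checkpoint(checkpoint, MyPredictor)
# Stream prediction through memory in ~2 GiB windows instead of
# materializing the whole dataset at once.
results = batch_predictor.predict_pipelined(ds, bytes_per_window=2 * 1024**3)
```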
Antoni Baum
5e9a8eb5f6
[AIR/data] Move preprocessors to ray.data ()
Moves ray.air.Preprocessor and ray.air.preprocessors to ray.data, to converge on the agreed-upon package structure discussed internally.
2022-06-13 12:57:59 -07:00
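A hedged sketch of the post-move import path and a typical fit/transform call:

```
import ray
from ray.data.preprocessors import StandardScaler  # formerly ray.air.preprocessors

ds = ray.data.from_items([{"x": 1.0}, {"x": 2.0}, {"x": 3.0}])
scaler = StandardScaler(columns=["x"])
ds = scaler.fit_transform(ds)
```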
matthewdeng
88524d8b57
[air] add CustomStatefulPreprocessor () 2022-06-09 16:54:46 -07:00
Amog Kamsetty
1316a2d05e
[AIR/Train] Move ray.air.train to ray.train () 2022-06-08 21:34:18 -07:00
xwjiang2010
76b34d4a03
[air] add to_air_checkpoint method for inference-only workloads. ()
Follow-up on our last discussion about supporting AIR users in a piecemeal fashion.
Only done for TensorFlow for now; I want to collect some feedback on API naming, package structure, etc., and then I will add the other frameworks (see the sketch below).
2022-06-07 14:50:39 -07:00
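A hedged sketch of the TensorFlow-only entry point, assuming the Ray 2.0-era helper names:

```
import tensorflow as tf
from ray.train.tensorflow import TensorflowPredictor, to_air_checkpoint

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
# Wrap a ready-made model as an AIR checkpoint for inference only.
checkpoint = to_air_checkpoint(model)
predictor = TensorflowPredictor.from_checkpoint(checkpoint, lambda: model)
```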
Eric Liang
c1afbcb6f4
[air] Enforce API stability annotations for AIR module () 2022-06-06 22:52:21 -07:00
Eric Liang
78688a0903
Enable streaming ingest in AIR ()
This adds the following options to DatasetConfig, which can be used to enable streaming ingest.

```
    # Whether the dataset should be streamed into memory using pipelined reads.
    # When enabled, get_dataset_shard() returns DatasetPipeline instead of Dataset.
    # The amount of memory to use is controlled by `stream_window_size`.
    # False by default for all datasets.
    use_stream_api: Optional[bool] = None

    # Configure the streaming window size in bytes. A typical value is something like
    # 20% of object store memory. If set to -1, then an infinite window size will be
    # used (similar to bulk ingest). This only has an effect if use_stream_api is set.
    # Set to 1.0 GiB by default.
    stream_window_size: Optional[float] = None

    # Whether to enable global shuffle (per pipeline window in streaming mode). Note
    # that this is an expensive all-to-all operation, and most likely you want to use
    # local shuffle instead.
    # False by default for all datasets.
    global_shuffle: Optional[bool] = None
```
2022-06-06 17:42:15 -07:00
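A hedged usage sketch of the options above, assuming they are passed per-dataset through the trainer's dataset_config argument:

```
from ray.air.config import DatasetConfig

dataset_config = {
    # Stream the "train" dataset through memory in ~2 GiB windows.
    "train": DatasetConfig(use_stream_api=True, stream_window_size=2 * 1024**3),
}
```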
Eric Liang
1f509ab331
[air] Add DataParallelTrainer.dataset_config for configuring dataset ingest ()
This adds a per-dataset config object to DataParallelTrainer. These configs define how each Dataset should be read into the DataParallelTrainer: the preprocessing, splitting, and ingest strategy for that dataset. DataParallelTrainers declare default DatasetConfigs for each dataset passed in the ``datasets`` argument. Users can selectively override these configs by passing the ``dataset_config`` argument (see the sketch below). Trainers can also define which values are user-customizable (e.g., XGBoostTrainer doesn't support streaming ingest).

This PR adds minimal support for dataset configs. Future PRs will:
- Add support for streaming ingest
- Move this config from DataParallelTrainer to ml.Trainer
2022-06-03 16:32:53 -07:00
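A hedged sketch of overriding the default per-dataset configs; `train_fn`, `train_ds`, and `side_ds` are hypothetical placeholders:

```
from ray.air.config import DatasetConfig
from ray.train.torch import TorchTrainer

trainer = TorchTrainer(
    train_loop_per_worker=train_fn,  # placeholder training loop
    datasets={"train": train_ds, "side": side_ds},  # placeholder Datasets
    dataset_config={
        "train": DatasetConfig(split=True),  # shard across workers
        "side": DatasetConfig(split=False, transform=False),  # broadcast as-is
    },
)
```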
Kai Fricke
4b9a89ad90
[air] Move python/ray/ml to python/ray/air ()
The package "ml" should be renamed to "air".

Main question: keep an `ml.py` with `from ray.air import *` for some level of backwards compatibility?
I'd go for no, to force people onto the new structure.
2022-06-03 21:53:44 +01:00
matthewdeng
2e05b62236
[AIR] Preprocessors feature guide () 2022-06-03 11:43:51 -07:00
Eric Liang
995309f9a3
[docs] Add AIR data ingest docs (part 1-- bulk loading only) () 2022-05-19 14:25:47 -07:00
Richard Liaw
41de6acd10
[air] fix-docs ()
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2022-05-13 15:58:31 -07:00
Richard Liaw
ce5a27e31b
[docs] Add initial AIR documentation ()
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2022-05-13 01:29:59 -07:00