Commit graph

32 commits

Author SHA1 Message Date
matthewdeng
113c4d7fab
[air][data] move train_test_split to ray.data.Dataset () 2022-07-27 09:53:37 -07:00
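A minimal sketch of the relocated API after this change, assuming a Ray 2.0-era install and a toy dataset:

```
import ray

ds = ray.data.range(100)
# train_test_split now lives on ray.data.Dataset; test_size is the
# fraction of rows held out for the test split.
train_ds, test_ds = ds.train_test_split(test_size=0.25)
print(train_ds.count(), test_ds.count())  # 75 25
```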
Balaji Veeramani
8bc836d9fb
[AIR] Remove CustomStatefulPreprocessor () 2022-07-26 10:10:57 -07:00
Eric Liang
008eecfbff
[docs] Update the AIR data ingest guide () 2022-07-24 09:59:29 -07:00
Clark Zinzow
a29baf93c8
[Datasets] Add .iter_torch_batches() and .iter_tf_batches() APIs. ()
This PR adds .iter_torch_batches() and .iter_tf_batches() convenience APIs, which take care of ML framework tensor conversion, avoid the narrow-tensor waste of the raw .iter_batches() call ("numpy" format), and unify batch formats around two options: a single tensor for simple/pure-tensor/single-column datasets, and a dictionary of tensors for multi-column datasets (see the sketch below).
2022-07-22 10:09:36 -07:00
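A hedged sketch of the new iteration API on a single-column dataset, assuming the batch-format behavior described in the message above:

```
import ray

# A pure-tensor (single-column) dataset: per the description above,
# each batch arrives as one framework tensor rather than a dict.
ds = ray.data.range_tensor(8, shape=(2,))
for batch in ds.iter_torch_batches(batch_size=4):
    print(batch.shape)  # e.g. torch.Size([4, 2])
```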
Balaji Veeramani
ac1d21027d
[AIR] Add framework-specific checkpoints () 2022-07-20 19:33:27 -07:00
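A hedged sketch of one such framework-specific checkpoint, assuming the Ray 2.0-era TorchCheckpoint API:

```
import torch
from ray.train.torch import TorchCheckpoint

model = torch.nn.Linear(1, 1)
# Round-trip a model through a framework-specific checkpoint.
checkpoint = TorchCheckpoint.from_model(model)
loaded_model = checkpoint.get_model()
```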
Sumanth Ratna
759966781f
[air] Allow users to use instances of ScalingConfig ()
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2022-07-18 15:46:58 -07:00
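A hedged sketch of passing a ScalingConfig instance directly (previously scaling was specified as a plain dict); the no-op train loop is a placeholder:

```
from ray.air.config import ScalingConfig
from ray.train.torch import TorchTrainer

def train_loop_per_worker():
    pass  # placeholder training loop

trainer = TorchTrainer(
    train_loop_per_worker=train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
)
```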
Clark Zinzow
864af14f41
[Datasets] [Local Shuffle - 1/N] Add local shuffling option. ()
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
Co-authored-by: Matthew Deng <matt@anyscale.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-07-17 16:21:14 -07:00
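A hedged sketch of the local shuffling option, assuming it surfaces on iter_batches as local_shuffle_buffer_size (as in later Ray releases):

```
import ray

ds = ray.data.range(1000)
# Buffer-based shuffle during iteration: much cheaper than a global
# all-to-all random_shuffle(), at the cost of weaker randomness.
for batch in ds.iter_batches(batch_size=32, local_shuffle_buffer_size=256):
    pass
```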
Amog Kamsetty
3a345a470c
[AIR/Docs] Add Predictor Docs () 2022-07-16 21:14:21 -07:00
Richard Liaw
799311b2f7
[air/docs] update examples to remove pandas again () 2022-07-16 08:40:44 -07:00
Richard Liaw
92efc85b3b
[air/docs] checkpoints () 2022-07-11 20:40:23 -07:00
Richard Liaw
1abe908c22
[air/docs] improve consistency of getting started () 2022-07-11 20:16:37 -07:00
Antoni Baum
ea94cda1f3
[AIR] Replace train. with session. ()
This PR replaces legacy `train.` API calls with the AIR `session.` API in Train code, examples, and docs (see the sketch below).

Depends on https://github.com/ray-project/ray/pull/25735
2022-07-07 16:29:04 -07:00
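A hedged before/after sketch of the reporting call this migration targets:

```
from ray.air import session

def train_loop_per_worker(config):
    for epoch in range(config["epochs"]):
        # Previously: train.report(...); now the AIR session API.
        session.report({"epoch": epoch, "loss": 1.0 / (epoch + 1)})
```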
Simon Mo
88a219c7f2
Revert "Revert "[AIR][Serve] Rename ModelWrapperDeployment -> PredictorDeployment"" () 2022-07-05 13:26:49 -07:00
matthewdeng
4a21dc31ae
[air] update DummyTrainer to handle DatasetPipelines ()
1. Update `DummyTrainer` to take `num_epochs` instead of `runtime_seconds`.
    1. Ray Train expects an equal number of calls to `train.report()` across workers. Different workers may run at different speeds and terminate after different numbers of epochs, which causes an error.
2. Add `generate_epochs` to support `DatasetPipeline` when `use_stream_api` is True (see the epoch-pipeline sketch below).
3. Update the `__main__` code to support testing different configurations.
2022-06-29 09:32:57 -07:00
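Not the DummyTrainer internals, but a hedged sketch of the DatasetPipeline epoch pattern the change builds on:

```
import ray

ds = ray.data.range(100)
# repeat(n) turns a Dataset into a DatasetPipeline that replays the data
# n times; iter_epochs() then yields one sub-pipeline per epoch.
pipe = ds.repeat(2)
for epoch_ds in pipe.iter_epochs():
    for batch in epoch_ds.iter_batches(batch_size=10):
        pass
```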
Stephanie Wang
c9be251b7a
Revert "[AIR][Serve] Rename ModelWrapperDeployment -> PredictorDeployment ()" ()
This reverts commit 68692b3464.
2022-06-28 17:07:07 -07:00
Simon Mo
68692b3464
[AIR][Serve] Rename ModelWrapperDeployment -> PredictorDeployment () 2022-06-28 10:26:10 -07:00
Antoni Baum
0ec198acc2
[AIR] Remove unnecessary pandas from examples ()
Removes unnecessary pandas usage from AIR examples, helping ensure users do not pick up bad practices.
2022-06-24 14:38:23 -07:00
Antoni Baum
91dd360f9d
[AIR/train] Move predictors to ray.train () 2022-06-15 17:02:15 -07:00
xwjiang2010
88d824d067
[air] remove fully_executed from Tune. () 2022-06-14 22:32:48 -07:00
Eric Liang
ff2cfbe351
[air] Add streaming BatchPredictor support () 2022-06-13 15:22:36 -07:00
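A hedged sketch of pipelined (streaming) batch prediction; `checkpoint`, `MyPredictor`, and `ds` are hypothetical placeholders for a trained checkpoint, its matching Predictor class, and an input Dataset:

```
from ray.train.batch_predictor import BatchPredictor

# checkpoint / MyPredictor / ds are placeholders, not real objects here.
batch_predictor = BatchPredictor.from_checkpoint(checkpoint, MyPredictor)
# Stream prediction through memory in ~2 GiB windows instead of
# materializing the whole dataset at once.
results = batch_predictor.predict_pipelined(ds, bytes_per_window=2 * 1024**3)
```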
Antoni Baum
5e9a8eb5f6
[AIR/data] Move preprocessors to ray.data ()
Moves ray.air.Preprocessor and ray.air.preprocessors to ray.data, to converge on the agreed-upon package structure discussed internally.
2022-06-13 12:57:59 -07:00
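A hedged sketch of the post-move import path and a typical fit/transform call:

```
import ray
from ray.data.preprocessors import StandardScaler  # formerly ray.air.preprocessors

ds = ray.data.from_items([{"x": 1.0}, {"x": 2.0}, {"x": 3.0}])
scaler = StandardScaler(columns=["x"])
ds = scaler.fit_transform(ds)
```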
matthewdeng
88524d8b57
[air] add CustomStatefulPreprocessor () 2022-06-09 16:54:46 -07:00
Amog Kamsetty
1316a2d05e
[AIR/Train] Move ray.air.train to ray.train () 2022-06-08 21:34:18 -07:00
xwjiang2010
76b34d4a03
[air] add to_air_checkpoint method for inference-only workloads. ()
Follow-up on our last discussion about supporting AIR users in a piecemeal fashion.
Only done for TensorFlow for now; I want to collect some feedback on API naming, package structure, etc., and then I will add the other frameworks (see the sketch below).
2022-06-07 14:50:39 -07:00
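A hedged sketch of the TensorFlow-only entry point, assuming the Ray 2.0-era helper names:

```
import tensorflow as tf
from ray.train.tensorflow import TensorflowPredictor, to_air_checkpoint

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
# Wrap a ready-made model as an AIR checkpoint for inference only.
checkpoint = to_air_checkpoint(model)
predictor = TensorflowPredictor.from_checkpoint(checkpoint, lambda: model)
```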
Eric Liang
c1afbcb6f4
[air] Enforce API stability annotations for AIR module () 2022-06-06 22:52:21 -07:00
Eric Liang
78688a0903
Enable streaming ingest in AIR ()
This adds the following options to DatasetConfig, which can be used to enable streaming ingest.

```
    # Whether the dataset should be streamed into memory using pipelined reads.
    # When enabled, get_dataset_shard() returns DatasetPipeline instead of Dataset.
    # The amount of memory to use is controlled by `stream_window_size`.
    # False by default for all datasets.
    use_stream_api: Optional[bool] = None

    # Configure the streaming window size in bytes. A typical value is something like
    # 20% of object store memory. If set to -1, then an infinite window size will be
    # used (similar to bulk ingest). This only has an effect if use_stream_api is set.
    # Set to 1.0 GiB by default.
    stream_window_size: Optional[float] = None

    # Whether to enable global shuffle (per pipeline window in streaming mode). Note
    # that this is an expensive all-to-all operation, and most likely you want to use
    # local shuffle instead.
    # False by default for all datasets.
    global_shuffle: Optional[bool] = None
```
2022-06-06 17:42:15 -07:00
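A hedged usage sketch of the options above, assuming they are passed per-dataset through the trainer's dataset_config argument:

```
from ray.air.config import DatasetConfig

dataset_config = {
    # Stream the "train" dataset through memory in ~2 GiB windows.
    "train": DatasetConfig(use_stream_api=True, stream_window_size=2 * 1024**3),
}
```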
Eric Liang
1f509ab331
[air] Add DataParallelTrainer.dataset_config for configuring dataset ingest ()
This adds a per-dataset config object to DataParallelTrainer. These configs define how each Dataset should be read into the DataParallelTrainer: the preprocessing, splitting, and ingest strategy for that dataset. DataParallelTrainers declare default DatasetConfigs for each dataset passed in the ``datasets`` argument. Users can selectively override these configs by passing the ``dataset_config`` argument (see the sketch below). Trainers can also define which values are user-customizable (e.g., XGBoostTrainer doesn't support streaming ingest).

This PR adds minimal support for dataset configs. Future PRs will:
- Add support for streaming ingest
- Move this config from DataParallelTrainer to ml.Trainer
2022-06-03 16:32:53 -07:00
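A hedged sketch of overriding the default per-dataset configs; `train_fn`, `train_ds`, and `side_ds` are hypothetical placeholders:

```
from ray.air.config import DatasetConfig
from ray.train.torch import TorchTrainer

trainer = TorchTrainer(
    train_loop_per_worker=train_fn,  # placeholder training loop
    datasets={"train": train_ds, "side": side_ds},  # placeholder Datasets
    dataset_config={
        "train": DatasetConfig(split=True),  # shard across workers
        "side": DatasetConfig(split=False, transform=False),  # broadcast as-is
    },
)
```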
Kai Fricke
4b9a89ad90
[air] Move python/ray/ml to python/ray/air ()
The package "ml" should be renamed to "air".

Main question: keep an `ml.py` with `from ray.air import *` for some level of backwards compatibility?
I'd go for no, to force people onto the new structure.
2022-06-03 21:53:44 +01:00
matthewdeng
2e05b62236
[AIR] Preprocessors feature guide () 2022-06-03 11:43:51 -07:00
Eric Liang
995309f9a3
[docs] Add AIR data ingest docs (part 1-- bulk loading only) () 2022-05-19 14:25:47 -07:00
Richard Liaw
41de6acd10
[air] fix-docs ()
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2022-05-13 15:58:31 -07:00
Richard Liaw
ce5a27e31b
[docs] Add initial AIR documentation ()
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2022-05-13 01:29:59 -07:00