Commit graph

91 commits

Author SHA1 Message Date
Eric Liang
36c46e9686
[docs] Improve AIR table of contents titles (#26858) 2022-07-22 17:17:49 -07:00
Clark Zinzow
a29baf93c8
[Datasets] Add .iter_torch_batches() and .iter_tf_batches() APIs. (#26689)
This PR adds .iter_torch_batches() and .iter_tf_batches() convenience APIs, which takes care of ML framework tensor conversion, the narrow tensor waste for the .iter_batches() call ("numpy" format), and unifies batch formats around two options: a single tensor for simple/pure-tensor/single-column datasets, and a dictionary of tensors for multi-column datasets.
2022-07-22 10:09:36 -07:00
Eric Liang
9272bcbbca
[docs] Add ecosystem map to AIR guide (#26859) 2022-07-21 19:06:47 -07:00
Jiao
db027d86af
[P0][AIR] Fix train to serve notebooks (#26821)
Co-authored-by: Simon Mo <simon.mo@hey.com>
2022-07-21 18:04:13 -07:00
Jules S. Damji
6db2536971
[RayAIR] Minor tweaks to the why ray air for clarity (#26680) 2022-07-21 10:21:26 -07:00
Balaji Veeramani
ac1d21027d
[AIR] Add framework-specific checkpoints (#26777) 2022-07-20 19:33:27 -07:00
Richard Liaw
9f0d35b97c
[air/docs] add tensorflow benchmarks into table (#26800) 2022-07-20 17:12:40 -07:00
Eric Liang
d6f29eb9ca
[docs] Mark pipelined prediction as experimental for now (#26792) 2022-07-20 15:31:19 -07:00
xwjiang2010
e7957f4a3e
[air] update offline/online rl example and enable them. (#26786) 2022-07-20 14:06:03 -07:00
Richard Liaw
6563c2762d
[air] add pytorch benchmark number (#26719) 2022-07-19 09:51:13 -07:00
Richard Liaw
7e62e1187c
[air/benchmark] Torch benchmarks for 4x4 (#26692)
Add benchmark data for 4x4 GPU setup.

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

Co-authored-by: Jimmy Yao <jiahaoyao.math@gmail.com>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2022-07-19 17:06:37 +01:00
Sumanth Ratna
759966781f
[air] Allow users to use instances of ScalingConfig (#25712)
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2022-07-18 15:46:58 -07:00
matthewdeng
6670708010
[air] add placement group max CPU to data benchmark (#26649)
Set experimental `_max_cpu_fraction_per_node` to prevent deadlock.

This should technically be a no-op with the SPREAD strategy.
2022-07-18 10:34:40 -07:00
Jiao
98a07920d3
[AIR][CUJ] Make distributing training benchmark at silver tier (#26640) 2022-07-17 22:07:09 -07:00
Jules S. Damji
55368402ee
added summary why and when to use bulk vs streaming data ingest (#26637) 2022-07-17 18:46:58 -07:00
Clark Zinzow
864af14f41
[Datasets] [Local Shuffle - 1/N] Add local shuffling option. (#26094)
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
Co-authored-by: Matthew Deng <matt@anyscale.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-07-17 16:21:14 -07:00
Eric Liang
400330e9c0
[air] Add _max_cpu_fraction_per_node to ScalingConfig and documentation (#26634) 2022-07-16 21:55:51 -07:00
Amog Kamsetty
3a345a470c
[AIR/Docs] Add Predictor Docs (#25833) 2022-07-16 21:14:21 -07:00
Jiao
77e2ef2eb6
[AIR] Update Torch benchmarks with documentation (#26631)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-07-16 17:58:21 -07:00
Eric Liang
0855bcb77e
[air] Use SPREAD strategy by default and don't special case it in benchmarks (#26633) 2022-07-16 17:37:06 -07:00
Antoni Baum
fb6f3cf708
[AIR/Docs] Small improvements to Train user guide (#26577)
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2022-07-16 16:51:17 -07:00
Eric Liang
6217138eb0
[docs] Move AIR benchmarks to top level (#26632) 2022-07-16 15:34:31 -07:00
Richard Liaw
799311b2f7
[air/docs] update examples to remove pandas again (#26598) 2022-07-16 08:40:44 -07:00
matthewdeng
e3a096f412
[air] add bulk ingest benchmarks (#26618) 2022-07-15 22:01:23 -07:00
Richard Liaw
5ad4e75831
[air] Add initial benchmark section (#26608) 2022-07-15 15:33:48 -07:00
Jiao
647e12b6c7
[AIR] Fix convert_existing_pytorch_code_to_ray_air notebook (#26523) 2022-07-14 14:30:55 -07:00
Amog Kamsetty
6595bd6e2d
[AIR] Introduce better scoring API for BatchPredictor (#26451)
Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>

As discussed offline, allow configurability for feature columns and keep columns in BatchPredictor for better scoring UX on test datasets.
2022-07-14 11:26:12 -07:00
Richard Liaw
a0ce3c111b
[air/data] Concatenator preprocessor (#26526) 2022-07-14 10:26:14 -07:00
Eric Liang
5f18c67ba3
Fix LINT (#26554)
Signed-off-by: Eric Liang <ekhliang@gmail.com>
2022-07-13 23:28:02 -07:00
Jiao
15dbc0362a
[AIR][Docs] Fix torch_image_example (#26453) 2022-07-13 21:59:24 -07:00
Eric Liang
31c8c908f9
[docs] Improve AIR API ref organization (#26530) 2022-07-13 18:05:17 -07:00
Antoni Baum
5ed10ef921
[AIR/CI] Fix Hugging Face notebook example (#26475) 2022-07-13 09:16:42 -07:00
Eric Liang
4c04c8d92c
[doc] Rename toc entry for libraries back to "Ray Libraries" (#26485) 2022-07-12 14:23:36 -07:00
Richard Liaw
92efc85b3b
[air/docs] checkpoints (#25901) 2022-07-11 20:40:23 -07:00
Richard Liaw
1abe908c22
[air/docs] improve consistency of getting started (#26247) 2022-07-11 20:16:37 -07:00
Antoni Baum
65ea710e30
[Docs] Update Train user guide to use the new APIs (#26091) 2022-07-11 15:10:10 -07:00
Richard Liaw
5892a76a44
[air/tune] Documentation testing fixes (#26409) 2022-07-09 19:47:21 -07:00
Amog Kamsetty
cc43bcccb4
[AIR] Update TensorflowPredictor to new API (#26215)
Updates TensorflowPredictor to use the new _predict_pandas API.

Also as agreed upon offline, removes the extra configurations from TensorflowPredictor (column selection, concatenation) in favor of having this be done via a Preprocessor.
2022-07-08 13:04:49 -07:00
Antoni Baum
ea94cda1f3
[AIR] Replace train. with session. (#26303)
This PR replaces legacy API calls to `train.` with AIR `session.` in Train code, examples and docs.

Depends on https://github.com/ray-project/ray/pull/25735
2022-07-07 16:29:04 -07:00
Antoni Baum
d1966899bb
[Docs] Small fix to AIR examples descriptions (#26227) 2022-07-05 17:16:56 -07:00
Simon Mo
88a219c7f2
Revert "Revert "[AIR][Serve] Rename ModelWrapperDeployment -> PredictorDeployment"" (#26231) 2022-07-05 13:26:49 -07:00
xwjiang2010
ac831fded4
[air] update documentation to use session.report (#26051)
Update documentation to use `session.report`.

Next steps:
1. Update our internal caller to use `session.report`. Most importantly, CheckpointManager and DataParallelTrainer.
2. Update `get_trial_resources` to use PGF notions to incorporate the requirement of ResourceChangingScheduler. @Yard1 
3. After 2 is done, change all `tune.get_trial_resources` to `session.get_trial_resources`
4. [internal implementation] remove special checkpoint handling logic from huggingface trainer. Optimize the flow for checkpoint conversion with `session.report`.

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2022-06-30 10:37:31 -07:00
matthewdeng
4a21dc31ae
[air] update DummyTrainer to handle DatasetPipelines (#26175)
1. Update `DummyTrainer` to take `num_epochs` instead of `runtime_seconds`.
    1. Ray Train expects equal number of calls to `train.report()`. Different workers may run at different speeds and terminate after different epoch numbers, which causes an error.
2. Add `generate_epochs` to support `DatasetPipeline` when `use_stream_api` is True.
3. Update `__main__` code to support testing different configurations.
2022-06-29 09:32:57 -07:00
Antoni Baum
dc7ed086a5
[AIR] More checkpoint configurability, Result extension (#25943)
This PR:
* Allows the user to set `keep_checkpoints_num` and `checkpoint_score_attr` in `RunConfig` using the `CheckpointStrategy` dataclass
* Adds two new fields to the `Result` object - `best_checkpoints` - a list of saved best checkpoints as determined by `CheckpointingConfig`.
2022-06-29 08:23:29 -07:00
Antoni Baum
128f9e5664
[AIR] Move integration logging callbacks to AIR (#26126)
As the integration logging callbacks are commonly used with AIR Trainers, they should be moved from the tune package to the air package. The old imports will still work, but raise a deprecation warning.
2022-06-28 17:25:19 -07:00
Stephanie Wang
c9be251b7a
Revert "[AIR][Serve] Rename ModelWrapperDeployment -> PredictorDeployment (#25962)" (#26176)
This reverts commit 68692b3464.
2022-06-28 17:07:07 -07:00
Simon Mo
68692b3464
[AIR][Serve] Rename ModelWrapperDeployment -> PredictorDeployment (#25962) 2022-06-28 10:26:10 -07:00
Antoni Baum
0ec198acc2
[AIR] Remove unnecessary pandas from examples (#26009)
Removes unnecessary pandas usage from AIR examples. Helps ensure users do not follow bad practices.
2022-06-24 14:38:23 -07:00
Antoni Baum
94492c2b49
[AIR/Docs] Improve user guide gallery (#26016)
Moves logging to examples, reorders to match TOC, adds missing entry.
2022-06-23 17:51:01 -07:00
Antoni Baum
91dd360f9d
[AIR/train] Move predictors to ray.train (#25769) 2022-06-15 17:02:15 -07:00