Commit graph

28 commits

Author SHA1 Message Date
Amog Kamsetty
365fc44754
[AIR] Update to new Predictor interface (#25425)
Updates the Predictor interface to have Pandas as a narrow waist.
2022-06-06 15:41:38 -07:00
Richard Liaw
36aee6a1c4
[air/docs] Update documentation structure (#25475)
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2022-06-06 15:15:11 -07:00
Balaji Veeramani
5e06baa77e
[AIR] Remove /Users/balaji from Torch example (#25515)
2022-06-06 13:13:54 -07:00
Eric Liang
1f509ab331
[air] Add DatasetParallelTrainer.dataset_config for configuring dataset ingest (#25337)
This adds a per-dataset config object to DataParallelTrainer. These configs define how the Dataset should be read into the DataParallelTrainer. It configures the preprocessing, splitting, and ingest strategy per-dataset. DataParallelTrainers declare default DatasetConfigs for each dataset passed in the ``datasets`` argument. Users have the opportunity to selectively override these configs by passing the ``dataset_config`` argument. Trainers can also define user customizable values (e.g., XGBoostTrainer doesn't support streaming ingest).

This PR adds the minimal support for dataset configs. Future PRs will:
- Add support for streaming ingest
- Move this config from DataParallelTrainer to ml.Trainer
2022-06-03 16:32:53 -07:00
Kai Fricke
4b9a89ad90
[air] Move python/ray/ml to python/ray/air (#25449)
The package "ml" should be renamed to "air".

Main question: Keep a `ml.py` with `from ray.air import *` for some level of backwards compatibility?
I'd go for no to force people to use the new structure.
2022-06-03 21:53:44 +01:00
matthewdeng
2e05b62236
[AIR] Preprocessors feature guide (#25302)
2022-06-03 11:43:51 -07:00
Eric Liang
51b295ad74
[docs] Improve Tune + Datasets documentation (#25389)
2022-06-01 21:52:32 -07:00
Balaji Veeramani
f9e7b55123
[AIR] Add Torch image example (#24618)
2022-05-27 16:47:21 -07:00
Amog Kamsetty
e8440cf52b
[AIR] Incremental Learning Example (#24420)
Example for domain incremental learning on Permuted MNIST Dataset with naive strategy
2022-05-26 12:28:28 -07:00
xwjiang2010
ff1fb9b5a2
[air example] train a Keras model on tabular data and serve it. (#24898)
2022-05-25 22:19:35 -07:00
Kai Fricke
d57ba750f5
[docs/air] Move upload example to docs (#25022)
2022-05-21 12:16:33 -07:00
Kai Fricke
e76efffec6
[air/docs] Move RL examples to docs (#24962)
Following #24959, this PR moves the RL examples (online/offline/serving) into the Ray ML docs. It also splits the online and offline parts.
2022-05-20 14:55:01 +01:00
Eric Liang
995309f9a3
[docs] Add AIR data ingest docs (part 1-- bulk loading only) (#24799)
2022-05-19 14:25:47 -07:00
Kai Fricke
9a8c8f4889
[air/docs] Move some examples from ml/examples to docs (#24959)
This moves the basic LightGBM, Sklearn, and XGBoost examples from the examples/ folder to the docs. We keep a symlink in the examples folder.
2022-05-19 14:01:49 +01:00
Antoni Baum
1d5e6d908d
[AIR] HuggingFace Text Classification example (#24402)
2022-05-18 09:35:12 -07:00
Antoni Baum
c74886a55e
[CI] Run doc notebooks in CI (#24816)
Currently, we are not running doc notebooks in CI due to a bazel misconfiguration - we are using `glob` in a top level package in order to get the paths for the notebooks, but those are contained inside subpackages, which glob purposefully ignores. Therefore, the lists of notebooks to run are empty. This PR fixes that by:
* Running the `py_test_run_all_notebooks` macro inside the relevant subpackages
* Editing the `test_myst_doc.py` script to search recursively for the target file, which handles mismatches between the `name` and `data` arguments in `py_test_run_all_notebooks`
* Setting the `allow_empty=False` flag inside `glob` calls in our macros to ensure that this oversight is caught early
* Enabling detection of changes in doc folder for `*.ipynb` and `BUILD` files

This PR also adds a GPU runner for doc tests, allowing one of our examples to pass - and setting the infra for more to come. Finally, a misconfigured path for one set of doc tests is also fixed.
2022-05-17 09:50:42 +01:00
Antoni Baum
7158aeda33
[Datasets] Add Dataset.split_proportionately and ray.ml.train_test_split (#24476)
Adds a Dataset.split_proportionately method that allows the user to split a dataset using proportions. This is a very common use case, e.g., for train-test splitting. The implementation is a thin wrapper over Dataset.split_at_indices.

Additionally, this PR adds a ray.ml.train_test_split function intended to provide a familiar API to ML practitioners.
2022-05-16 20:47:29 -07:00
Richard Liaw
41de6acd10
[air] fix-docs (#24792)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2022-05-13 15:58:31 -07:00
Kai Fricke
a92ce9721c
[air] Example to run tuning and analyze results (#24602)
This is a notebook showing how to tune an xgboost model and analyze the results.

Also adds a `get_dataframe()` method to `ResultsGrid` to fetch the trial results.

Depends on #24483 for toctree.
2022-05-13 15:22:36 +01:00
Kai Fricke
9e21e392ee
[air/doc] Add examples doc structure (#24770)
Add the basic toc/structure for Ray AIR examples
2022-05-13 11:56:34 +01:00
Richard Liaw
ce5a27e31b
[docs] Add initial AIR documentation (#24483)
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2022-05-13 01:29:59 -07:00
Amog Kamsetty
c4bf38daa6
[AIR] Add AIR install extra (#24701)
Closes #23439
2022-05-12 09:25:52 -07:00
Amog Kamsetty
a36e2a8f51
[Tune] Deprecate DistributedTrainableCreator (#24453)
Fully deprecate DistributedTrainableCreator for Ray 2.0

Closes #24453
2022-05-10 11:06:43 -07:00
Antoni Baum
ff0ced1a64
[AIR] HuggingFaceTrainer&Predictor implementation (#23876)
Implements HuggingFaceTrainer & HuggingFacePredictor.
2022-04-29 14:31:54 -07:00
Kai Fricke
40d3a62aa1
[air/wip] Add batch predictor class (#23808)
What: This class adds a generic BatchPredictor class that offers an interface to run batch inference on Ray datasets. It takes a Predictor class and checkpoint as an input, and provides a predict(dataset) method to run scalable scoring inference.

Why: Currently users have to implement scorers themselves. This is mostly boilerplate and prone to errors, so we should provide a simple solution instead.

Note that this predictor also implements the Predictor interface.
2022-04-13 08:58:08 +01:00
Antoni Baum
40646eecd4
[AIR] SklearnTrainer & Predictor interfaces (#23803)
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2022-04-11 15:11:42 -07:00
Simon Mo
cb1919b8d0
[Doc][Serve] Add minimal docs for model wrappers and http adapters (#23536)
2022-03-29 11:33:14 -07:00
Richard Liaw
1fe110f8f4
[ml] Add a starter page for docstrings (#23312)
2022-03-21 17:20:45 -07:00