Commit graph

18 commits

Author SHA1 Message Date
matthewdeng
01d473b355
[ci] print out test environment info for all python tests (#27312)
Recently there have been a number of CI test failures due to direct or transitive dependency version upgrades. Printing out environment information for each test suite allows us to quickly check the diff between failed and successful runs.

**Notes:**
1. In this PR I just manually added `./ci/env/env_info.sh` to each test suite. We may want to generalize this in the future.
2. This is just for CI now, but is applicable to release tests as well.

Signed-off-by: Matthew Deng <matt@anyscale.com>
2022-08-01 09:55:13 +01:00
Jiao
d95dc2f2e5
[AIR][GPU Batch Prediction] Add basic support for GPU batch prediction (#26251)
This PR adds GPU support for pytorch and tensorflow predictor, as well as automatic setting `use_gpu` flag in `BatchPredictor`.

Notable changes:
- Added `use_gpu` flag in the constructor of `TorchPredictor` and `TensorflowPredictor` (note it's slightly different from our latest design doc that puts this flag at `predict()` call)
- Added `use_gpu` flag to `SklearnPredictor` so its interface is compatible with `BatchPredictor`
- Code to move both model weights and input tensor to default visible GPU at index 0 if flag is set 
- parametrized existing predictor tests to use GPU for both CPU & GPU coverage
- Changed BUILD CI tests with an added `gpu` tag (I'm not 100% sure if that's a right way tho)

Follow ups:

https://github.com/ray-project/ray/issues/26249 is created in case our host has multiple GPU devices. It's a bit out of scope for this PR, but for GPU batch inference ideally we should be able to evenly use all GPU devices on host where CPU & DRAM are busy with pre-fetching + data movement to GPU. We might approximately do the same by scheduling same # of Predictor instances on the host, but that's worth verifying once benchmarks are set.
2022-07-11 13:04:15 -07:00
Amog Kamsetty
1316a2d05e
[AIR/Train] Move ray.air.train to ray.train (#25570) 2022-06-08 21:34:18 -07:00
Kai Fricke
4b9a89ad90
[air] Move python/ray/ml to python/ray/air (#25449)
The package "ml" should be renamed to "air".

Main question: Keep a `ml.py` with `from ray.air import *` for some level of backwards compatibility?
I'd go for no to force people to use the new structure.
2022-06-03 21:53:44 +01:00
Antoni Baum
1d5e6d908d
[AIR] HuggingFace Text Classification example (#24402) 2022-05-18 09:35:12 -07:00
Antoni Baum
c74886a55e
[CI] Run doc notebooks in CI (#24816)
Currently, we are not running doc notebooks in CI due to a bazel misconfiguration - we are using `glob` in a top level package in order to get the paths for the notebooks, but those are contained inside subpackages, which glob purposefully ignores. Therefore, the lists of notebooks to run are empty. This PR fixes that by:
* Running the `py_test_run_all_notebooks` macro inside the relevant subpackages
* Editing the `test_myst_doc.py` script to allow for recursive search for the target file, allowing to deal with mismatches between `name` and `data` arguments in `py_test_run_all_notebooks`
* Setting the `allow_empty=False` flag inside `glob` calls in our macros to ensure that this oversight is caught early
* Enabling detection of changes in doc folder for `*.ipynb` and `BUILD` files

This PR also adds a GPU runner for doc tests, allowing one of our examples to pass - and setting the infra for more to come. Finally, a misconfigured path for one set of doc tests is also fixed.
2022-05-17 09:50:42 +01:00
Kai Fricke
65d9a410f7
[ci] Clean up ci/ directory (refactor ci/travis) (#23866)
Clean up the ci/ directory. This means getting rid of the travis/ path completely and moving the files into sensible subdirectories.

Details:

- Moves everything under ci/travis into subdirectories, e.g. ci/build, ci/lint, etc.
- Minor adjustments to some scripts (variable renames)
- Removes the outdated (unused) asan tests
2022-04-13 18:11:30 +01:00
Amog Kamsetty
bb4ff42eec
[ml] TorchTrainer bug fixes + GPU test (#23293)
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2022-03-17 23:49:42 -07:00
Kai Fricke
b51b5afaea
[ci/gpu] Move ML dependency install to Dockerfile (#21711)
Instead of installing dependencies in each Buildkite job, let's move this to the Dockerfile instead.
This will update GPU tests to always use Python 3.7.
2022-02-01 12:04:55 +00:00
Sven Mika
c01245763e
[RLlib] Revert "Revert "updated pettingzoo wrappers, env versions, urls"" (#21339) 2022-01-04 18:30:26 +01:00
Kai Fricke
489e6945a6
Revert "[RLlib] Updated pettingzoo wrappers, env versions, urls (#20113)" (#21338)
This reverts commit 327eb84154.
2022-01-03 10:21:25 +00:00
Benjamin Black
327eb84154
[RLlib] Updated pettingzoo wrappers, env versions, urls (#20113) 2022-01-02 21:29:09 +01:00
Eric Liang
6f93ea437e
Remove the flaky test tag (#21006) 2021-12-11 01:03:17 -08:00
matthewdeng
0de105d42f
[train] update Trainer._is_tune_enabled to work when Tune is not installed (#20767) 2021-11-29 20:08:51 -08:00
matthewdeng
78e9ff7c91
[train][datasets] add example for big data training (#20042)
* [train][datasets] add example for big data training

* add title docstring

* lint and dependencies

* add dask_ml requirement
2021-11-05 09:28:48 -07:00
Avnish Narayan
ad87ddf93e
[rllib] Add deterministic test to gpu (#19306)
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-10-26 10:11:39 -07:00
matthewdeng
4674c78050
[Train] Rename Ray SGD v2 to Ray Train (#19436) 2021-10-18 22:27:46 -07:00
Kai Fricke
3dc176c42e
[ci/tune] Add SGD and Tune GPU pipeline step to CI (#18469)
* [ci/tune] Add Tune GPU pipeline step to CI

* cont.

* add sgd gpu tests

* format yaml, fix imports

* install horovod; fix line wrapping

* set GPU per worker to 0.5

* fix import

* move test to 4gpu machine

* fix lint

* lint

* set visible devices

* pull in tf gpu fix

* Fix Tune GPU pipeline step

* nit

* Disable GPU tests until we have some

* Re-add empty rllib tests

Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com>
2021-10-01 18:34:05 -07:00