Commit graph

42 commits

Author SHA1 Message Date
Amog Kamsetty
ea6d53dbf3
[CI/AIR] Cleanup Deprecated ml_utils (#28278)
ray.util.ml_utils was deprecated in Ray 2.0. This PR does some final cleanup of our CI pipeline

Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2022-09-06 08:39:21 +01:00
Kai Fricke
57484b28cf
[ci/air] Only run examples that need credentials in branch builds (#28260) 2022-09-02 09:10:36 -07:00
Kai Fricke
5d31f2d4bc
[tune] Run SigOpt tests in CI (#28225) 2022-09-02 09:10:01 -07:00
matthewdeng
01d473b355
[ci] print out test environment info for all python tests (#27312)
Recently there have been a number of CI test failures due to direct or transitive dependency version upgrades. Printing out environment information for each test suite allows us to quickly check the diff between failed and successful runs.

**Notes:**
1. In this PR I just manually added `./ci/env/env_info.sh` to each test suite. We may want to generalize this in the future.
2. This is just for CI now, but is applicable to release tests as well.

Signed-off-by: Matthew Deng <matt@anyscale.com>
2022-08-01 09:55:13 +01:00
Amog Kamsetty
862d10c162
[AIR] Remove ML code from ray.util (#27005)
Removes all ML related code from `ray.util`

Removes:
- `ray.util.xgboost`
- `ray.util.lightgbm`
- `ray.util.horovod`
- `ray.util.ray_lightning`

Moves `ray.util.ml_utils` to other locations

Closes #23900

Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2022-07-27 14:24:19 +01:00
Antoni Baum
9b2cd29511
[CI] Install Horovod in doc tests to fix notebook (#26476)
Fixes the Horovod notebook example as found in #26410 by installing Horovod in doc tests jobs.

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
2022-07-13 16:27:20 +01:00
xwjiang2010
03671c961e
[CI] run air related doc/example tests as part of pre-submit CI. (#26466)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
2022-07-12 18:30:37 +01:00
Sven Mika
96693055bd
[RLlib] More Trainer -> Algorithm renaming cleanups. (#25869) 2022-06-20 15:54:00 +02:00
Antoni Baum
5e9a8eb5f6
[AIR/data] Move preprocessors to ray.data (#25599)
Moves ray.air.Preprocessor and ray.air.preprocessors to ray.data to converge on the agreed upon package structure discussed internally.
2022-06-13 12:57:59 -07:00
Amog Kamsetty
1316a2d05e
[AIR/Train] Move ray.air.train to ray.train (#25570) 2022-06-08 21:34:18 -07:00
Amog Kamsetty
80ae651f25
[Train] Clean up ray.train package (#25566) 2022-06-08 10:22:36 -07:00
Sven Mika
b5bc2b93c3
[RLlib] Move all remaining algos into algorithms directory. (#25366) 2022-06-04 07:35:24 +02:00
Kai Fricke
4b9a89ad90
[air] Move python/ray/ml to python/ray/air (#25449)
The package "ml" should be renamed to "air".

Main question: Keep a `ml.py` with `from ray.air import *` for some level of backwards compatibility?
I'd go for no to force people to use the new structure.
2022-06-03 21:53:44 +01:00
Yi Cheng
fd0f967d2e
Revert "[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms dir and rename policy and trainer classes. (#25346)" (#25420)
This reverts commit e4ceae19ef.

Reverts #25346

linux://python/ray/tests:test_client_library_integration never fail before this PR.

In the CI of the reverted PR, it also fails (https://buildkite.com/ray-project/ray-builders-pr/builds/34079#01812442-c541-4145-af22-2a012655c128). So high likely it's because of this PR.

And test output failure seems related as well (https://buildkite.com/ray-project/ray-builders-branch/builds/7923#018125c2-4812-4ead-a42f-7fddb344105b)
2022-06-02 20:38:44 -07:00
Sven Mika
e4ceae19ef
[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms dir and rename policy and trainer classes. (#25346) 2022-06-02 16:47:05 +02:00
Amog Kamsetty
983d8b3db2
[AIR] Fix failing CI on master (#25201)
The AIR CI build has been failing on master since #25022.

#25022 moved the tests that require credentials, but we left the bazel command in the build pipeline still. So even though all the tests are passing, the buildkite stage itself was failing since it tries run tests that require credentials, but these tests no longer exist in the directory. This is only a problem for master build since we don't run this command for PR builds.
2022-05-26 11:34:57 +02:00
Antoni Baum
2b6c6301e2
[CI] Fix typo in CI label (#25185) 2022-05-25 17:31:29 +02:00
Kai Fricke
d57ba750f5
[docs/air] Move upload example to docs (#25022) 2022-05-21 12:16:33 -07:00
Kai Fricke
e76efffec6
[air/docs] Move RL examples to docs (#24962)
Following #24959, this PR moves the RL examples (online/offline/serving) into the Ray ML docs. It also splits the online and offline parts.
2022-05-20 14:55:01 +01:00
Antoni Baum
1d5e6d908d
[AIR] HuggingFace Text Classification example (#24402) 2022-05-18 09:35:12 -07:00
Antoni Baum
c74886a55e
[CI] Run doc notebooks in CI (#24816)
Currently, we are not running doc notebooks in CI due to a bazel misconfiguration - we are using `glob` in a top level package in order to get the paths for the notebooks, but those are contained inside subpackages, which glob purposefully ignores. Therefore, the lists of notebooks to run are empty. This PR fixes that by:
* Running the `py_test_run_all_notebooks` macro inside the relevant subpackages
* Editing the `test_myst_doc.py` script to allow for recursive search for the target file, allowing to deal with mismatches between `name` and `data` arguments in `py_test_run_all_notebooks`
* Setting the `allow_empty=False` flag inside `glob` calls in our macros to ensure that this oversight is caught early
* Enabling detection of changes in doc folder for `*.ipynb` and `BUILD` files

This PR also adds a GPU runner for doc tests, allowing one of our examples to pass - and setting the infra for more to come. Finally, a misconfigured path for one set of doc tests is also fixed.
2022-05-17 09:50:42 +01:00
Edward Oakes
f99aa5cb40
[serve][docs] Unify doc_code directories and add bazel target (#24736)
Split off from https://github.com/ray-project/ray/pull/24693/, unifying the redundant directories we had and making sure all `serve/doc_code` snippets are run in CI.
2022-05-16 09:49:42 -05:00
Kai Fricke
b0fa9d6766
[air] Example for Comet ML (#24603)
After #24459, this PR will add similar support for model artifact saving and an example for experiment tracking with Ray AIR for Comet ML.
2022-05-12 12:12:30 +01:00
Kai Fricke
5d9bf4234a
[air] Example to track runs with Weights & Biases (#24459)
This PR 
- adds an example on how to run Ray Train and log results to weights & biases
- adds functionality to the W&B plugin to store checkpoints
- fixes a bug introduced in #24017
- Adds a CI utility script to setup credentials
- Adds a CI utility script to remove test state from external services cc @simon-mo
2022-05-06 15:52:37 +01:00
Amog Kamsetty
ae9c68e75f
[Train] Fully deprecate Ray SGD v1 (#24038)
Ray SGD v1 has been denoted as a deprecated API for a while. This PR fully deprecates Ray SGD v1. An error will be raised if ray.util.sgd package is attempted to be imported.

Closes #16435
2022-04-25 16:12:57 -07:00
Kai Fricke
65d9a410f7
[ci] Clean up ci/ directory (refactor ci/travis) (#23866)
Clean up the ci/ directory. This means getting rid of the travis/ path completely and moving the files into sensible subdirectories.

Details:

- Moves everything under ci/travis into subdirectories, e.g. ci/build, ci/lint, etc.
- Minor adjustments to some scripts (variable renames)
- Removes the outdated (unused) asan tests
2022-04-13 18:11:30 +01:00
Sven Mika
a8494742a3
[RLlib] Memory leak finding toolset using tracemalloc + CI memory leak tests. (#15412) 2022-04-12 07:50:09 +02:00
Sven Mika
c82f6c62c8
[RLlib] Make RolloutWorkers (optionally) recoverable after failure. (#23739) 2022-04-08 15:33:28 +02:00
xwjiang2010
6443f3da84
[air] Add horovod trainer (#23437) 2022-03-29 18:12:32 -07:00
Amog Kamsetty
bb4ff42eec
[ml] TorchTrainer bug fixes + GPU test (#23293)
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2022-03-17 23:49:42 -07:00
matthewdeng
6b0169b23d
[ml] enable CI tests (#22926)
Follow-up to #22748, enabling tests in CI.

Conditions: A new RAY_CI_ML_AFFECTED condition is added for this test suite. The package currently depends on Ray Data, and will be triggered accordingly.

Dependencies: Adding DATA_PROCESSING_TESTING dependencies (set for install-dependencies.sh) for now.
2022-03-09 14:31:53 +00:00
Sven Mika
e50bd212a1
[RLlib] Disable flakey Pendulum-v1 tests (until further investigation). (#22686) 2022-03-01 16:44:17 +01:00
Sven Mika
7b687e6cd8
[RLlib] SlateQ: Add a hard-task learning test to weekly regression suite. (#22544) 2022-02-25 21:58:16 +01:00
Sven Mika
6522935291
[RLlib] Slate-Q tf implementation and tests/benchmarks. (#22389) 2022-02-22 09:36:44 +01:00
Eric Liang
92550500bc
Split workflow and dataset tests (#22415) 2022-02-16 01:47:55 -08:00
matthewdeng
2c204a755b
[train] add minimal installation test suite (#22300)
Adding a minimal test suite to catch any regressions from accidentally adding backend imports (e.g. `torch`, `tensorflow`, `horovod`) to the main import path.

**Example:** If I'm running Ray Train with `tensorflow`, I should not be required to have `torch` installed.
2022-02-11 10:09:00 -08:00
Avnish Narayan
0d2ba41e41
[RLlib] [CI] Deflake longer running RLlib learning tests for off policy algorithms. Fix seeding issue in TransformedAction Environments (#21685) 2022-02-04 14:59:56 +01:00
Kai Fricke
a3442df584
[ci/multinode] Build multinode image with OpenSSH before running tests (#21544)
Currently we install OpenSSH on the fly in fake multinode docker testing. Instead we can speed testing up a fair bit by building a Docker image which includes OpenSSH first and then run tests with this image.
2022-01-13 08:47:04 -08:00
Kai Fricke
5a7f6e4fdd
[rfc][ci] create fake docker-compose cluster environment (#20256)
Following #18987 this PR adds a docker-compose based local multi node cluster.

The fake multinode docker comprises two parts. The docker_monitor.py script is a watch script calling docker compose up whenever the docker-compose.yaml changes. The node provider creates and updates the docker compose according to the autoscaling requirements.

This mode fully supports autoscaling and comes with test utilities to start and connect to docker-compose autoscaling environments. There's also a sample test case showing how this can be used.
2022-01-11 04:35:36 +00:00
Amog Kamsetty
57db4640ca
[Train] [Tune] Refactor MLflow (#20802)
Pulls out Tune's MLflow logging logic to a shared MLflow util.
Adds an MLflow logger callback to Ray Train

Closes #20642
2021-12-21 17:17:52 -08:00
Eric Liang
6f93ea437e
Remove the flaky test tag (#21006) 2021-12-11 01:03:17 -08:00
Kai Fricke
97ec2a03b6
[ci/buildkite] Add ml pipeline to speed up ML/RLLib tests (#20895)
ML tests will be built in a separate bootstrap step installing all required dependencies.
2021-12-09 21:14:10 +00:00