708 commits

Author SHA1 Message Date
mwtian
cf6a54ca46
[CI] pin pytest-asyncio (#21579) 2022-01-13 11:35:30 -08:00
Kai Fricke
a3442df584
[ci/multinode] Build multinode image with OpenSSH before running tests (#21544)
Currently we install OpenSSH on the fly in fake multinode Docker testing. Instead, we can speed up testing a fair bit by first building a Docker image that includes OpenSSH and then running the tests with that image.
2022-01-13 08:47:04 -08:00
Kai Fricke
5a7f6e4fdd
[rfc][ci] create fake docker-compose cluster environment (#20256)
Following #18987, this PR adds a docker-compose-based local multi-node cluster.

The fake multinode docker setup comprises two parts. The docker_monitor.py script is a watch script that calls docker compose up whenever docker-compose.yaml changes. The node provider creates and updates the docker-compose configuration according to the autoscaling requirements.

This mode fully supports autoscaling and comes with test utilities to start and connect to docker-compose autoscaling environments. There's also a sample test case showing how this can be used.
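
The watch-script idea described above can be sketched roughly as follows. This is a minimal illustration and not the actual docker_monitor.py: the function names `watch_file` and `compose_up`, the polling approach, and the `-d` flag are all assumptions made for the example.

```python
import os
import subprocess
import time
from typing import Callable, Optional


def compose_up(path: str) -> None:
    # Assumed invocation: (re)apply the compose file in detached mode.
    subprocess.run(["docker", "compose", "-f", path, "up", "-d"], check=True)


def watch_file(
    path: str,
    on_change: Callable[[str], None],
    poll_interval_s: float = 1.0,
    max_polls: Optional[int] = None,
) -> int:
    """Poll `path` and call `on_change` whenever its mtime changes.

    The first successful poll also counts as a change, so the callback
    runs once at startup. Returns the number of times it was triggered.
    `max_polls=None` means poll forever.
    """
    last_mtime: Optional[float] = None
    triggered = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        try:
            mtime: Optional[float] = os.path.getmtime(path)
        except FileNotFoundError:
            mtime = None  # The file has not been written yet; keep waiting.
        if mtime is not None and mtime != last_mtime:
            on_change(path)
            last_mtime = mtime
            triggered += 1
        polls += 1
        time.sleep(poll_interval_s)
    return triggered
```

Under these assumptions, a monitor process would call `watch_file("docker-compose.yaml", compose_up)` and leave it running while the node provider rewrites the file.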
2022-01-11 04:35:36 +00:00
Matti Picus
f3dcd1fac1
WINDOWS: re-enable runtime_env tests, skip cluster tests in serve (#21398)
After enabling tests of test_runtime_env_plugin and test_runtime_env_env_vars (PR #21252) and python/ray/serve:* tests (PR #21107), the analysis at flaky-tests.ray.io started showing failing tests in windows://python/ray/test/serv:test_standalone. PR #21352 reverted #21252 (runtime_env tests), but the problem was more likely in the serve tests. Specifically, `test_standalone` has a test that uses Cluster, which should be skipped on Windows because it is flaky. So this PR
- re-enables the runtime_env tests for windows
- skips the Cluster test in serve/tests/test_standalone.py
2022-01-06 21:43:58 -08:00
Archit Kulkarni
fd02065ce5
[CI] [docker] Fix docker image name regex matching (#21409) 2022-01-05 18:59:10 -08:00
Ian Rodney
1b42a49e71
[CI] [Docker Build] Allow Branches with Double digits in regex matching (#21401) 2022-01-05 14:19:19 -08:00
mwtian
24da654d90
[Test] Shard "Small & Large" tests (#21351) 2022-01-05 10:49:14 -08:00
Kai Fricke
94242e3e6e
[ci/repro] Add SYS_PTRACE to docker container, use unique name (#21377)
This will start repro docker containers with SYS_PTRACE capabilities to enable debugging e.g. via py-spy.
Additionally, default instance name tags for instance re-use will be generated using the buildkite build id and job id.
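
As a rough sketch of the two changes above: the `--cap-add=SYS_PTRACE` flag is a standard `docker run` option, while the helper names, the `repro-ci-` prefix, and the extra flags below are hypothetical and not taken from the script itself.

```python
from typing import List


def default_instance_name(build_id: str, job_id: str) -> str:
    # Hypothetical naming scheme: combine both Buildkite IDs so that each
    # job maps to its own reusable instance.
    return f"repro-ci-{build_id}-{job_id}"


def docker_run_command(image: str) -> List[str]:
    # SYS_PTRACE lets tools such as py-spy attach to running processes
    # inside the container.
    return [
        "docker", "run", "--rm", "-it",
        "--cap-add=SYS_PTRACE",
        image,
        "/bin/bash",
    ]
```

For example, `default_instance_name("abc", "123")` yields `"repro-ci-abc-123"` under this assumed scheme.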
2022-01-04 16:59:12 +00:00
Archit Kulkarni
4581baa7dc
Revert "WINDOWS: unskip passing runtime_env tests (#21252)" (#21352)
This reverts commit fcb952e1bc.
2022-01-03 11:07:17 -08:00
Balaji Veeramani
43a9e95dc0
[CI] Add support for Black formatting (#21281) 2022-01-03 10:06:41 -08:00
Kai Fricke
10290eeb2f
[ci] Pin manylinux docker image (#21341) 2022-01-03 14:36:21 +00:00
Kai Fricke
14ed7cfaaa
[ci] Add repro-ci.py script to automatically setup Buildkite-runner-like instances to debug CI runs (#21292)
Create an AWS instance to reproduce Buildkite CI builds.

This script will take a Buildkite build URL as an argument and create
an AWS instance with the same properties running the same Docker container
as the original Buildkite runner. The user is then attached to this instance
and can reproduce any build commands as if they were executed within the
runner.

This utility can be used to reproduce and debug build failures that come up
on the Buildkite runner instances but not on a local machine.
2021-12-31 10:31:50 +00:00
Matti Picus
3de18d2ada
WINDOWS: enable passing/skipping tests (#21136) 2021-12-27 11:59:00 -08:00
Matti Picus
fcb952e1bc
WINDOWS: unskip passing runtime_env tests (#21252) 2021-12-26 20:49:02 -08:00
Akash Patel
cbcd03b779
Upgrade cython to 0.29.26 for py310 (#21244) 2021-12-26 20:26:08 -08:00
Gagandeep Singh
c5c5fec22b
Unskip test_standalone from ci.sh (#21235) 2021-12-25 00:21:58 -08:00
Simon Mo
cfe0897d05
[CI] Migrate Windows tests to Buildkite (#21227) 2021-12-21 20:16:34 -08:00
Amog Kamsetty
57db4640ca
[Train] [Tune] Refactor MLflow (#20802)
Pulls out Tune's MLflow logging logic to a shared MLflow util.
Adds an MLflow logger callback to Ray Train

Closes #20642
2021-12-21 17:17:52 -08:00
Simon Mo
956774e757
[CI] Disable serve test_standalone on windows again (#21154) 2021-12-17 10:32:27 -08:00
Matti Picus
29965ad325
enable passing serve tests on windows (#21107)
* enable passing serve tests on windows

* move test_handle to 'medium' and enable

* move test_cli to 'medium'
2021-12-16 14:03:11 -08:00
Matti Picus
d2cd0730a0
[Windows] Enable test_advanced_2 on windows (#20994) 2021-12-15 14:30:40 -08:00
Ian Rodney
c7fb5a94d1
[CI] Upgrade Pip to 21.3 (#21111) 2021-12-15 13:29:45 -08:00
Edward Oakes
10947c83b3
[runtime_env] Make pip installs incremental (#20341)
Uses a direct `pip install` instead of creating a conda env to make pip installs incremental to the cluster environment.

Separates the handling of `pip` and `conda` dependencies.

The new `pip` approach still works if only the base Ray is installed on the cluster and the user specifies libraries like "ray[serve]" in the `pip` field.  The mechanism is as follows:
- We don't actually want to reinstall ray via pip, since this could lead to version mismatch issues.  Instead, we want to use the Ray that's already installed in the cluster.
- So if the user included "ray" in the pip list, remove it.
- If a library such as "ray[serve]" or "ray[tune, rllib]" was included in the pip list, remove it and replace it with its dependencies (e.g. "uvicorn", "requests", ..).
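
The mechanism in the list above can be sketched as follows. This is an illustration under stated assumptions, not Ray's implementation: `rewrite_pip_list` is a hypothetical helper, and the extras-to-dependencies table is made up for the example (the real dependency lists come from Ray's packaging metadata).

```python
import re
from typing import Dict, List

# Hypothetical mapping from Ray extras to third-party dependencies.
# Illustrative entries only; not Ray's actual dependency lists.
ASSUMED_EXTRA_DEPS: Dict[str, List[str]] = {
    "serve": ["uvicorn", "requests"],
    "tune": ["pandas", "tabulate"],
    "rllib": ["gym", "scipy"],
}

# Matches "ray" or "ray[extra1, extra2]" without a version specifier.
_RAY_REQ = re.compile(r"^\s*ray\s*(?:\[(?P<extras>[^\]]*)\])?\s*$")


def rewrite_pip_list(pip_list: List[str]) -> List[str]:
    """Drop Ray itself and expand Ray extras into their dependencies."""
    result: List[str] = []
    for req in pip_list:
        match = _RAY_REQ.match(req)
        if match is None:
            # Not a Ray requirement: keep it, skipping duplicates.
            if req not in result:
                result.append(req)
            continue
        # A Ray requirement: never reinstall Ray, only add what its
        # extras would have pulled in.
        extras = match.group("extras") or ""
        for extra in (e.strip() for e in extras.split(",")):
            for dep in ASSUMED_EXTRA_DEPS.get(extra, []):
                if dep not in result:
                    result.append(dep)
    return result
```

With the assumed table, `rewrite_pip_list(["ray", "numpy", "ray[serve]"])` keeps `numpy`, drops both Ray entries, and appends the two placeholder serve dependencies. A requirement that pins a Ray version (e.g. `ray==1.9.0`) is deliberately not matched by this sketch and would need separate handling.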

Co-authored-by: architkulkarni <arkulkar@gmail.com>
Co-authored-by: architkulkarni <architkulkarni@users.noreply.github.com>
2021-12-14 15:55:18 -08:00
Matti Picus
aec04989fc
WINDOWS: enable test_advanced_3.py (#21056) 2021-12-14 09:25:23 -08:00
Eric Liang
6f93ea437e
Remove the flaky test tag (#21006) 2021-12-11 01:03:17 -08:00
Kai Fricke
97ec2a03b6
[ci/buildkite] Add ml pipeline to speed up ML/RLlib tests (#20895)
ML tests will be built in a separate bootstrap step installing all required dependencies.
2021-12-09 21:14:10 +00:00
Amog Kamsetty
611bfc1352
[ML] Move find_free_port to ml_utils (#20828)
Small refactoring of a common utility used by Train, Tune, and RLlib.
2021-12-03 13:38:42 -08:00
matthewdeng
0de105d42f
[train] update Trainer._is_tune_enabled to work when Tune is not installed (#20767) 2021-11-29 20:08:51 -08:00
Guyang Song
191be85057
[script][format] check copyright for .proto files (#20632)
The .proto files also carry a copyright header, so this adds them to the copyright formatter.
2021-11-23 12:26:30 +08:00
Simon Mo
add2450b92
[CI] [Hotfix] Skip test_standalone (#20556) 2021-11-18 16:47:18 -08:00
shrekris-anyscale
a91ddbdeb9
Add smart_open dependency to ray[default] (#20420) 2021-11-18 10:00:30 -06:00
Amog Kamsetty
4cbcb11458
[Docker] Add commit as label (#20504)
Adds the Ray commit sha as a label for the docker image.
2021-11-17 15:20:41 -08:00
Richard Liaw
1cadd61917
Fix horovod failing tests by pinning down (#20484) 2021-11-17 13:54:25 -08:00
Simon Mo
18d605fa7c
[Serve] Add experimental CLI for serve deploy (#20371) 2021-11-16 20:22:09 -08:00
Simon Mo
2dc7a6c9f8
[CI] Pin manylinux image (#20451) 2021-11-16 17:52:51 -08:00
Philipp Moritz
440da92263
Fix manylinux2014 build scripts (#20347) 2021-11-14 19:42:23 -08:00
Edward Oakes
73e570c426
Fix windows build (don't skip test_job_manager.py) (#20294) 2021-11-12 11:13:15 -08:00
Matti Picus
1e80a2a83a
[WINDOWS] unskip tests (#20212) 2021-11-12 10:11:11 -08:00
chenk008
74fa267c72
Enable worker in container CI test (#20174) 2021-11-11 16:11:06 -08:00
Teofilo Zosa
abf0eb53cc
Fix aiohttp 3.8.0 breaking changes (and unpin from 3.7) (#20261) 2021-11-11 15:35:20 -08:00
Alex Wu
d85f7f3bfa
[windows][ci] Skip test_multinode_failures_2.py (typo) (#20206) 2021-11-10 12:05:45 -08:00
architkulkarni
e5e62d8991
[runtime env] Fix runtime env conda test and enable it in CI (#20121) 2021-11-08 18:33:19 -08:00
Alex Wu
45d7ef7c08
[windows][ci] Skip test_multi_node_failure_2 (#20117) 2021-11-07 09:17:46 -08:00
Sven Mika
50c30f89c6
[Tune; RLlib] Move Tune tests that use RLlib into separate buildkite job. (#20016) 2021-11-04 20:40:57 +01:00
Jiao
6cfb52ff1d
[job submission] Add stop API + subprocess cleanup (#19860) 2021-11-04 13:59:47 -05:00
Sven Mika
4cb23d1c95
[Tune; Testing] Revert to 3.7 (undone by accident by previous PR); + some minor comment cleanups. (#20031) 2021-11-04 10:58:34 +01:00
Avnish Narayan
026bf01071
[RLlib] Upgrade gym version to 0.21 and deprecate pendulum-v0. (#19535)
* Fix QMix, SAC, and MADDPG too.

* Unpin gym and deprecate pendulum v0

Many tests in RLlib depended on pendulum v0,
but in gym 0.21 pendulum v0 was deprecated
in favor of pendulum v1. This may change reward
thresholds, so we may have to rerun
all of the pendulum v1 benchmarks or use another
environment instead. The same applies to frozen
lake v0 and frozen lake v1.

Lastly, all of the RLlib tests have
been moved to Python 3.7.

* Add gym installation based on python version.

Pin python<= 3.6 to gym 0.19 due to install
issues with atari roms in gym 0.20

* Reformatting

* Fixing tests

* Move atari-py install conditional to req.txt

* migrate to new ale install method

Make parametric_actions_cartpole return float32 actions/obs

Adding type conversions if obs/actions don't match space

Add utils to make elements match gym space dtypes

Co-authored-by: Jun Gong <jungong@anyscale.com>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-03 16:24:00 +01:00
Sven Mika
e6ae08f416
[RLlib] Optionally don't drop last ts in v-trace calculations (APPO and IMPALA). (#19601) 2021-11-03 10:01:34 +01:00
Simon Mo
6040319d02
[CI] Pin aiohttp version to fix master branch (#19948) 2021-11-01 23:00:08 -07:00
mwtian
7afdfdc6dd
[CI] narrow down tests that run when files change (#19656) 2021-10-29 16:47:54 -07:00