Commit graph

61 commits

Author SHA1 Message Date
Eric Liang
00b5801d71
Fix datasets leaking worker processes due to closure capture of stats actor handle (#22156) 2022-02-07 14:05:44 -08:00
Yi Cheng
0659d4a472
[nightly] Limit many drivers iteration to 4000 iterations (#21958)
Because the many drivers test now runs faster, we limit it to 4k iterations.
2022-01-31 13:26:02 -08:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black (#21975)
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
mwtian
97f7e3d0e6
[e2e] do not terminate in serve_failure smoke test (#21925)
When the script terminates, it will also terminate its cluster including dashboard, which will prevent subsequent job submissions. Other long running e2e tests do not terminate in smoke test mode, so make `serve_failure` behave the same.
2022-01-27 15:36:46 -08:00
SangBin Cho
6b4aac7a08
Promote unstable tests to stable (#21811)
Promote tests that have passed 100% of the time over the last week to stable.
2022-01-24 02:10:37 -08:00
SangBin Cho
b1308b1c8c
[Test Infra] Unrevert team col (#21700)
This fixes the previous problems from team column revert.

This has 2 additional changes:

The alert handler now receives the team argument; its absence was the root cause of the breakage (https://github.com/ray-project/ray/pull/21289).

Previously, tests without a team column raised an exception. I weakened the condition to a warning log. I will eventually change it back to raising an exception, but for a smoother transition we will log a warning for a short time.
2022-01-19 13:29:53 -08:00
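As a rough illustration of the relaxed team check described in the #21700 entry above, here is a minimal Python sketch. It is not the code from e2e.py: the helper name `resolve_team`, the `strict` flag, and the config keys `name` and `team` are assumptions made for this example. Only the "unspecified" fallback label and the warning-versus-exception behavior come from the commit messages.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)

# Fallback label mentioned in #21198 for tests without a team.
DEFAULT_TEAM = "unspecified"


def resolve_team(test_config: Dict[str, Any], strict: bool = False) -> str:
    """Return the team for a test config (hypothetical helper).

    With strict=True a missing team raises an exception.
    With strict=False a warning is logged and DEFAULT_TEAM is returned.
    """
    team = test_config.get("team")
    if team:
        return team

    name = test_config.get("name", "<unnamed>")
    if strict:
        raise ValueError(f"Test {name!r} has no team specified.")

    logger.warning(
        "Test %r has no team specified; reporting as %r.", name, DEFAULT_TEAM
    )
    return DEFAULT_TEAM
```

Switching `strict` to `True` later would correspond to the stricter behavior the commit message says is planned after the transition period.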
mwtian
0b3fed5ef3
Revert "[Nightly Test] Add a team column to each test config. (#21198)" (#21289)
This reverts commit b5b11b2d06.
2021-12-30 06:44:51 +09:00
SangBin Cho
b5b11b2d06
[Nightly Test] Add a team column to each test config. (#21198)
Please review **e2e.py and test_suite belonging to your team**! 

This is the first part of https://docs.google.com/document/d/16IrwerYi2oJugnRf5hvzukgpJ6FAVEpB6stH_CiNMjY/edit#

This PR adds a team name to each test suite.

If the name is not specified, it will be reported as unspecified. 

If you are running a local test, and if the new test suite doesn't have a team name specified, it will raise an exception (in this way, we can avoid missing team names in the future).

Note that we will aggregate all of the test configs into a single file, nightly_test.yaml.
2021-12-27 14:42:41 -08:00
architkulkarni
2489b17634
[release] Uninstall old ray in all release test app configs to fix commit mismatch error (#21175)
* uninstall old ray in all release test app configs

* add instruction to e2e.py docstring
2021-12-18 16:58:49 -08:00
architkulkarni
56bd8e58de
[CI] [Release] uninstall Ray before installing new Ray version (#21159) 2021-12-17 16:25:15 -08:00
Kai Fricke
b58f839534
[ci/release] Remove hard numpy removal from app configs (#21005) 2021-12-13 15:22:02 +00:00
SangBin Cho
6649f078e5
[Internal Observability] Move debug_state.txt to the log dir + support gcs_server debug state (#20722)
Moving debug_state.txt to the log directory. This will help us find debug_state.txt from the dashboard. See below.
Add debug_state_gcs.txt. This will display GCS' debug state. GCS will also dump debug state to the file every 10 seconds
For periodic printing of debug state, I made it happen every 1 minute. This is because every 10 seconds usually is very spammy.
2021-11-28 20:42:37 -08:00
Amog Kamsetty
ac843a957c
[Release] Use large instance type for long running impala test (#20691)
* add

* update
2021-11-26 11:42:41 -08:00
Yi Cheng
b6b4d4cf57
[test] Update base image for nightly testing (#20680)
## Why are these changes needed?

`base_image: "anyscale/ray-ml:pinned-nightly-py37"` no longer exists, which breaks many nightly tests. Change it to `base_image: "anyscale/ray-ml:nightly-py37-gpu"`.
2021-11-23 11:06:44 -08:00
Alex Wu
a811b2b6d7
[hotfix] Fix stress_test_many_tasks cluster environment (#20519)
This should fix the long running release tests that are failing to build their app configs.

It seems like pip install ray[all] now downgrades the ray version. It's unclear why, but most likely a dependency now pins the ray version. This PR explicitly installs the version of Ray that we want after the pip install ray[all] to fix the problem.
2021-11-18 11:51:46 -08:00
Amog Kamsetty
3f1092fb3d
[Release] Revert impala app config (#20397) 2021-11-18 11:24:22 -08:00
Simon Mo
ca90c63483
[Serve] Add serve failure test to CI (#20392) 2021-11-16 08:12:08 -08:00
gjoliver
7fe42341ed
[release] Switch many_ppo test to use the canonical rllib app cfg as well. (#20310) 2021-11-12 20:51:28 -08:00
Edward Oakes
7c9881b73d
[serve] Fix serve_failure test (#20268) 2021-11-11 19:19:34 -08:00
Amog Kamsetty
18dcf1ac25
[Release] Use nightly Docker images (#20001)
* use nightly

* switch ml cpu to ray cpu

* fix

* add pytest

* add more pytest

* add constraint

* add tensorflow

* fix merge conflict

* add tblib

* fix

* add back uninstall
2021-11-10 18:00:16 -08:00
xwjiang2010
2fbbecf1e4
[release] Define worker node type even if no worker node is needed. (#20223) 2021-11-10 11:19:09 -08:00
xwjiang2010
99826d2ca6
[Release] Increase node memory by 2X in many_ppo test. (#19591) 2021-11-08 08:10:09 +09:00
gjoliver
1341bb59bf
[RLlib; Release testing] long_running_tests should use RLlib's app_config. (#20095) 2021-11-05 15:18:56 +01:00
Yi Cheng
04f60c998e
[nightly] Fix pytest missing in nightly test (#20076)
## Why are these changes needed?
In the nightly test we see
```
Command returned non-success status: 1; Command logs:
Traceback (most recent call last):
  File "dask_on_ray/large_scale_test.py", line 17, in <module>
    from ray._private.test_utils import monitor_memory_usage
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/test_utils.py", line 18, in <module>
    import pytest
ModuleNotFoundError: No module named 'pytest'
```
This PR fixes this error.
2021-11-04 13:38:05 -07:00
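A failure like the `ModuleNotFoundError` in #20076 above can be caught before a long test starts. The following is a minimal sketch of such a pre-flight check, not something the PR adds: the helper name `missing_modules` is hypothetical, and it only handles top-level module names.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import.

    Hypothetical helper: find_spec() returns None when a top-level
    module is not installed, without importing the module itself.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]
```

A test driver could call `missing_modules(["pytest"])` at startup and exit early with a clear message if the returned list is not empty.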
Avnish Narayan
026bf01071
[RLlib] Upgrade gym version to 0.21 and deprecate pendulum-v0. (#19535)
* Fix QMix, SAC, and MADDPG too.

* Unpin gym and deprecate pendulum v0

Many tests in rllib depended on pendulum v0,
however in gym 0.21, pendulum v0 was deprecated
in favor of pendulum v1. This may change reward
thresholds, so we may have to rerun
all of the pendulum v1 benchmarks, or use another
environment instead. The same applies to frozen
lake v0 and frozen lake v1.

Lastly, all of the RLlib tests have
been moved to python 3.7

* Add gym installation based on python version.

Pin python<= 3.6 to gym 0.19 due to install
issues with atari roms in gym 0.20

* Reformatting

* Fixing tests

* Move atari-py install conditional to req.txt

* migrate to new ale install method

* Make parametric_actions_cartpole return float32 actions/obs

* Adding type conversions if obs/actions don't match space

* Add utils to make elements match gym space dtypes

Co-authored-by: Jun Gong <jungong@anyscale.com>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-03 16:24:00 +01:00
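The version-dependent gym install described in #19535 above can be pictured with a small Python sketch. This is only an illustration: the helper name `gym_requirement` and the exact requirement strings are assumptions. The commit message says only that Python <= 3.6 is pinned to gym 0.19 and that gym is otherwise upgraded to 0.21.

```python
import sys
from typing import Tuple


def gym_requirement(python_version: Tuple[int, int] = sys.version_info[:2]) -> str:
    """Pick a gym requirement string for a Python version (hypothetical helper).

    Python <= 3.6 stays on gym 0.19 because of the atari ROM install
    issues mentioned in the commit; newer versions use gym 0.21.
    The "==" pin format is an assumption made for this sketch.
    """
    if python_version <= (3, 6):
        return "gym==0.19"
    return "gym==0.21"
```

In a requirements file the same idea is usually written with environment markers such as `python_version <= "3.6"`, which may be closer to what "Move atari-py install conditional to req.txt" refers to.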
SangBin Cho
bcd27b708f
[Test] Mark many ppo as unstable (#19769) 2021-10-26 21:27:43 -07:00
Jiao
aaef82920d
[serve] Add periodic timeouts to long poll client to avoid accumulating concurrent tasks in the controller (#19728) 2021-10-26 09:44:00 -05:00
SangBin Cho
9000f41aa6
[Nightly Test] Support memory profiling on Ray + implement memory monitor for nightly tests (#19539)
* random fixes

* Done

* done

* update the doc

* doc lint fix

* .

* .
2021-10-21 07:37:05 -07:00
Kai Fricke
e07d0953ea
[ci/release] Undo faulty change to many_ppo num_samples (#19388) 2021-10-14 10:27:31 -07:00
Kai Fricke
9cee83c919
[tune] PBT: Add burn-in period (#19321) 2021-10-14 16:28:29 +01:00
Jiajun Yao
2b44e9a3e1
Increase disk for long running tests (#19064) 2021-10-03 22:52:44 -07:00
Jiajun Yao
18bdde1918
Install the test wheel last (#18881) 2021-09-24 20:56:53 +01:00
Kai Fricke
d52203ee03
[ci/release] Fix long running serve test result fetching (#18880) 2021-09-24 16:16:01 +01:00
Kai Fricke
15a83d104d
[ci/release] remove legacy release tests (#18592) 2021-09-15 14:42:58 +01:00
Kai Fricke
7d1e6d3129
[ci/release] Add sanity check for ray wheels hash to release tests (#18489) 2021-09-10 17:50:31 +01:00
Kai Fricke
21d90a0e9a
Increase disk for serve tests (#17606) 2021-08-19 17:51:19 +02:00
Clark Zinzow
d958457d07
[Core] Second pass at privatizing APIs. (#17885)
* gcs_utils

* resource_spec

* profiling

* ray_perf and ray_cluster_perf

* test_utils
2021-08-18 20:56:33 -07:00
Kai Fricke
8580e450cb
[release] update/unify base images (#17859) 2021-08-16 12:44:25 +02:00
architkulkarni
8c1317067d
move variable updates from middle of loop to end (#17591) 2021-08-05 09:53:01 +01:00
Jiao
f4f702c595
[Release] change default expiration to 2 days to prevent the custodian from killing it in the early morning (#17215)
Co-authored-by: Jiao Dong <jiaodong@anyscale.com>
2021-07-20 17:03:14 -07:00
Jiao
6aeda62d40
[Serve] Add serve test config files and wrk dependency (#16631) 2021-06-28 10:01:55 -07:00
Kai Fricke
9352cb781c
[release tests] Fix microbenchmark base image, network overhead cluster wait time, add long running tests (#16355) 2021-06-16 21:37:17 +01:00
Kai Fricke
153a8b8fec
[release] convert tune release tests (#15913) 2021-06-01 11:19:15 -07:00
Kai Fricke
1d52ab819f
[release] release 1.3.0 results and test updates (#15366)
Convert a number of release tests and add logs for release 1.3.0
2021-05-04 22:10:04 +01:00
Amog Kamsetty
ebc44c3d76
[CI] Upgrade flake8 to 3.9.1 (#15527)
* formatting

* format util

* format release

* format rllib/agents

* format rllib/env

* format rllib/execution

* format rllib/evaluation

* format rllib/examples

* format rllib/policy

* format rllib utils and tests

* format streaming

* more formatting

* update requirements files

* fix rllib type checking

* updates

* update

* fix circular import

* Update python/ray/tests/test_runtime_env.py

* noqa
2021-05-03 14:23:28 -07:00
Edward Oakes
0f9d1bb223
Serve failure release test fix (#15276)
This test is currently not run in CI
2021-04-13 17:49:29 +01:00
Edward Oakes
e4ca337e16
[serve] Change remaining tests to use deployment API (#15167) 2021-04-08 08:15:38 -05:00
Simon Mo
c963cbc038
Fix Docker Permission for Serve release test again (#13543) 2021-01-19 12:23:30 -08:00
SangBin Cho
1179db1fc2
Remove an unnecessary file (#13499) 2021-01-15 18:29:12 -08:00
SangBin Cho
d09df55b14
Update ID specification doc (#13356) 2021-01-15 15:15:51 -08:00