hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Eric Liang	00b5801d71	Fix datasets leaking worker processes due to closure capture of stats actor handle (#22156 )	2022-02-07 14:05:44 -08:00
Yi Cheng	0659d4a472	[nightly] Limit many drivers iteration to 4000 iterations (#21958 ) Due to faster running of many drivers, we limit the iteration to 4k for the test.	2022-01-31 13:26:02 -08:00
Balaji Veeramani	7f1bacc7dc	[CI] Format Python code with Black (#21975 ) See #21316 and #21311 for the motivation behind these changes.	2022-01-29 18:41:57 -08:00
mwtian	97f7e3d0e6	[e2e] do not terminate in `serve_failure` smoke test (#21925 ) When the script terminates, it will also terminate its cluster including dashboard, which will prevent subsequent job submissions. Other long running e2e tests do not terminate in smoke test mode, so make `serve_failure` behave the same.	2022-01-27 15:36:46 -08:00
SangBin Cho	6b4aac7a08	Promote unstable tests to stable (#21811 ) Promote tests that have passed 100% last 1 week to stable	2022-01-24 02:10:37 -08:00
SangBin Cho	b1308b1c8c	[Test Infra] Unrevert team col (#21700 ) This fixes the previous problems from team column revert. This has 2 additional changes; alert handler receives the team argument, which was the root cause of breakage; https://github.com/ray-project/ray/pull/21289 Previously, tests without a team column were raising an exception, but I made the condition weaker (warning logs). I will eventually change it to raise an exception, but for smoother transition, we will log warning instead for a short time	2022-01-19 13:29:53 -08:00
mwtian	0b3fed5ef3	Revert "[Nightly Test] Add a team column to each test config. (#21198 )" (#21289 ) This reverts commit `b5b11b2d06`.	2021-12-30 06:44:51 +09:00
SangBin Cho	b5b11b2d06	[Nightly Test] Add a team column to each test config. (#21198 ) Please review e2e.py and test_suite belonging to your team! This is the first part of https://docs.google.com/document/d/16IrwerYi2oJugnRf5hvzukgpJ6FAVEpB6stH_CiNMjY/edit# This PR adds a team name to each test suite. If the name is not specified, it will be reported as unspecified. If you are running a local test, and if the new test suite doesn't have a team name specified, it will raise an exception (in this way, we can avoid missing team names in the future). Note that we will aggregate all of test config into a single file, nightly_test.yaml.	2021-12-27 14:42:41 -08:00
architkulkarni	2489b17634	[release] Uninstall old ray in all release test app configs to fix commit mismatch error (#21175 ) * uninstall old ray in all release test app configs * add instruction to e2e.py docstring	2021-12-18 16:58:49 -08:00
architkulkarni	56bd8e58de	[CI] [Release] uninstall Ray before installing new Ray version (#21159 )	2021-12-17 16:25:15 -08:00
Kai Fricke	b58f839534	[ci/release] Remove hard numpy removal from app configs (#21005 )	2021-12-13 15:22:02 +00:00
SangBin Cho	6649f078e5	[Internal Observability] Move debug_state.txt to the log dir + support gcs_server debug state (#20722 ) Moving debug_state.txt to the log directory. This will help us finding debug_state.txt from the dashboard. See below. Add debug_state_gcs.txt. This will display GCS' debug state. GCS will also dump debug state to the file every 10 seconds For periodic printing of debug state, I made it happen every 1 minute. This is because every 10 seconds usually is very spammy.	2021-11-28 20:42:37 -08:00
Amog Kamsetty	ac843a957c	[Release] Use large instance type for long running `impala` test (#20691 ) * add * update	2021-11-26 11:42:41 -08:00
Yi Cheng	b6b4d4cf57	[test] Update base image for nightly testing (#20680 ) ## Why are these changes needed? `base_image: "anyscale/ray-ml:pinned-nightly-py37"` doesn't exist anymore which fails a lot of nightly tests, change to `base_image: "anyscale/ray-ml:nightly-py37-gpu"`	2021-11-23 11:06:44 -08:00
Alex Wu	a811b2b6d7	[hotfix] Fix stress_test_many_tasks cluster environment (#20519 ) This should fix the long running release tests that are failing to build their app configs. It seems like pip install ray[all] now downgrades the ray version. It's unclear why, but most likely, a dependency has pinned the ray version now. This PR explicitly installs the version of Ray that we want after the pip install ray[all] to fix the problem.	2021-11-18 11:51:46 -08:00
Amog Kamsetty	3f1092fb3d	[Release] Revert impala app config (#20397 )	2021-11-18 11:24:22 -08:00
Simon Mo	ca90c63483	[Serve] Add serve failure test to CI (#20392 )	2021-11-16 08:12:08 -08:00
gjoliver	7fe42341ed	[release] Switch many_ppo test to use the canonical rllib app cfg as well. (#20310 )	2021-11-12 20:51:28 -08:00
Edward Oakes	7c9881b73d	[serve] Fix serve_failure test (#20268 )	2021-11-11 19:19:34 -08:00
Amog Kamsetty	18dcf1ac25	[Release] Use nightly Docker images (#20001 ) * use nightly * switch ml cpu to ray cpu * fix * add pytest * add more pytest * add constraint * add tensorflow * fix merge conflict * add tblib * fix * add back uninstall	2021-11-10 18:00:16 -08:00
xwjiang2010	2fbbecf1e4	[release] Define worker node type even if no worker node is needed. (#20223 )	2021-11-10 11:19:09 -08:00
xwjiang2010	99826d2ca6	[Release] Increase node memory by 2X in many_ppo test. (#19591 )	2021-11-08 08:10:09 +09:00
gjoliver	1341bb59bf	[RLlib; Release testing] long_running_tests should use RLlib's app_config. (#20095 )	2021-11-05 15:18:56 +01:00
Yi Cheng	04f60c998e	[nightly] Fix pytest missing in nightly test (#20076 ) ## Why are these changes needed? In the nightly test we see ``` Command returned non-success status: 1; Command logs:Traceback (most recent call last): File "dask_on_ray/large_scale_test.py", line 17, in from ray._private.test_utils import monitor_memory_usage File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/test_utils.py", line 18, in import pytest ModuleNotFoundError: No module named 'pytest' ``` This PR fixes this error. ## Related issue number	2021-11-04 13:38:05 -07:00
Avnish Narayan	026bf01071	[RLlib] Upgrade gym version to 0.21 and deprecate pendulum-v0. (#19535 ) * Fix QMix, SAC, and MADDPA too. * Unpin gym and deprecate pendulum v0 Many tests in rllib depended on pendulum v0, however in gym 0.21, pendulum v0 was deprecated in favor of pendulum v1. This may change reward thresholds, so will have to potentially rerun all of the pendulum v1 benchmarks, or use another environment in favor. The same applies to frozen lake v0 and frozen lake v1 Lastly, all of the RLlib tests and have been moved to python 3.7 * Add gym installation based on python version. Pin python<= 3.6 to gym 0.19 due to install issues with atari roms in gym 0.20 * Reformatting * Fixing tests * Move atari-py install conditional to req.txt * migrate to new ale install method * Make parametric_actions_cartpole return float32 actions/obs * Adding type conversions if obs/actions don't match space * Add utils to make elements match gym space dtypes Co-authored-by: Jun Gong <jungong@anyscale.com> Co-authored-by: sven1977 <svenmika1977@gmail.com>	2021-11-03 16:24:00 +01:00
SangBin Cho	bcd27b708f	[Test] Mark many ppo as unstable (#19769 )	2021-10-26 21:27:43 -07:00
Jiao	aaef82920d	[serve] Add periodic timeouts to long poll client to avoid accumulating concurrent tasks in the controller (#19728 )	2021-10-26 09:44:00 -05:00
SangBin Cho	9000f41aa6	[Nightly Test] Support memory profiling on Ray + implement memory monitor for nightly tests (#19539 ) * random fixes * Done * done * update the doc * doc lint fix * . * .	2021-10-21 07:37:05 -07:00
Kai Fricke	e07d0953ea	[ci/release] Undo faulty change to many_ppo num_samples (#19388 )	2021-10-14 10:27:31 -07:00
Kai Fricke	9cee83c919	[tune] PBT: Add burn-in period (#19321 )	2021-10-14 16:28:29 +01:00
Jiajun Yao	2b44e9a3e1	Increase disk for long running tests (#19064 )	2021-10-03 22:52:44 -07:00
Jiajun Yao	18bdde1918	Install the test wheel last (#18881 )	2021-09-24 20:56:53 +01:00
Kai Fricke	d52203ee03	[ci/release] Fix long running serve test result fetching (#18880 )	2021-09-24 16:16:01 +01:00
Kai Fricke	15a83d104d	[ci/release] remove legacy release tests (#18592 )	2021-09-15 14:42:58 +01:00
Kai Fricke	7d1e6d3129	[ci/release] Add sanity check for ray wheels hash to release tests (#18489 )	2021-09-10 17:50:31 +01:00
Kai Fricke	21d90a0e9a	Increase disk for serve tests (#17606 )	2021-08-19 17:51:19 +02:00
Clark Zinzow	d958457d07	[Core] Second pass at privatizing APIs. (#17885 ) * gcs_utils * resource_spec * profiling * ray_perf and ray_cluster_perf * test_utils	2021-08-18 20:56:33 -07:00
Kai Fricke	8580e450cb	[release] update/unify base images (#17859 )	2021-08-16 12:44:25 +02:00
architkulkarni	8c1317067d	move variable updates from middle of loop to end (#17591 )	2021-08-05 09:53:01 +01:00
Jiao	f4f702c595	[Release] change default expiration to 2 days in order to prevent custodian kill it early morning (#17215 ) Co-authored-by: Jiao Dong <jiaodong@anyscale.com>	2021-07-20 17:03:14 -07:00
Jiao	6aeda62d40	[Serve] Add serve test config files and wrk dependency (#16631 )	2021-06-28 10:01:55 -07:00
Kai Fricke	9352cb781c	[release tests] Fix microbenchmark base image, network overhead cluster wait time, add long running tests (#16355 )	2021-06-16 21:37:17 +01:00
Kai Fricke	153a8b8fec	[release] convert tune release tests (#15913 )	2021-06-01 11:19:15 -07:00
Kai Fricke	1d52ab819f	[release] release 1.3.0 results and test updates (#15366 ) Convert a number of release tests and add logs for release 1.3.0	2021-05-04 22:10:04 +01:00
Amog Kamsetty	ebc44c3d76	[CI] Upgrade flake8 to 3.9.1 (#15527 ) * formatting * format util * format release * format rllib/agents * format rllib/env * format rllib/execution * format rllib/evaluation * format rllib/examples * format rllib/policy * format rllib utils and tests * format streaming * more formatting * update requirements files * fix rllib type checking * updates * update * fix circular import * Update python/ray/tests/test_runtime_env.py * noqa	2021-05-03 14:23:28 -07:00
Edward Oakes	0f9d1bb223	Serve failure release test fix (#15276 ) This test is currently not tested in CI	2021-04-13 17:49:29 +01:00
Edward Oakes	e4ca337e16	[serve] Change remaining tests to use deployment API (#15167 )	2021-04-08 08:15:38 -05:00
Simon Mo	c963cbc038	Fix Docker Permission for Serve release test again (#13543 )	2021-01-19 12:23:30 -08:00
SangBin Cho	1179db1fc2	Remove an unnecessary file (#13499 )	2021-01-15 18:29:12 -08:00
SangBin Cho	d09df55b14	Update ID specification doc (#13356 )	2021-01-15 15:15:51 -08:00

61 commits