hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 18:41:40 -05:00

Author	SHA1	Message	Date
Kai Fricke	1ed8bd0345	[release/xgboost/lightgbm] Fix app config dependency install overwriting ray (#25307) This line: `pip3 install -U --force-reinstall xgboost xgboost_ray lightgbm_ray petastorm` also re-installs the dependencies of these packages, and the `--force-reinstall` means we overwrite existing ones. This leads us to re-install the latest ray release, overwriting the wheels to be tested: ``` [INFO] 5/31/2022, 12:12:16 AM: Successfully installed ... ray-1.12.1 ... [INFO] 5/31/2022, 12:12:17 AM: * Executed RUN pip3 install -U --force-reinstall xgboost xgboost_ray petastorm (ff6ae9f9) ``` Instead, we should use `--no-deps` to avoid re-installing dependencies. Also, the wheels sanity check is moved to after installing additional packages in order to catch these errors earlier.	2022-05-31 13:46:17 +02:00
Sven Mika	09886d7ab8	[RLlib] Upgrade gym 0.23 (#24171)	2022-05-23 08:18:44 +02:00
SangBin Cho	ec653e3196	[Nightly test] Move two line downloads to one line. (#25061) This fixes the mysterious error where every cluster env build fails when pip uninstall / pip install are written on two separate lines. The root cause will be fixed later.	2022-05-22 00:07:03 -07:00
Kai Fricke	6c5229295e	[ci/release] Support running tests with different python versions (#24843) OSS release tests currently run with hardcoded Python 3.7 base. In the future we will want to run tests on different python versions. This PR adds support for a new `python` field in the test configuration. The python field will determine both the base image used in the Buildkite runner docker container (for Ray client compatibility) and the base image for the Anyscale cluster environments. Note that in Buildkite, we will still only wait for the python 3.7 base image before kicking off tests. That is acceptable, as we can assume that most wheels finish in a similar time, so even if we wait for the 3.7 image and kick off a 3.8 test, that runner will wait maybe for 5-10 more minutes.	2022-05-17 17:03:12 +01:00
Avnish Narayan	754bcd16f8	[rllib] Pin gym everywhere (#23384) This PR pins gym in the app config.yaml files for rllib and tune so that release tests are no longer broken by the new gym version.	2022-03-22 09:44:22 +00:00
gjoliver	724a140795	[rllib] Make sure json can serialize result dict (#20439) We may have fields in the result dict that are or None. Make sure our results are json serializable.	2021-11-17 10:27:00 -08:00
Amog Kamsetty	18dcf1ac25	[Release] Use nightly Docker images (#20001) * use nightly * switch ml cpu to ray cpu * fix * add pytest * add more pytest * add constraint * add tensorflow * fix merge conflict * add tblib * fix * add back uninstall	2021-11-10 18:00:16 -08:00
gjoliver	2c1fa459d4	[RLlib] Add an RLlib Tune experiment to UserTest suite. (#19807) * Add an RLlib Tune experiment to UserTest suite. * Add ray.init() * Move example script to example/tune/, so it can be imported as module. * add __init__.py so our new module will get included in python wheel. * Add block device to RLlib test instances. * Reduce disk size a little bit. * Add metrics reporting * Allow max of 5 workers to accommodate all the worker tasks. * revert disk size change. * Minor updates * Trigger build * set max num workers * Add a compute cfg for autoscaled cpu and gpu nodes. * use 1gpu instance. * install tblib for debugging worker crashes. * Manually upgrade to pytorch 1.9.0 * -y * torch=1.9.0 * install torch on driver * bump timeout * Write a more informational result dict. * Revert changes to compute config files that are not used. * add smoke test * update * reduce timeout * Reduce the # of env per worker to 1. * Small fix for getting trial_states * Trigger build * simplify result dict * lint * more lint * fix smoke test Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>	2021-11-03 17:04:27 -07:00
Sven Mika	ba1c489b79	[RLlib Testing] Lower `--smoke-test` "time_total_s" to make sure it doesn't time out. (#18670)	2021-09-16 18:22:23 +02:00
gjoliver	2924afa41e	[Release] Create soft links for libcusolver.so.10 as a temporary fix. (#18562) Co-authored-by: Jun Gong <jungong@anyscale.com>	2021-09-13 14:37:12 -07:00
Kai Fricke	7d1e6d3129	[ci/release] Add sanity check for ray wheels hash to release tests (#18489)	2021-09-10 17:50:31 +01:00
Simon Mo	6d24214085	[Release] Make sure to uninstall ray for rllib_tests (#18448)	2021-09-08 23:29:40 +01:00
Sven Mika	cabaa3b3c6	[RLlib Testing] Add A3C/APPO/BC/DDPPO/MARWIL/CQL/ES/ARS/TD3 to weekly learning tests. (#18381)	2021-09-07 11:48:41 +02:00
Sven Mika	5292b70fc6	[RLlib] Add multi-GPU attention net tests to nightly test suite (+ R2D2 tests for LSTM and attention nets). (#18368)	2021-09-06 17:48:05 +02:00
Sven Mika	59f796edf3	[RLlib] Fix crash when using StochasticSampling exploration (most PG-style algos) w/ tf and numpy > 1.19.5 (#18366)	2021-09-06 12:14:00 +02:00
Kai Fricke	fb38d06cfb	Move RLLib GPU release test dependencies to ml docker (#18208)	2021-09-03 09:35:18 +01:00
Sven Mika	a7670d9fab	[RLlib; Testing] Fix smoke-test settings for nightly `learning_tests` and `stress_test`; Add `pybullet_envs` to app-config. (#18274)	2021-09-01 21:46:06 +02:00
Sven Mika	8acb469b04	[RLlib; Testing] Green all RLlib nightly tests. (#18073)	2021-08-26 14:09:20 +02:00
Kai Fricke	8580e450cb	[release] update/unify base images (#17859)	2021-08-16 12:44:25 +02:00
Kai Fricke	3a90804713	[Testing] Add RLlib release tests (#16651)	2021-08-03 12:34:27 -04:00
Sven Mika	c9d220bcda	[RLlib] Upgrade RLlib regression test scripts to new testing tool - RLlib release logs for 1.4. (#16080)	2021-06-01 17:39:18 +02:00

21 commits
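
Commit 724a140795 above ("Make sure json can serialize result dict") describes the goal but the log does not show the change itself. The following is only a minimal sketch of one way to make a result dict JSON-safe, not the code from that commit. The helper name `to_json_safe` and the sample field values are hypothetical; it assumes the problem values are non-finite floats, tuples, non-string keys, or arbitrary objects.

```python
import json
import math


def to_json_safe(value):
    """Recursively convert a value into something strict JSON accepts.

    Hypothetical helper for illustration; not taken from the Ray repository.
    """
    if isinstance(value, dict):
        # JSON object keys must be strings.
        return {str(k): to_json_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_safe(v) for v in value]
    if isinstance(value, float) and not math.isfinite(value):
        # NaN and +/-inf are not valid in strict JSON; map them to null.
        return None
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    # Fallback for anything else: store its string form.
    return str(value)


# Sample result dict with made-up values.
result = {
    "time_total_s": 12.5,
    "episode_reward_mean": float("nan"),
    "last_error": None,
    "trial_states": ("TERMINATED", "TERMINATED"),
}

# allow_nan=False makes json.dumps raise if a non-finite float slipped through.
encoded = json.dumps(to_json_safe(result), allow_nan=False)
print(encoded)
```

Passing `allow_nan=False` is a deliberate choice here: by default `json.dumps` emits `NaN`, which many JSON parsers reject, so failing loudly at write time is usually easier to debug.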