# ray/release/rllib_tests/app_config.yaml

base_image: "anyscale/ray-ml:pinned-nightly-py37-gpu"
env_vars: {}
debian_packages:
  - unzip
  - zip

python:
  # These dependencies should be handled by requirements_rllib.txt and
  # requirements_ml_docker.txt
  pip_packages:
    - torch==1.9.0 # TODO(amogkam): Remove after nightly images are available.
  conda_packages: []

post_build_cmds:
  # Create a couple of soft links so that TF 2.4.3 works with CUDA 11.2.
  # TODO(jungong): Remove these once the product ray-ml Docker image is upgraded to TF 2.5.0.
  - sudo ln -s /usr/local/cuda /usr/local/nvidia
  - sudo ln -s /usr/local/cuda/lib64/libcusolver.so.11 /usr/local/cuda/lib64/libcusolver.so.10
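  # Illustrative manual check (not run as part of the build): after the two
  # links above, `ls -l /usr/local/cuda/lib64/libcusolver.so.10` should list
  # it as a symlink to /usr/local/cuda/lib64/libcusolver.so.11.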
  - pip install tensorflow==2.5.0
  # END: TODO

  - pip uninstall -y ray || true
  - pip3 install -U {{ env["RAY_WHEELS"] | default("ray") }}
  - {{ env["RAY_WHEELS_SANITY_CHECK"] | default("echo No Ray wheels sanity check") }}
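  # Illustrative rendering (assuming the release tool renders this file with
  # Jinja2 and passes the process environment as `env`): when RAY_WHEELS and
  # RAY_WHEELS_SANITY_CHECK are unset, the two templated commands above become
  #   - pip3 install -U ray
  #   - echo No Ray wheels sanity check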
  # Clone the rl-experiments repo for offline-RL files.
  - git clone https://github.com/ray-project/rl-experiments.git
  - cp rl-experiments/halfcheetah-sac/2021-09-06/halfcheetah_expert_sac.zip ~/.