ray/release/rllib_tests/app_config.yaml
gjoliver 2c1fa459d4
[RLlib] Add an RLlib Tune experiment to UserTest suite. (#19807)
* Add an RLlib Tune experiment to UserTest suite.

* Add ray.init()

* Move example script to example/tune/, so it can be imported as module.

* add __init__.py so our new module will get included in python wheel.

* Add block device to RLlib test instances.

* Reduce disk size a little bit.

* Add metrics reporting

* Allow max of 5 workers to accommodate all the worker tasks.

* revert disk size change.

* Minor updates

* Trigger build

* set max num workers

* Add a compute cfg for autoscaled cpu and gpu nodes.

* use 1gpu instance.

* install tblib for debugging worker crashes.

* Manually upgrade to pytorch 1.9.0

* -y

* torch=1.9.0

* install torch on driver

* bump timeout

* Write a more informational result dict.

* Revert changes to compute config files that are not used.

* add smoke test

* update

* reduce timeout

* Reduce the # of env per worker to 1.

* Small fix for getting trial_states

* Trigger build

* simplify result dict

* lint

* more lint

* fix smoke test

Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-11-03 17:04:27 -07:00

27 lines
1.1 KiB
YAML
Executable file

base_image: "anyscale/ray-ml:pinned-nightly-py37-gpu"
env_vars: {}
debian_packages:
  - unzip
  - zip
python:
  # These dependencies should be handled by requirements_rllib.txt and
  # requirements_ml_docker.txt
  pip_packages:
    - torch==1.9.0  # TODO(amogkam): Remove after nightly images are available.
  conda_packages: []
post_build_cmds:
  # Create a couple of soft links so tf 2.4.3 works with cuda 11.2.
  # TODO(jungong): remove these once product ray-ml docker gets upgraded to use tf 2.5.0.
  - sudo ln -s /usr/local/cuda /usr/local/nvidia
  - sudo ln -s /usr/local/cuda/lib64/libcusolver.so.11 /usr/local/cuda/lib64/libcusolver.so.10
  - pip install tensorflow==2.5.0
  # END: TODO
  - pip uninstall -y ray || true
  - pip3 install -U {{ env["RAY_WHEELS"] | default("ray") }}
  - {{ env["RAY_WHEELS_SANITY_CHECK"] | default("echo No Ray wheels sanity check") }}
  # Clone the rl-experiments repo for offline-RL files.
  - git clone https://github.com/ray-project/rl-experiments.git
  - cp rl-experiments/halfcheetah-sac/2021-09-06/halfcheetah_expert_sac.zip ~/.
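
The two templated entries in post_build_cmds fall back to a default when the corresponding variable is not set. The sketch below illustrates that fallback in plain Python, assuming `env` behaves like a mapping of environment variables. The function `render_post_build_cmds` and the example wheel URL are hypothetical and are not part of Ray's release tooling. The sketch only mimics the `default` filter with dictionary lookups.

```python
# Hypothetical stand-in for the template rendering step; this is an
# illustration only, not the code the release tooling actually runs.
def render_post_build_cmds(env):
    """Return the Ray-related post-build commands for a given env mapping."""
    # Jinja's `default` filter (without its boolean flag) replaces only
    # undefined values, so dict.get() with a fallback is the closest analogue.
    wheels = env.get("RAY_WHEELS", "ray")
    sanity_check = env.get(
        "RAY_WHEELS_SANITY_CHECK", "echo No Ray wheels sanity check"
    )
    return [
        "pip uninstall -y ray || true",
        f"pip3 install -U {wheels}",
        sanity_check,
    ]


# With no overrides, the stock `ray` package is installed.
default_cmds = render_post_build_cmds({})

# With an override; the wheel URL below is a made-up placeholder.
custom_cmds = render_post_build_cmds(
    {"RAY_WHEELS": "https://example.com/ray-nightly.whl"}
)

for cmd in default_cmds + custom_cmds:
    print(cmd)
```

Under this assumption, a variable that is set to an empty string is kept as-is rather than replaced by the default, which is worth keeping in mind when exporting RAY_WHEELS from a wrapper script.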