ray/release/rllib_tests at 4d583da7d5765356d1fb413612cec16ffceb1b79 - hiro/ray


mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00


gjoliver 2c1fa459d4 [RLlib] Add an RLlib Tune experiment to UserTest suite. (#19807 ) * Add an RLlib Tune experiment to UserTest suite. * Add ray.init() * Move example script to example/tune/, so it can be imported as module. * add __init__.py so our new module will get included in python wheel. * Add block device to RLlib test instances. * Reduce disk size a little bit. * Add metrics reporting * Allow max of 5 workers to accommodate all the worker tasks. * revert disk size change. * Minor updates * Trigger build * set max num workers * Add a compute cfg for autoscaled cpu and gpu nodes. * use 1gpu instance. * install tblib for debugging worker crashes. * Manually upgrade to pytorch 1.9.0 * -y * torch=1.9.0 * install torch on driver * bump timeout * Write a more informational result dict. * Revert changes to compute config files that are not used. * add smoke test * update * reduce timeout * Reduce the # of env per worker to 1. * Small fix for getting trial_states * Trigger build * simply result dict * lint * more lint * fix smoke test Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>		2021-11-03 17:04:27 -07:00
connect_tests	[RLlib] Add an RLlib Tune experiment to UserTest suite. (#19807 )	2021-11-03 17:04:27 -07:00
learning_tests	Reduce success criteria for a few learning tests. (#19484 )	2021-10-18 15:44:38 -07:00
multi_gpu_learning_tests	[RLlib] Add multi-GPU attention net tests to nightly test suite (+ R2D2 tests for LSTM and attention nets). (#18368 )	2021-09-06 17:48:05 +02:00
multi_gpu_with_attention_learning_tests	[RLlib Testing] Lower `--smoke-test` "time_total_s" to make sure it doesn't time out. (#18670 )	2021-09-16 18:22:23 +02:00
multi_gpu_with_lstm_learning_tests	[RLlib] Fix R2D2 (torch) multi-GPU issue. (#18550 )	2021-09-14 19:58:10 +02:00
stress_tests	[RLlib; testing] Fix bug in stress tests not handling >1 trials per experiment (due to grid-search in IMPALA stress tests). (#18705 )	2021-09-20 15:31:57 +02:00
unit_gpu_tests	[RLlib] Add multi-GPU learning tests to nightly. (#17778 )	2021-08-18 17:21:01 +02:00
1gpu_4cpus.yaml	[RLlib] Add an RLlib Tune experiment to UserTest suite. (#19807 )	2021-11-03 17:04:27 -07:00
2gpus_32cpus.yaml	[RLlib] Add an RLlib Tune experiment to UserTest suite. (#19807 )	2021-11-03 17:04:27 -07:00
4gpus_64cpus.yaml	[RLlib] Add an RLlib Tune experiment to UserTest suite. (#19807 )	2021-11-03 17:04:27 -07:00
4gpus_544_cpus.yaml	[Tune release test] Set 500G disk space for rllib_tests. (#19730 )	2021-10-26 10:12:03 -07:00
8gpus_64cpus.yaml	[Tune release test] Set 500G disk space for rllib_tests. (#19730 )	2021-10-26 10:12:03 -07:00
8gpus_96cpus.yaml	[Tune release test] Set 500G disk space for rllib_tests. (#19730 )	2021-10-26 10:12:03 -07:00
app_config.yaml	[RLlib] Add an RLlib Tune experiment to UserTest suite. (#19807 )	2021-11-03 17:04:27 -07:00
auto_scale.yaml	[RLlib] Add an RLlib Tune experiment to UserTest suite. (#19807 )	2021-11-03 17:04:27 -07:00
connect_driver_requirements.txt	[RLlib] Add an RLlib Tune experiment to UserTest suite. (#19807 )	2021-11-03 17:04:27 -07:00
README.rst	[RLlib] Upgrade RLlib regression test scripts to new testing tool - RLlib release logs for 1.4. (#16080 )	2021-06-01 17:39:18 +02:00
rllib_tests.yaml	[RLlib] Add an RLlib Tune experiment to UserTest suite. (#19807 )	2021-11-03 17:04:27 -07:00
wait_cluster.py	[RLlib Testing] Lower `--smoke-test` "time_total_s" to make sure it doesn't time out. (#18670 )	2021-09-16 18:22:23 +02:00

README.rst

RLlib Tests
===========

This directory contains various RLlib release tests.

You should run these tests with the `releaser <https://github.com/ray-project/releaser>`_ tool.

Overview
--------
Currently, there are 3 RLlib tests:

1. ``learning_tests`` - Tests whether the major algorithms (tf+torch) can learn in Atari or PyBullet envs within roughly 30-60 minutes.
2. ``stress_tests`` - Runs 4 IMPALA Atari jobs, each using 1 GPU and 128 CPUs (needs autoscaling to succeed).
3. ``unit_gpu_tests`` - Tests whether all of RLlib's example scripts can be run on a GPU.
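
As a rough illustration of why the ``stress_tests`` entry above needs autoscaling, the sketch below adds up the resources its four jobs would request. It assumes that all four jobs run at the same time, which the README does not state explicitly; the helper ``total_resources`` is a hypothetical name used only for this example.

```python
# Figures taken from the stress_tests description in the list above.
NUM_JOBS = 4
GPUS_PER_JOB = 1
CPUS_PER_JOB = 128


def total_resources(num_jobs, gpus_per_job, cpus_per_job):
    """Return (GPUs, CPUs) needed if all jobs run concurrently (an assumption)."""
    return num_jobs * gpus_per_job, num_jobs * cpus_per_job


total_gpus, total_cpus = total_resources(NUM_JOBS, GPUS_PER_JOB, CPUS_PER_JOB)
print(f"{total_gpus} GPUs, {total_cpus} CPUs")  # prints "4 GPUs, 512 CPUs"
```

Under that assumption the cluster has to grow to at least 4 GPUs and 512 CPUs for the jobs themselves.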

Generally, the releaser tool runs all tests in parallel.

Acceptance criteria
-------------------
A test is considered passing when its output log ends without an error.
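
A minimal sketch of how this criterion could be checked by hand is shown below. It is not part of the releaser tool: the function name ``log_tail_is_clean``, the marker strings, and the 20-line window are all assumptions made for illustration, since the README does not say how an error at the end of the log is detected.

```python
from collections import deque

# Hypothetical markers; the README does not define what counts as an error.
ERROR_MARKERS = ("Traceback (most recent call last)", "Error", "Exception")


def log_tail_is_clean(log_text, tail_lines=20, markers=ERROR_MARKERS):
    """Return True if none of the last `tail_lines` lines contains a marker."""
    # A bounded deque keeps only the final `tail_lines` lines of the log.
    tail = deque(log_text.splitlines(), maxlen=tail_lines)
    return not any(marker in line for line in tail for marker in markers)


passing_log = "iteration 1\niteration 2\nDone."
failing_log = "iteration 1\nTraceback (most recent call last):\nValueError: bad value"

print(log_tail_is_clean(passing_log))  # prints "True"
print(log_tail_is_clean(failing_log))  # prints "False"
```

Plain substring matching is deliberately crude: a harmless line that merely mentions "Error" would also be flagged, so the markers would need tuning against real logs.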