Commit graph

19 commits

Author SHA1 Message Date
mwtian
0b3fed5ef3
Revert "[Nightly Test] Add a team column to each test config. (#21198)" (#21289)
This reverts commit b5b11b2d06.
2021-12-30 06:44:51 +09:00
SangBin Cho
b5b11b2d06
[Nightly Test] Add a team column to each test config. (#21198)
Please review **e2e.py and test_suite belonging to your team**! 

This is the first part of https://docs.google.com/document/d/16IrwerYi2oJugnRf5hvzukgpJ6FAVEpB6stH_CiNMjY/edit#

This PR adds a team name to each test suite.

If the name is not specified, it will be reported as unspecified. 

If you are running a local test, and if the new test suite doesn't have a team name specified, it will raise an exception (in this way, we can avoid missing team names in the future).

Note that we will aggregate all of test config into a single file, nightly_test.yaml.
2021-12-27 14:42:41 -08:00
gjoliver
d8a61f801f
[RLlib] Create a set of performance benchmark tests to run nightly. (#19945)
* Create a core set of algorithms tests to run nightly.

* Run release tests under tf, tf2, and torch frameworks.

* Fix

* Add eager_tracing option for tf2 framework.

* make sure core tests can run in parallel.

* cql

* Report progress while running nightly/weekly tests.

* Innclude SAC in nightly lineup.

* Revert changes to learning_tests

* rebrand to performance test.

* update build_pipeline.py with new performance_tests name.

* Record stats.

* bug fix, need to populate experiments dict.

* Alphabetize yaml files.

* Allow specifying frameworks. And do not run tf2 by default.

* remove some debugging code.

* fix

* Undo testing changes.

* Do not run CQL regression for now.

* LINT.

Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-08 18:15:13 +01:00
gjoliver
2c1fa459d4
[RLlib] Add an RLlib Tune experiment to UserTest suite. (#19807)
* Add an RLlib Tune experiment to UserTest suite.

* Add ray.init()

* Move example script to example/tune/, so it can be imported as module.

* add __init__.py so our new module will get included in python wheel.

* Add block device to RLlib test instances.

* Reduce disk size a little bit.

* Add metrics reporting

* Allow max of 5 workers to accomodate all the worker tasks.

* revert disk size change.

* Minor updates

* Trigger build

* set max num workers

* Add a compute cfg for autoscaled cpu and gpu nodes.

* use 1gpu instance.

* install tblib for debugging worker crashes.

* Manually upgrade to pytorch 1.9.0

* -y

* torch=1.9.0

* install torch on driver

* Add an RLlib Tune experiment to UserTest suite.

* Add ray.init()

* Move example script to example/tune/, so it can be imported as module.

* add __init__.py so our new module will get included in python wheel.

* Add block device to RLlib test instances.

* Reduce disk size a little bit.

* Add metrics reporting

* Allow max of 5 workers to accomodate all the worker tasks.

* revert disk size change.

* Minor updates

* Trigger build

* set max num workers

* Add a compute cfg for autoscaled cpu and gpu nodes.

* use 1gpu instance.

* install tblib for debugging worker crashes.

* Manually upgrade to pytorch 1.9.0

* -y

* torch=1.9.0

* install torch on driver

* bump timeout

* Write a more informational result dict.

* Revert changes to compute config files that are not used.

* add smoke test

* update

* reduce timeout

* Reduce the # of env per worker to 1.

* Small fix for getting trial_states

* Trigger build

* simply result dict

* lint

* more lint

* fix smoke test

Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-11-03 17:04:27 -07:00
Sven Mika
5611150b1a
Increase rllib stress tests timeout for smoke test (#18810) 2021-09-22 14:30:42 +01:00
Sven Mika
ba1c489b79
[RLlib Testing] Lower --smoke-test "time_total_s" to make sure it doesn't time out. (#18670) 2021-09-16 18:22:23 +02:00
gjoliver
df32ed35fd
Extend --smoke-test deadlines for learning and stress regression tests. (#18667) 2021-09-16 09:18:39 +01:00
Kai Fricke
c8188ea70e
[ci/rllib] wait for stress test cluster (#18603) 2021-09-14 19:01:22 +01:00
gjoliver
50cdf551ce
[RLlib] Fix test name typo. (#18423)
Co-authored-by: Jun Gong <jungong@mbpro.local>
2021-09-08 23:30:37 +02:00
Sven Mika
cabaa3b3c6
[RLlib Testing] Add A3C/APPO/BC/DDPPO/MARWIL/CQL/ES/ARS/TD3 to weekly learning tests. (#18381) 2021-09-07 11:48:41 +02:00
Sven Mika
5292b70fc6
[RLlib] Add multi-GPU attention net tests to nightly test suite (+ R2D2 tests for LSTM and attention nets). (#18368) 2021-09-06 17:48:05 +02:00
Sven Mika
a7670d9fab
[RLlib; Testing] Fix smoke-test settings for nightly learning_tests and stress_test; Add pybullet_envs to app-config. (#18274) 2021-09-01 21:46:06 +02:00
Sven Mika
9883505e84
[RLlib] Add [LSTM=True + multi-GPU]-tests to nightly RLlib testing suite (for all algos supporting RNNs, except R2D2, RNNSAC, and DDPPO). (#18017) 2021-08-24 21:55:27 +02:00
Sven Mika
d47ed62083
[Testing RLlib] Increase timeout for nightly multi-GPU learning tests. (#17987) 2021-08-23 10:37:50 +02:00
Sven Mika
a2d96c513a
[RLlib] Expand machine for nightly multi-gpu learning tests. (#17955) 2021-08-19 22:27:30 +02:00
Sven Mika
a428f10ebe
[RLlib] Add multi-GPU learning tests to nightly. (#17778) 2021-08-18 17:21:01 +02:00
Kai Fricke
3a90804713
[Testing] Add RLlib release tests (#16651) 2021-08-03 12:34:27 -04:00
Kai Fricke
9352cb781c
[release tests] Fix microbenchmark base image, network overhead cluster wait time, add long running tests (#16355) 2021-06-16 21:37:17 +01:00
Sven Mika
c9d220bcda
[RLlib] Upgrade RLlib regression test scripts to new testing tool - RLlib release logs for 1.4. (#16080) 2021-06-01 17:39:18 +02:00