Commit graph

22 commits

Author SHA1 Message Date
Eric Liang
dd70720578
[rllib] Rename sample_batch_size => rollout_fragment_length (#7503)
* bulk rename

* deprecation warn

* update doc

* update fig

* line length

* rename

* make pytest compatible

* fix test

* fix sys

* rename

* wip

* fix more

* lint

* update svg

* comments

* lint

* fix use of batch steps
2020-03-14 12:05:04 -07:00
Stephanie Wang
7c174d0ffe
Make the ref counting test more stressful (#7473) 2020-03-05 20:51:24 -08:00
Simon Mo
29b08ddc09
Improve release process from 0.8.2 (#7303) 2020-02-24 21:18:53 -08:00
Stephanie Wang
2c1f4fd82c
[core] Add long running regression test for distributed ref counting and fix memory leak (#7302)
* Add long running test for serialized IDs and fix mem leak

* comment
2020-02-24 17:58:42 -08:00
Eric Liang
5df801605e
Add ray.util package and move libraries from experimental (#7100) 2020-02-18 13:43:19 -08:00
Simon Mo
bec92a8946
[Hotfix] Fix flake8 lint failing (#7118) 2020-02-10 19:57:21 -08:00
Simon Mo
f6c09ff614
Add serve stress test (#7076) 2020-02-10 09:37:39 -08:00
Edward Oakes
b750bd7fc9
Use 2xlarge instances in long running tests (#6802) 2020-01-15 19:47:59 -06:00
Sven
60d4d5e1aa
Remove future imports (#6724)
* Remove all __future__ imports from RLlib.

* Remove (object) again from tf_run_builder.py::TFRunBuilder.

* Fix 2xLINT warnings.

* Fix broken appo_policy import (must be appo_tf_policy)

* Remove future imports from all other ray files (not just RLlib).

* Remove future import blocks that contain `unicode_literals` as well.
Revert appo_tf_policy.py to appo_policy.py (belongs to another PR).

* Add two empty lines before Schedule class.

* Put back __future__ imports into determine_tests_to_run.py. Fails otherwise on a py2/print related error.
2020-01-09 00:15:48 -08:00
Edward Oakes
032e8553c7
use numpy in long-running tests (#6448) 2019-12-11 17:53:30 -08:00
Philipp Moritz
a454c815f1
Fix long running stress tests (#6374) 2019-12-05 18:29:41 -08:00
Eric Liang
53641f1f74
Move more unit tests to bazel (#6250)
* move more unit tests to bazel

* move to avoid conflict

* fix lint

* fix deps

* separate

* fix failing tests

* show tests

* ignore mismatch

* try combining bazel runs

* build lint

* remove tests from install

* fix test utils

* better config

* split up

* exclusive

* fix verbosity

* fix tests class

* cleanup

* remove flaky

* fix metrics test

* Update .travis.yml

* no retry flaky

* split up actor

* split basic test

* split up trial runner test

* split stress

* fix basic test

* fix tests

* switch to pytest runner for main

* make microbench not fail

* move load code to py3

* test is no longer package

* bazel to end
2019-11-24 11:43:34 -08:00
Eric Liang
a101812b9f
Replace --redis-address with --address in test, docs, tune, rllib (#5602)
* wip

* add tests and tune

* add ci

* test fix

* lint

* fix tests

* wip

* sugar dep
2019-09-01 16:53:02 -07:00
Philipp Moritz
ccee77aafd
fix node_failures.py (#5167) 2019-07-11 11:40:13 -07:00
Hersh Godse
89722ff003
[tune] Directional metrics for components (#4120) (#4915) 2019-06-02 22:13:40 -07:00
bjg2
77005d1814
[rllib] Make batch timeout for remote workers tunable (#4435) 2019-03-29 13:19:42 -07:00
William Ma
11580fb7dc
Changes where actor resources are assigned (#4323) 2019-03-24 15:49:36 -07:00
William Ma
f423909aec
Temporary fix for many_actor_task.py (#4315) 2019-03-09 00:07:45 -08:00
Robert Nishihara
fd2d8c2c06
Remove Jenkins backend tests and add new long running stress test. (#4288) 2019-03-08 15:29:39 -08:00
Robert Nishihara
f151aa8723
Update long running stress tests and add actor death test. (#4275) 2019-03-06 14:26:45 -08:00
Eric Liang
6e3384a719
[rllib] Add three new long-running stress tests {APEX, IMPALA, PBT} (#4215) 2019-03-04 14:05:42 -08:00
Robert Nishihara
75504b9586
Add script for running infinitely long stress tests. (#4163)
Running `./ci/long_running_tests/start_workloads.sh` will start several workloads running (each in their own EC2 instance).
- The workloads run forever.
- The workloads all simulate multiple nodes but use a single machine.
- You can get the tail of each workload by running `./ci/long_running_tests/check_workloads.sh`.
- You have to manually shut down the instances.

As discussed with @ericl @richardliaw, the idea here is to optimize for the debuggability of the tests. If one of them fails, you can ssh to the relevant instance and see all of the logs.
2019-02-27 14:33:06 -08:00