ray/ci/long_running_tests at edb063c3c86a95f4b3781213cc7fabd0eeeb24d8 - hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

William Ma f423909aec Temporary fix for many_actor_task.py (#4315 )		2019-03-09 00:07:45 -08:00
workloads	Temporary fix for many_actor_task.py (#4315 )	2019-03-09 00:07:45 -08:00
check_workloads.sh	[rllib] Add three new long-running stress tests {APEX, IMPALA, PBT} (#4215 )	2019-03-04 14:05:42 -08:00
config.yaml	update version from 0.7.0.dev0 to 0.7.0.dev1 (#4282 )	2019-03-06 14:43:09 -08:00
README.rst	[rllib] Add three new long-running stress tests {APEX, IMPALA, PBT} (#4215 )	2019-03-04 14:05:42 -08:00
shut_down_workloads.sh	[rllib] Add three new long-running stress tests {APEX, IMPALA, PBT} (#4215 )	2019-03-04 14:05:42 -08:00
start_workloads.sh	Update long running stress tests and add actor death test. (#4275 )	2019-03-06 14:26:45 -08:00

README.rst

Long Running Tests
==================

This directory contains scripts for starting long-running workloads, which are
intended to run indefinitely until they fail.

Running the Workloads
---------------------

To run the workloads, run ``./start_workloads.sh``. This starts one EC2
instance per workload and launches one workload on each instance. Running
``./start_workloads.sh`` again cleans up any state from previous runs and
restarts the workloads.

Check Workload Statuses
-----------------------

To check up on the workloads, run either ``./check_workloads.sh --load``, which
will print the load on each machine, or ``./check_workloads.sh --logs``, which
will print the tail of the output for each workload.

To debug workloads that have failed, you may find it useful to ssh to the
relevant machine, attach to the tmux session (usually ``tmux a -t 0``), inspect
the logs under ``/tmp/ray/session*/logs/``, and also inspect
``/tmp/ray/session*/debug_state.txt``.
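
When many log files are present, a small helper can make the inspection step
quicker. The following is a minimal sketch and is not part of this directory:
the function name ``tail_logs`` and its parameters are hypothetical, and it
assumes only the log location mentioned above. It uses the Python standard
library to collect the last few lines of every file matching a glob pattern.

```python
import glob
import os
from collections import deque


def tail_logs(pattern="/tmp/ray/session*/logs/*", num_lines=20):
    """Return a dict mapping each matching file to its last num_lines lines.

    The default pattern is the log location named in this README; the helper
    itself is a hypothetical convenience, not something the scripts provide.
    """
    tails = {}
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path):
            continue  # skip directories and other non-regular entries
        with open(path, errors="replace") as f:
            # A bounded deque keeps only the final num_lines lines in memory.
            tails[path] = list(deque(f, maxlen=num_lines))
    return tails
```

On the relevant machine, ``tail_logs()`` could be called from a Python shell
and the returned lines printed per file. Passing a narrower pattern limits the
output to the files of interest.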

Shut Down the Workloads
-----------------------

The instances running the workloads can all be killed by running
``./shut_down_workloads.sh``.

Adding a Workload
-----------------

To create a new workload, simply add a new Python file under ``workloads/``.
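
As an illustration of what such a file might contain, the sketch below loops
until a step fails, matching the "run until failure" intent described above.
It is not taken from the existing workloads: ``run_workload``,
``make_ray_step``, and all parameters are hypothetical names, and the Ray
usage is limited to the basic ``ray.init``, ``ray.remote``, and ``ray.get``
calls. How the real workloads connect to a cluster may differ.

```python
import time


def run_workload(step, report_every=100, max_iterations=None):
    """Call step() in a loop until it raises.

    max_iterations exists only so the loop can be tried briefly; leave it
    as None for a workload that should run indefinitely.
    """
    iteration = 0
    start = time.time()
    while max_iterations is None or iteration < max_iterations:
        step()  # any exception propagates and ends the workload
        iteration += 1
        if iteration % report_every == 0:
            elapsed = time.time() - start
            print("Iteration {}: {:.1f}s elapsed".format(iteration, elapsed))
    return iteration


def make_ray_step(num_tasks=100):
    """Build a step that submits a batch of trivial Ray tasks (illustrative)."""
    import ray  # imported lazily so the loop above can be tried without Ray

    ray.init()

    @ray.remote
    def increment(x):
        return x + 1

    def step():
        results = ray.get([increment.remote(i) for i in range(num_tasks)])
        assert results == list(range(1, num_tasks + 1))

    return step


# A workload file would typically end with an unbounded call such as:
#     run_workload(make_ray_step())
```

Printing progress periodically keeps the tail of the output informative when
it is checked later, and letting exceptions propagate means a failure stops
the script rather than being silently retried.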