Long Running Tests
==================

This directory contains long-running workloads that are intended to run
indefinitely, until they fail. To set up the project, run:

.. code-block:: bash

    pip install any
    any project create


Running the Workloads
---------------------

You can start all the workloads with:

.. code-block:: bash

    any session start -y run --workload="*" --wheel=https://s3-us-west-2.amazonaws.com/ray-wheels/releases/0.7.5/6da7eff4b20340f92d3fe1160df35caa68922a97/ray-0.7.5-cp36-cp36m-manylinux1_x86_64.whl

This starts one EC2 instance per workload and runs each workload on its own
instance. To start a single workload, pass its name to ``--workload=`` instead
of ``"*"``. The available options are listed by
``any session start run --help``.


Check Workload Statuses
-----------------------

To check up on the workloads, run either
``any session --name="*" execute check-load``, which
will print the load on each machine, or
``any session --name="*" execute show-output``, which
will print the tail of the output for each workload.

To debug a failed workload, it can help to SSH into the
relevant machine, attach to the tmux session (usually ``tmux a -t 0``), inspect
the logs under ``/tmp/ray/session*/logs/``, and also inspect
``/tmp/ray/session*/debug_state.txt``.
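
The inspection steps above can be collected into a small helper to run on the
machine once you are connected. This is only a sketch: the function name
``inspect_ray_session`` and its two optional arguments (base directory and
number of lines) are illustrative and not part of this project. The paths are
the ones listed above.

.. code-block:: bash

    # Hypothetical helper, not part of this project: print the tail of every
    # log file and of debug_state.txt for each Ray session directory.
    inspect_ray_session() {
        local base_dir="${1:-/tmp/ray}"   # where the session directories live
        local num_lines="${2:-20}"        # trailing lines to show per file
        local session file

        for session in "$base_dir"/session*; do
            [ -d "$session" ] || continue      # skip if the glob matched nothing
            echo "=== $session ==="
            for file in "$session"/logs/* "$session"/debug_state.txt; do
                [ -f "$file" ] || continue     # skip missing files and subdirectories
                echo "--- $file ---"
                tail -n "$num_lines" "$file"
            done
        done
    }

Calling ``inspect_ray_session`` with no arguments uses the defaults. Pass a
larger line count as the second argument to see more of each file.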

Shut Down the Workloads
-----------------------

The instances running the workloads can all be killed by running
``any session stop --name "*"``.

Adding a Workload
-----------------

To create a new workload, add a new Python file under ``workloads/`` and
add the workload to the run command in ``.rayproject/project.yaml``.