
Running script
--------------
There are 2 workloads. Each workload requires a different cluster YAML file.

Make sure to copy & paste both drivers.

Run `unset RAY_ADDRESS; python workloads/streaming_shuffle.py`. Use `cluster.yaml` for this release test.
Run `unset RAY_ADDRESS; python workloads/dask_on_ray_large_scale_test.py`. Use `dask_on_ray.yaml` for this release test.
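
The `unset RAY_ADDRESS` step in the commands above can also be done from Python when launching a driver. This is a minimal sketch, assuming the driver is started as a subprocess from the directory containing `workloads/`; the helper names `env_without_ray_address` and `run_workload` are hypothetical and not part of this repository.

```python
import os
import subprocess
import sys


def env_without_ray_address(env=None):
    """Return a copy of the environment with RAY_ADDRESS removed."""
    # Copy so the caller's environment is never modified in place.
    cleaned = dict(os.environ if env is None else env)
    cleaned.pop("RAY_ADDRESS", None)
    return cleaned


def run_workload(script):
    """Run a driver script with RAY_ADDRESS unset (hypothetical helper)."""
    return subprocess.run(
        [sys.executable, script],
        env=env_without_ray_address(),
        check=True,  # Raise if the driver exits with a non-zero status.
    )


if __name__ == "__main__":
    run_workload("workloads/streaming_shuffle.py")
```

Passing a cleaned copy of the environment leaves the calling shell untouched, which is the same effect as running `unset RAY_ADDRESS` only for that one command.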

Note that when you use `dask_on_ray.yaml`, you need to follow the procedure below.

```
ray up dask_on_ray.yaml -y # Start the ray cluster.
# Wait until the cluster nodes are up. Use `watch ray status` and wait until all worker nodes are up.
ray down dask_on_ray.yaml -y # After the cluster is up, you should call ray down.
ray up dask_on_ray.yaml -y
```

This process is required because setting ulimit is not permitted on the images that we are using. Ulimit is necessary for large cluster testing like this.
Check out https://discuss.ray.io/t/setting-ulimits-on-ec2-instances/590/2 for more details.
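
One way to inspect the limits that a process sees after the restart is the standard `resource` module. This is a minimal sketch, assuming the limit of interest is the open-file limit (`ulimit -n`), which this README does not state explicitly; the `required` threshold is a hypothetical value, not one taken from this test.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def limit_is_sufficient(soft_limit, required=65536):
    """Check a soft limit against a hypothetical required value."""
    # An unlimited soft limit always satisfies the requirement.
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= required


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"soft={soft} hard={hard} sufficient={limit_is_sufficient(soft)}")
```

The `resource` module is only available on Unix-like systems, which matches the EC2 images discussed in the linked thread.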

Success Criteria
----------------

For `streaming_shuffle.py`, make sure to include the output string in the release logs.

For `dask_on_ray_large_scale_test.py`, make sure the test runs for at least an hour. This test should succeed; otherwise, it is a release blocker.
Check out https://github.com/ray-project/ray/pull/14340#discussion_r599271079 to learn the success condition of this test.
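
A run can be timed so that its duration and exit status are available for the release logs. This is a minimal sketch, assuming the driver is launched as a subprocess and that "an hour" means 3600 seconds of wall-clock time; `timed_run` and `ran_long_enough` are hypothetical helpers, and the authoritative success condition remains the one in the linked discussion.

```python
import subprocess
import sys
import time

MIN_SECONDS = 60 * 60  # Assumed reading of "at least an hour".


def timed_run(command):
    """Run a command and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    completed = subprocess.run(command)
    return completed.returncode, time.monotonic() - start


def ran_long_enough(elapsed_seconds, minimum=MIN_SECONDS):
    """Check the elapsed time against the assumed minimum duration."""
    return elapsed_seconds >= minimum


if __name__ == "__main__":
    code, elapsed = timed_run(
        [sys.executable, "workloads/dask_on_ray_large_scale_test.py"]
    )
    print(f"exit_code={code} elapsed_seconds={elapsed:.0f} "
          f"ran_long_enough={ran_long_enough(elapsed)}")
```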