Benchmarks
==========
Below we document key performance benchmarks for common AIR tasks and workflows.

Bulk Ingest
-----------

This task uses the DummyTrainer module to ingest 200 GiB of synthetic data.
We test the performance across different cluster sizes; a minimal usage sketch
follows the links below.

- `Bulk Ingest Script`_
- `Bulk Ingest Cluster Configuration`_
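
As a rough illustration of the pattern this benchmark exercises, the sketch
below assumes the ``DummyTrainer`` utility from ``ray.air.util.check_ingest``.
The dataset size and worker count are illustrative stand-ins, not the
benchmark's exact settings (see the linked script for those).

.. code-block:: python

    import ray
    from ray.air.config import ScalingConfig
    from ray.air.util.check_ingest import DummyTrainer

    # Small synthetic dataset standing in for the benchmark's 200 GiB of data.
    dataset = ray.data.range_tensor(1000, shape=(80, 80, 4), parallelism=100)

    # DummyTrainer only iterates over each worker's shard of the dataset and
    # reports ingest throughput; no actual training happens.
    trainer = DummyTrainer(
        scaling_config=ScalingConfig(
            num_workers=5,                # mirrors --num-workers=5 below
            placement_strategy="SPREAD",  # mirrors --placement-strategy=SPREAD
        ),
        datasets={"train": dataset},
    )
    trainer.fit()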

For this benchmark, we configured the nodes to have sufficient disk size and
throughput to account for object spilling.

.. code-block:: yaml

    aws:
        BlockDeviceMappings:
            - DeviceName: /dev/sda1
              Ebs:
                  Iops: 5000
                  Throughput: 1000
                  VolumeSize: 1000
                  VolumeType: gp3

.. list-table::

   * - **Cluster Setup**
     - **# workers**
     - **Time taken**
     - **Throughput**
     - **Data Spilled**
     - **Command**
   * - 1 m5.4xlarge node
     - 1 actor
     - 390 s
     - 0.51 GiB/s
     - 205 GiB
     - ``python data_benchmark.py --dataset-size-gib=200 --num-workers=1 --placement-strategy=SPREAD``
   * - 5 m5.4xlarge nodes
     - 5 actors
     - 70 s
     - 2.85 GiB/s
     - 206 GiB
     - ``python data_benchmark.py --dataset-size-gib=200 --num-workers=5 --placement-strategy=SPREAD``
   * - 20 m5.4xlarge nodes
     - 20 actors
     - 3.8 s
     - 52.6 GiB/s
     - 0 GiB
     - ``python data_benchmark.py --dataset-size-gib=200 --num-workers=20 --placement-strategy=SPREAD``

XGBoost Batch Prediction
------------------------

This task uses the BatchPredictor module to process different amounts of data
using an XGBoost model.
We test the performance across different cluster sizes and data sizes; a
minimal usage sketch follows the links below.

- `XGBoost Prediction Script`_
- `XGBoost Cluster Configuration`_

.. TODO: Add script for generating data and running the benchmark.
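
The sketch below shows the prediction pattern being benchmarked: train a small
model to obtain a checkpoint, then score a dataset in parallel with
``BatchPredictor``. The tiny inline dataset is an illustrative stand-in for
the benchmark's 10 GB and 100 GB inputs.

.. code-block:: python

    import ray
    from ray.air.config import ScalingConfig
    from ray.train.batch_predictor import BatchPredictor
    from ray.train.xgboost import XGBoostPredictor, XGBoostTrainer

    # Tiny synthetic dataset standing in for the benchmark's generated data.
    train_ds = ray.data.from_items(
        [{"x": float(i), "y": i % 2} for i in range(1000)]
    )

    # Train a small model first so we have a checkpoint to predict from.
    result = XGBoostTrainer(
        scaling_config=ScalingConfig(num_workers=1),
        label_column="y",
        params={"objective": "binary:logistic"},
        datasets={"train": train_ds},
    ).fit()

    # Score the feature columns in parallel across the cluster.
    predictor = BatchPredictor.from_checkpoint(result.checkpoint, XGBoostPredictor)
    predictions = predictor.predict(train_ds.drop_columns(["y"]))
    predictions.show(5)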

.. list-table::

   * - **Cluster Setup**
     - **# workers**
     - **Data Size**
     - **# of rows**
     - **Time taken**
     - **Throughput**
     - **Command**
   * - 1 m5.4xlarge node
     - 1 actor
     - 10 GB
     - 26M rows
     - 275 s
     - 94.5k rows/sec
     - ``python xgboost_benchmark.py --size 10GB``
   * - 10 m5.4xlarge nodes
     - 10 actors (12 CPUs each)
     - 100 GB
     - 260M rows
     - 331 s
     - 786k rows/sec
     - ``python xgboost_benchmark.py --size 100GB``

XGBoost Training
----------------

This task uses the XGBoostTrainer module to train on different sizes of data
with different amounts of parallelism; a minimal usage sketch follows the
links below.
XGBoost parameters were kept at their defaults for ``xgboost==1.6.1`` in this task.

- `XGBoost Training Script`_
- `XGBoost Cluster Configuration`_
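
The sketch below shows the distributed training pattern being benchmarked,
with a tiny inline dataset as an illustrative stand-in for the benchmark's
generated data; ``num_workers`` mirrors the 10-node row in the table below.

.. code-block:: python

    import ray
    from ray.air.config import ScalingConfig
    from ray.train.xgboost import XGBoostTrainer

    # Tiny synthetic dataset standing in for the benchmark's generated data.
    train_ds = ray.data.from_items(
        [{"x": float(i), "y": i % 2} for i in range(1000)]
    )

    trainer = XGBoostTrainer(
        # One training actor per node, as in the 10-node row below.
        scaling_config=ScalingConfig(num_workers=10),
        label_column="y",
        # Empty params keep xgboost's defaults, matching the benchmark setup.
        params={},
        datasets={"train": train_ds},
    )
    result = trainer.fit()
    print(result.metrics)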

.. list-table::

   * - **Cluster Setup**
     - **# workers**
     - **Data Size**
     - **# of rows**
     - **Time taken**
     - **Command**
   * - 1 m5.4xlarge node
     - 1 actor
     - 10 GB
     - 26M rows
     - 692 s
     - ``python xgboost_benchmark.py --size 10GB``
   * - 10 m5.4xlarge nodes
     - 10 actors (12 CPUs each)
     - 100 GB
     - 260M rows
     - 693 s
     - ``python xgboost_benchmark.py --size 100GB``

.. _`Bulk Ingest Script`: https://github.com/ray-project/ray/blob/a30bdf9ef34a45f973b589993f7707a763df6ebf/release/air_tests/air_benchmarks/workloads/data_benchmark.py#L25-L40
.. _`Bulk Ingest Cluster Configuration`: https://github.com/ray-project/ray/blob/a30bdf9ef34a45f973b589993f7707a763df6ebf/release/air_tests/air_benchmarks/data_20_nodes.yaml#L6-L15
.. _`XGBoost Training Script`: https://github.com/ray-project/ray/blob/a241e6a0f5a630d6ed5b84cce30c51963834d15b/release/air_tests/air_benchmarks/workloads/xgboost_benchmark.py#L40-L58
.. _`XGBoost Prediction Script`: https://github.com/ray-project/ray/blob/a241e6a0f5a630d6ed5b84cce30c51963834d15b/release/air_tests/air_benchmarks/workloads/xgboost_benchmark.py#L63-L71
.. _`XGBoost Cluster Configuration`: https://github.com/ray-project/ray/blob/a241e6a0f5a630d6ed5b84cce30c51963834d15b/release/air_tests/air_benchmarks/xgboost_compute_tpl.yaml#L6-L24