ray/doc
Kenneth 07372927cc
Enable buffering and spilling to multiple remote storages (#22798)
Buffering writes to AWS S3 is highly recommended to maximize throughput. Reducing the number of remote I/O requests can make spilling to remote storage as effective as spilling locally.

In a test where 512GB of objects were created and spilled, varying only the buffer size while spilling to an S3 bucket produced the following runtimes.

Buffer Size | Runtime (s)
-- | --
Default | 3221.865916
256KB | 1758.885839
1MB | 748.226089
10MB | 526.406466
100MB | 494.830513

Based on these results, a default buffer size of 1MB has been added. This is the minimum buffer size used by AWS Kinesis Firehose, a streaming service for S3. On systems with more resources available, configuring a larger buffer size is recommended.
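To make the diminishing returns in the table easier to see, the following minimal sketch derives the speedup over the "Default" row and the effective throughput for each buffer size. The runtimes and the 512GB total are copied from the table above; the helper name `summarize` is purely illustrative.

```python
# Runtimes in seconds, copied from the buffer-size table above.
# Each run spilled 512 GB of objects.
RUNTIMES_S = {
    "Default": 3221.865916,
    "256KB": 1758.885839,
    "1MB": 748.226089,
    "10MB": 526.406466,
    "100MB": 494.830513,
}
TOTAL_GB = 512


def summarize(runtimes, total_gb=TOTAL_GB):
    """Return {label: (speedup_vs_default, throughput_in_gb_per_s)}."""
    baseline = runtimes["Default"]
    return {
        label: (baseline / seconds, total_gb / seconds)
        for label, seconds in runtimes.items()
    }


if __name__ == "__main__":
    for label, (speedup, gb_per_s) in summarize(RUNTIMES_S).items():
        print(f"{label:>8}: {speedup:5.2f}x vs Default, {gb_per_s:.3f} GB/s")
```

Read this way, moving from the Default row to a 1MB buffer is roughly a 4.3x improvement, while going from 10MB to 100MB adds only about 6%.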

For processes that reach the throughput limits of S3, that bottleneck can be removed by spilling to more prefixes or buckets. This effect is less noticeable, because the performance gains from a large buffer keep us from reaching that bottleneck. The following runtimes were achieved by spilling 512GB with a 1MB buffer and a varying number of prefixes.

Prefixes | Runtime (s)
-- | --
1 | 748.226089
3 | 527.658646
10 | 516.010742
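As a rough illustration of how a buffer size and several prefixes might be configured together, the sketch below builds a JSON spilling configuration string. The key names (`type`, `params`, `uri`, `buffer_size`) and the `smart_open` type reflect my understanding of Ray's object spilling configuration and should be checked against the Ray version in use; the bucket and prefix names and the helper `build_spilling_config` are hypothetical.

```python
import json


def build_spilling_config(uris, buffer_size=1024 * 1024):
    """Build a JSON object-spilling config string.

    ASSUMPTION: the key names below follow Ray's documented
    object spilling config; verify them for your Ray version.
    """
    if not uris:
        raise ValueError("at least one URI is required")
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    config = {
        "type": "smart_open",
        # A single URI is passed as a string, several as a list.
        "params": {"uri": uris[0] if len(uris) == 1 else list(uris)},
        # Write buffer size in bytes (1MB by default, as in the text above).
        "buffer_size": buffer_size,
    }
    return json.dumps(config)


# Hypothetical bucket and prefixes, for illustration only.
spilling_config = build_spilling_config(
    ["s3://my-bucket/prefix-0", "s3://my-bucket/prefix-1", "s3://my-bucket/prefix-2"],
    buffer_size=10 * 1024 * 1024,
)

# The string would then be handed to Ray, for example:
# ray.init(_system_config={"object_spilling_config": spilling_config})
```

Keeping the configuration in a small helper makes it easy to vary the number of prefixes and the buffer size when repeating measurements like the ones above.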


Together these changes enable faster large-scale object spilling.

Co-authored-by: Ubuntu <ubuntu@ip-172-31-54-240.us-west-2.compute.internal>
2022-03-11 11:27:02 -05:00

Ray Documentation

Repository for documentation of the Ray project, hosted at docs.ray.io.

Installation

To build the documentation, make sure you have ray installed first. To build it locally, also install the following dependencies:

pip install -r requirements-doc.txt

Building the documentation

To compile the documentation and open it locally, run the following command from this directory.

make html && open _build/html/index.html

Building just one sub-project

Often your documentation changes concern just one sub-project, such as Tune or Train. To build only that sub-project and ignore the rest (which leads to build warnings from broken references and the like), run the following command:

DOC_LIB=<project> sphinx-build -b html -d _build/doctrees source _build/html

where <project> is the name of the sub-project. It can be any of the docs projects in the source/ directory: tune, rllib, train, cluster, serve, raysgd, data, or one of those starting with ray-, e.g. ray-observability.
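If you script your builds, the single-project command above can be wrapped in a few lines of Python. This is only a sketch: the helper name `sphinx_build_command` is made up for illustration, and it assumes it is run from this directory with `sphinx-build` on the PATH.

```python
import os


def sphinx_build_command(project, source_dir="source", build_dir="_build"):
    """Return (argv, env) for building a single docs sub-project.

    Mirrors: DOC_LIB=<project> sphinx-build -b html -d _build/doctrees source _build/html
    """
    if not project:
        raise ValueError("project must be a non-empty name, e.g. 'tune'")
    # Copy the current environment and select the sub-project via DOC_LIB.
    env = dict(os.environ, DOC_LIB=project)
    argv = [
        "sphinx-build",
        "-b", "html",
        "-d", f"{build_dir}/doctrees",
        source_dir,
        f"{build_dir}/html",
    ]
    return argv, env


# Example usage (commented out so that importing this file has no side effects):
# import subprocess
# argv, env = sphinx_build_command("tune")
# subprocess.run(argv, env=env, check=True)
```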

Announcements and includes

To add new announcements and other messaging to the top or bottom of a documentation page, check the _includes folder first to see if the message you want is already there (like "get help" or "we're hiring"). If not, add the template you want and include it accordingly, i.e. with

.. include:: /_includes/<my-announcement>

This ensures consistent messaging across documentation pages.

To check for broken links, run the following command (this is currently not run in CI because it produces false positives).

make linkcheck

Running doctests

To run the tests for examples that ship in the docstrings of Python files, run the following command:

make doctest

Adding examples as MyST Markdown Notebooks

You can now add executable notebooks to this project, which will get built into the documentation. An example can be found here. By default, building the docs with make html will not run those notebooks. If you set the RUN_NOTEBOOKS environment variable to "cache", each notebook cell will be run when you build the documentation, and outputs will be cached into _build/.jupyter_cache.

RUN_NOTEBOOKS="cache" make html

To force re-running the notebooks, use RUN_NOTEBOOKS="force".

With caching, the first documentation build might take a while because it runs the notebooks. After that, notebook execution is only triggered when you change the notebook source file.

The benefit of using notebooks for examples is that the code stays together with the documentation while remaining easy to smoke-test.
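For scripted builds, the two RUN_NOTEBOOKS modes described above can be selected with a small helper like the one below. This is a sketch only: `notebook_build_env` is a hypothetical name, and it assumes that "cache" and "force" are the only values of interest, since those are the ones mentioned here.

```python
import os

# The two modes described in the text above.
VALID_MODES = ("cache", "force")


def notebook_build_env(mode=None, base_env=None):
    """Return an environment dict for `make html`.

    mode=None    -> notebooks are not executed (the default behaviour)
    mode="cache" -> execute notebooks and cache the outputs
    mode="force" -> re-run notebooks even if cached outputs exist
    """
    env = dict(os.environ if base_env is None else base_env)
    if mode is None:
        # Make sure a stale setting does not trigger notebook execution.
        env.pop("RUN_NOTEBOOKS", None)
        return env
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    env["RUN_NOTEBOOKS"] = mode
    return env


# Example usage (commented out so that importing this file has no side effects):
# import subprocess
# subprocess.run(["make", "html"], env=notebook_build_env("cache"), check=True)
```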