# Ray Documentation

Repository for documentation of the Ray project, hosted at [docs.ray.io](https://docs.ray.io).
## Installation

To build the documentation, make sure you have `ray` installed first.
To build the documentation locally, install the following dependencies:

```shell
pip install -r requirements-doc.txt
```
## Building the documentation

To compile the documentation and open it locally, run the following command from this directory:

```shell
make develop && open _build/html/index.html
```

NOTE: The above command is for development. To reproduce build failures from the CI, you should use

```shell
make html
```

which is the same as `make develop`, but treats warnings as errors.
## Building just one sub-project

Often your documentation changes concern just one sub-project, such as Tune or Train. To build only this sub-project and ignore the rest (which leads to build warnings due to broken references, etc.), run the following command:

```shell
DOC_LIB=<project> sphinx-build -b html -d _build/doctrees source _build/html
```
where `<project>` is the name of the sub-project and can be any of the docs projects in the `source/` directory, i.e. `tune`, `rllib`, `train`, `cluster`, `serve`, `data`, or the ones starting with `ray-`, e.g. `ray-observability`.
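For example, to build only the Tune documentation:

```shell
DOC_LIB=tune sphinx-build -b html -d _build/doctrees source _build/html
```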
## Announcements and includes

To add new announcements and other messaging to the top or bottom of a documentation page,
check the `_includes` folder first to see if the message you want is already there (like "get help"
or "we're hiring", etc.)
If not, add the template you want and include it accordingly, i.e. with

```rst
.. include:: /_includes/<my-announcement>
```

This ensures consistent messaging across documentation pages.
## Checking for broken links

To check if there are broken links, run the following (we are currently not running this in the CI, since there are false positives):

```shell
make linkcheck
```
## Running doctests

To run tests for examples shipping with docstrings in Python files, run the following command:

```shell
make doctest
```
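For reference, `make doctest` picks up interpreter-style examples embedded in docstrings. A minimal, purely illustrative sketch of the kind of example it exercises:

```python
def add(a: int, b: int) -> int:
    """Add two integers.

    Example:
        >>> add(1, 2)
        3
    """
    return a + b
```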
## Adding examples as MyST Markdown Notebooks

You can now add executable notebooks to this project, which will get built into the documentation.
An example can be found here.
By default, building the docs with `make develop` will not run those notebooks.
If you set the `RUN_NOTEBOOKS` environment variable to `"cache"`, each notebook cell will be run when you build the documentation, and outputs will be cached into `_build/.jupyter_cache`.

```shell
RUN_NOTEBOOKS="cache" make develop
```
To force re-running the notebooks, use `RUN_NOTEBOOKS="force"`.
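That is, mirroring the cached command above:

```shell
RUN_NOTEBOOKS="force" make develop
```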
With caching, the first time you build the documentation it might take a while to run the notebooks. After that, notebook execution is only triggered when you change the notebook source file.
The benefit of working with notebooks for examples is that you don't separate the code from the documentation, yet can still easily smoke-test the code.
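For orientation, a MyST Markdown notebook is a plain `.md` file whose YAML front matter declares the Jupytext format and a kernel, and whose executable cells are written as `{code-cell}` directives. A minimal sketch (the title and cell contents are illustrative):

````markdown
---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
kernelspec:
  display_name: Python 3
  name: python3
---

# My example notebook

```{code-cell} python3
# Executed (and cached) when RUN_NOTEBOOKS is set.
print(1 + 1)
```
````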
## Adding Markdown docs from external (ecosystem) repositories

To avoid a situation where duplicate documentation files live both in the docs folder of this repository and in the external repositories of ecosystem libraries (e.g. xgboost-ray), you can specify Markdown files that will be downloaded from other GitHub repositories during the build process.
To do that, simply edit the `EXTERNAL_MARKDOWN_FILES` list in `source/custom_directives.py` using the format in the comment. Before the build starts, the specified files will be downloaded, preprocessed and saved to the given paths. The build process will then proceed as normal.
While both GitHub Markdown and MyST are supersets of CommonMark, there are differences in syntax.
Furthermore, some contents, such as Sphinx headers, are not desirable to display on GitHub.
To deal with this, simple preprocessing is performed to allow for differences in rendering on GitHub and in the docs. You can use two commands (`$UNCOMMENT` and `$REMOVE`/`$END_REMOVE`) in the Markdown file, specified in the following way:
### `$UNCOMMENT`

GitHub:

```
<!--$UNCOMMENTthis will be uncommented--> More text
```

In docs, this will become:

```
this will be uncommented More text
```
### `$REMOVE`/`$END_REMOVE`

GitHub:

```
<!--$REMOVE-->This will be removed<!--$END_REMOVE--> More text
```

In docs, this will become:

```
More text
```
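As a concrete case of the Sphinx-header scenario mentioned above, `$UNCOMMENT` can hide a MyST target label from GitHub (the label name here is illustrative):

GitHub:

```
<!--$UNCOMMENT(my-page-anchor)=-->
```

In docs, this will become:

```
(my-page-anchor)=
```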
Please note that the parsing is extremely simple (a regex replace) and will not support nesting.
### Testing changes locally

If you want to run the preprocessing locally on a specific file (e.g. to see how it will render after the docs have been built), run:

```shell
source/preprocess_github_markdown.py PATH_TO_MARKDOWN_FILE PATH_TO_PREPROCESSED_MARKDOWN_FILE
```

Make sure to also edit `EXTERNAL_MARKDOWN_FILES` in `source/custom_directives.py` so that your file does not get overwritten by one downloaded from GitHub.
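For example (the paths here are purely illustrative, and we assume the script is invoked with `python`):

```shell
python source/preprocess_github_markdown.py /tmp/my_doc.md /tmp/my_doc_preprocessed.md
```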