# Ray Documentation

Repository for documentation of the Ray project, hosted at [docs.ray.io](https://docs.ray.io).
## Installation

To build the documentation, make sure you have `ray` installed first.
To build the documentation locally, install the following dependencies:

```shell
pip install -r requirements-doc.txt
```
## Building the documentation

To compile the documentation and open it locally, run the following command from this directory:

```shell
make develop && open _build/html/index.html
```

NOTE: The above command is for development. To reproduce build failures from the CI, you should use

```shell
make html
```

which is the same as `make develop`, but treats warnings as errors.
## Building just one sub-project

Often your documentation changes concern just one sub-project, such as Tune or Train. To build only this sub-project and ignore the rest (which leads to build warnings due to broken references, etc.), run the following command:

```shell
DOC_LIB=<project> sphinx-build -b html -d _build/doctrees source _build/html
```
where `<project>` is the name of the sub-project and can be any of the docs projects in the `source/` directory, i.e. `tune`, `rllib`, `train`, `cluster`, `serve`, `data`, or the ones starting with `ray-`, e.g. `ray-observability`.
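For example, to build only the Tune documentation:

```shell
DOC_LIB=tune sphinx-build -b html -d _build/doctrees source _build/html
```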
## Announcements and includes

To add new announcements and other messaging to the top or bottom of a documentation page,
check the `_includes` folder first to see if the message you want is already there (like "get help"
or "we're hiring", etc.)
If not, add the template you want and include it accordingly, i.e. with

```rst
.. include:: /_includes/<my-announcement>
```

This ensures consistent messaging across documentation pages.
## Checking for broken links

To check if there are broken links, run the following (we are currently not running this in the CI, since there are false positives):

```shell
make linkcheck
```
## Running doctests

To run tests for examples shipping with docstrings in Python files, run the following command:

```shell
make doctest
```
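For reference, `make doctest` picks up interpreter-style examples embedded in docstrings. A minimal, purely illustrative sketch of the kind of example it exercises:

```python
def add(a: int, b: int) -> int:
    """Add two integers.

    Example:
        >>> add(1, 2)
        3
    """
    return a + b
```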
## Adding examples as MyST Markdown Notebooks

You can now add executable notebooks to this project, which will get built into the documentation.
An example can be found here.
By default, building the docs with `make develop` will not run those notebooks.
If you set the `RUN_NOTEBOOKS` environment variable to `"cache"`, each notebook cell will be run when you build the documentation, and outputs will be cached into `_build/.jupyter_cache`.

```shell
RUN_NOTEBOOKS="cache" make develop
```
To force re-running the notebooks, use `RUN_NOTEBOOKS="force"`.
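That is, mirroring the cached command above:

```shell
RUN_NOTEBOOKS="force" make develop
```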
With caching, the first time you build the documentation it might take a while to run the notebooks. After that, notebook execution is only triggered when you change the notebook source file.
The benefit of working with notebooks for examples is that you don't separate the code from the documentation, yet can still easily smoke-test the code.
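For orientation, a MyST Markdown notebook is a plain `.md` file whose YAML front matter declares the Jupytext format and a kernel, and whose executable cells are written as `{code-cell}` directives. A minimal sketch (the title and cell contents are illustrative):

````markdown
---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
kernelspec:
  display_name: Python 3
  name: python3
---

# My example notebook

```{code-cell} python3
# Executed (and cached) when RUN_NOTEBOOKS is set.
print(1 + 1)
```
````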
## Adding Markdown docs from external (ecosystem) repositories

To avoid a situation where duplicate documentation files live both in the docs folder of this repository and in the external repositories of ecosystem libraries (e.g. xgboost-ray), you can specify Markdown files that will be downloaded from other GitHub repositories during the build process.
To do that, simply edit the `EXTERNAL_MARKDOWN_FILES` list in `source/custom_directives.py` using the format in the comment. Before the build starts, the specified files will be downloaded, preprocessed and saved to the given paths. The build process will then proceed as normal.
While both GitHub Markdown and MyST are supersets of CommonMark, there are differences in syntax.
Furthermore, some contents, such as Sphinx headers, are not desirable to display on GitHub.
To deal with this, simple preprocessing is performed to allow for differences in rendering on GitHub and in the docs. You can use two commands (`$UNCOMMENT` and `$REMOVE`/`$END_REMOVE`) in the Markdown file, specified in the following way:
### `$UNCOMMENT`

GitHub:

```
<!--$UNCOMMENTthis will be uncommented--> More text
```

In docs, this will become:

```
this will be uncommented More text
```
### `$REMOVE`/`$END_REMOVE`

GitHub:

```
<!--$REMOVE-->This will be removed<!--$END_REMOVE--> More text
```

In docs, this will become:

```
More text
```
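As a concrete case of the Sphinx-header scenario mentioned above, `$UNCOMMENT` can hide a MyST target label from GitHub (the label name here is illustrative):

GitHub:

```
<!--$UNCOMMENT(my-page-anchor)=-->
```

In docs, this will become:

```
(my-page-anchor)=
```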
Please note that the parsing is extremely simple (a regex replace) and will not support nesting.
### Testing changes locally

If you want to run the preprocessing locally on a specific file (e.g. to see how it will render after the docs have been built), run:

```shell
source/preprocess_github_markdown.py PATH_TO_MARKDOWN_FILE PATH_TO_PREPROCESSED_MARKDOWN_FILE
```

Make sure to also edit `EXTERNAL_MARKDOWN_FILES` in `source/custom_directives.py` so that your file does not get overwritten by one downloaded from GitHub.
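For example (the paths here are purely illustrative, and we assume the script is invoked with `python`):

```shell
python source/preprocess_github_markdown.py /tmp/my_doc.md /tmp/my_doc_preprocessed.md
```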