[docs] Add support for external markdown (#23505)

This PR fixes the issue of diverging documentation between the Ray Docs and ecosystem library readmes that live in separate repos (e.g. xgboost_ray). This is achieved by adding an extra step before the docs build starts that downloads the readmes of the specified ecosystem libraries from their GitHub repositories. The files are then run through a very simple parser to allow for differences between the GitHub and Docs Markdown flavors.

In summary, this makes the Markdown files in ecosystem library repositories the single sources of truth and removes the need to manually keep the doc pages up to date, while still allowing for differences between what's rendered on GitHub and in the Docs.

See ray-project/xgboost_ray#204 & https://ray--23505.org.readthedocs.build/en/23505/ray-more-libs/xgboost-ray.html for an example.

Needs ray-project/xgboost_ray#204 and ray-project/lightgbm_ray#30 to be merged first.
Antoni Baum 2022-03-31 08:38:14 -07:00 committed by GitHub
parent 853f6d6de3
commit 756d08cd31
7 changed files with 157 additions and 1196 deletions


@@ -81,3 +81,53 @@ Using caching, this means the first time you build the documentation, it might t
After that, notebook execution is only triggered when you change the notebook source file.
The benefits of working with notebooks for examples are that you don't separate the code from the documentation, but can still easily smoke-test the code.
## Adding Markdown docs from external (ecosystem) repositories
In order to avoid a situation where duplicate documentation files live both in the docs folder
in this repository and in external repositories of ecosystem libraries (e.g. xgboost-ray), you can
specify Markdown files to be downloaded from other GitHub repositories during the build process.
To do that, simply edit the `EXTERNAL_MARKDOWN_FILES` list in `source/custom_directives.py`,
using the format described in the comment (see the example below). Before the build process starts,
the specified files will be downloaded, preprocessed and saved to the given paths. The build will
then proceed as normal.
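
For illustration, each entry follows the `(repo, ref, path to get, path to save on disk)` format
used in `source/custom_directives.py`; this sketch mirrors the xgboost_ray entry added in this PR:

```python
# Add doc files from external repositories to be downloaded during build here
# (repo, ref, path to get, path to save on disk)
EXTERNAL_MARKDOWN_FILES = [
    ("ray-project/xgboost_ray", "master", "README.md", "ray-more-libs/xgboost-ray.md"),
]
```
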
While both GitHub Markdown and MyST are supersets of CommonMark, there are differences in syntax.
Furthermore, some content, such as Sphinx headers, should not be displayed on GitHub.
To deal with this, simple preprocessing is performed to allow for differences
in rendering on GitHub and in the docs. You can use two commands (`$UNCOMMENT` and `$REMOVE`/`$END_REMOVE`)
in the Markdown file, specified in the following way:
### `$UNCOMMENT`
GitHub:
```html
<!--$UNCOMMENTthis will be uncommented--> More text
```
In docs, this will become:
```html
this will be uncommented More text
```
### `$REMOVE`/`$END_REMOVE`
GitHub:
```html
<!--$REMOVE-->This will be removed<!--$END_REMOVE--> More text
```
In docs, this will become:
```html
More text
```
Please note that the parsing is extremely simple (a regex replace) and does not support nesting.
## Testing changes locally
If you want to run the preprocessing locally on a specific file (e.g. to see how it will render after the docs have been built), run `source/preprocess_github_markdown.py PATH_TO_MARKDOWN_FILE`. Make sure to also edit `EXTERNAL_MARKDOWN_FILES` in `source/custom_directives.py` so that your file does not get overwritten by one downloaded from GitHub.
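
The same step can also be driven from Python; here is a minimal sketch (the path is only an example, and the import assumes `doc/source` is on the import path):

```python
# Run the preprocessing step by hand on a local Markdown file (modified in place).
from preprocess_github_markdown import preprocess_github_markdown_file

preprocess_github_markdown_file("ray-more-libs/xgboost-ray.md")  # example path
```
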


@@ -9,6 +9,9 @@ from datetime import datetime
# Mocking modules allows Sphinx to work without installing Ray.
mock_modules()
# Download docs from ecosystem library repos
download_and_preprocess_ecosystem_docs()
assert (
"ray" not in sys.modules
), "If ray is already imported, we will not render documentation correctly!"


@@ -1,6 +1,7 @@
import urllib
import mock
import sys
from preprocess_github_markdown import preprocess_github_markdown_file
# Note: the scipy import has to stay here, it's used implicitly down the line
import scipy.stats # noqa: F401
@@ -10,6 +11,7 @@ __all__ = [
"fix_xgb_lgbm_docs",
"mock_modules",
"update_context",
"download_and_preprocess_ecosystem_docs",
]
try:
@@ -174,3 +176,51 @@ def mock_modules():
for mod_name in CHILD_MOCK_MODULES:
sys.modules[mod_name] = ChildClassMock()
# Add doc files from external repositories to be downloaded during build here
# (repo, ref, path to get, path to save on disk)
EXTERNAL_MARKDOWN_FILES = [
("ray-project/xgboost_ray", "master", "README.md", "ray-more-libs/xgboost-ray.md"),
(
"ray-project/lightgbm_ray",
"master",
"README.md",
"ray-more-libs/lightgbm-ray.md",
),
]
def download_and_preprocess_ecosystem_docs():
"""
This function downloads markdown readme files for various
ecosystem libraries, saves them in the specified locations and preprocesses
them before the Sphinx build starts.
If you have ecosystem libraries that live in a separate repo from Ray,
adding them here will allow for their docs to be present in Ray docs
without the need for duplicate files. For more details, see ``doc/README.md``.
"""
import urllib.request
import requests
def get_latest_release_tag(repo: str) -> str:
"""repo is just the repo name, eg. ray-project/ray"""
response = requests.get(f"https://api.github.com/repos/{repo}/releases/latest")
return response.json()["tag_name"]
def get_file_from_github(
repo: str, ref: str, path_to_get: str, path_to_save_on_disk: str
) -> None:
"""If ``ref == "latest"``, use latest release"""
if ref == "latest":
ref = get_latest_release_tag(repo)
urllib.request.urlretrieve(
f"https://raw.githubusercontent.com/{repo}/{ref}/{path_to_get}",
path_to_save_on_disk,
)
for x in EXTERNAL_MARKDOWN_FILES:
get_file_from_github(*x)
preprocess_github_markdown_file(x[-1])


@@ -0,0 +1,40 @@
import re
import argparse
import pathlib
def preprocess_github_markdown_file(path: str):
"""
Preprocesses GitHub Markdown files by:
- Uncommenting all ``<!-- -->`` comments in which the opening tag is immediately
succeeded by ``$UNCOMMENT`` (e.g. ``<!--$UNCOMMENTthis will be uncommented-->``)
- Removing text between ``<!--$REMOVE-->`` and ``<!--$END_REMOVE-->``
This is to enable translation between GitHub Markdown and MyST Markdown used
in docs. For more details, see ``doc/README.md``.
"""
with open(path, "r") as f:
text = f.read()
# $UNCOMMENT
text = re.sub(r"<!--\s*\$UNCOMMENT(.*?)(-->)", r"\1", text, flags=re.DOTALL)
# $REMOVE
text = re.sub(
r"(<!--\s*\$REMOVE\s*-->)(.*?)(<!--\s*\$END_REMOVE\s*-->)",
r"",
text,
flags=re.DOTALL,
)
with open(path, "w") as f:
f.write(text)
if __name__ == "__main__":
parser = argparse.ArgumentParser(
description="Preprocess github markdown file to Ray Docs MyST markdown"
)
parser.add_argument(
"path", type=pathlib.Path, help="Path to github markdown file to preprocess"
)
args, _ = parser.parse_known_args()
preprocess_github_markdown_file(args.path.expanduser())
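
For illustration, the same two substitutions can be exercised on a small in-memory string (a standalone sketch, not part of the changed files):

```python
import re

# Illustrative only: apply the two regex substitutions used by the preprocessor above.
text = "<!--$REMOVE-->GitHub-only intro<!--$END_REMOVE-->Shared text. <!--$UNCOMMENT(tune-main)=-->"
text = re.sub(r"<!--\s*\$UNCOMMENT(.*?)(-->)", r"\1", text, flags=re.DOTALL)
text = re.sub(
    r"(<!--\s*\$REMOVE\s*-->)(.*?)(<!--\s*\$END_REMOVE\s*-->)", r"", text, flags=re.DOTALL
)
print(text)  # -> "Shared text. (tune-main)="
```
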


@@ -0,0 +1,8 @@
<!--
DO NOT EDIT THIS FILE
THIS FILE WILL BE AUTOMATICALLY FILLED WITH THE LATEST VERSION OF LIGHTGBM-RAY README
DURING DOCS BUILD PROCESS.
FOR MORE INFORMATION, SEE doc/source/custom_directives.py::download_and_preprocess_ecosystem_docs
-->


@@ -1,569 +0,0 @@
..
This part of the docs is generated from the LightGBM-Ray readme using m2r
To update:
- run `m2r /path/to/lightgbm_ray/README.md`
- copy the contents of README.rst here
- Get rid of the badges in the top
- Get rid of the references section at the bottom
- Be sure not to delete the API reference section in the bottom of this file.
- add `.. _lightgbm-ray-tuning:` before the "Hyperparameter Tuning" section
- Adjust some link targets (e.g. for "Ray Tune") to anonymous references
by adding a second underscore (use `target <link>`__)
- Search for `\ **` and delete this from the links (bold links are not supported)
.. _lightgbm-ray:
Distributed LightGBM on Ray
===========================
LightGBM-Ray is a distributed backend for
`LightGBM <https://lightgbm.readthedocs.io/>`_\ , built
on top of
`distributed computing framework Ray <https://ray.io>`_.
LightGBM-Ray
* enables `multi-node <#usage>`_ and `multi-GPU <#multi-gpu-training>`_ training
* integrates seamlessly with distributed `hyperparameter optimization <#hyperparameter-tuning>`_ library `Ray Tune <http://tune.io>`__
* comes with `fault tolerance handling <#fault-tolerance>`_ mechanisms, and
* supports `distributed dataframes and distributed data loading <#distributed-data-loading>`_
All releases are tested on large clusters and workloads.
This package is based on :ref:`XGBoost-Ray <xgboost-ray>`. As of now, XGBoost-Ray is a dependency for LightGBM-Ray.
Installation
------------
You can install the latest LightGBM-Ray release from PIP:
.. code-block:: bash
pip install lightgbm_ray
If you'd like to install the latest master, use this command instead:
.. code-block:: bash
pip install git+https://github.com/ray-project/lightgbm_ray.git#lightgbm_ray
Usage
-----
LightGBM-Ray provides a drop-in replacement for LightGBM's ``train``
function. To pass data, a ``RayDMatrix`` object is required, common
with XGBoost-Ray. You can also use a scikit-learn
interface - see next section.
Just as in original ``lgbm.train()`` function, the
`training parameters <https://lightgbm.readthedocs.io/en/latest/Parameters.html>`_
are passed as the ``params`` dictionary.
Ray-specific distributed training parameters are configured with a
``lightgbm_ray.RayParams`` object. For instance, you can set
the ``num_actors`` property to specify how many distributed actors
you would like to use.
Here is a simplified example (which requires ``sklearn``\ ):
**Training:**
.. code-block:: python
from lightgbm_ray import RayDMatrix, RayParams, train
from sklearn.datasets import load_breast_cancer
train_x, train_y = load_breast_cancer(return_X_y=True)
train_set = RayDMatrix(train_x, train_y)
evals_result = {}
bst = train(
{
"objective": "binary",
"metric": ["binary_logloss", "binary_error"],
},
train_set,
evals_result=evals_result,
valid_sets=[train_set],
valid_names=["train"],
verbose_eval=False,
ray_params=RayParams(num_actors=2, cpus_per_actor=2))
bst.booster_.save_model("model.lgbm")
print("Final training error: {:.4f}".format(
evals_result["train"]["binary_error"][-1]))
**Prediction:**
.. code-block:: python
from lightgbm_ray import RayDMatrix, RayParams, predict
from sklearn.datasets import load_breast_cancer
import lightgbm as lgbm
data, labels = load_breast_cancer(return_X_y=True)
dpred = RayDMatrix(data, labels)
bst = lgbm.Booster(model_file="model.lgbm")
pred_ray = predict(bst, dpred, ray_params=RayParams(num_actors=2))
print(pred_ray)
scikit-learn API
^^^^^^^^^^^^^^^^
LightGBM-Ray also features a scikit-learn API fully mirroring pure
LightGBM scikit-learn API, providing a completely drop-in
replacement. The following estimators are available:
* ``RayLGBMClassifier``
* ``RayLGBMRegressor``
Example usage of ``RayLGBMClassifier``\ :
.. code-block:: python
from lightgbm_ray import RayLGBMClassifier, RayParams
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
seed = 42
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
X, y, train_size=0.25, random_state=42)
clf = RayLGBMClassifier(
n_jobs=2, # In LightGBM-Ray, n_jobs sets the number of actors
random_state=seed)
# scikit-learn API will automatically convert the data
# to RayDMatrix format as needed.
# You can also pass X as a RayDMatrix, in which case
# y will be ignored.
clf.fit(X_train, y_train)
pred_ray = clf.predict(X_test)
print(pred_ray)
pred_proba_ray = clf.predict_proba(X_test)
print(pred_proba_ray)
# It is also possible to pass a RayParams object
# to fit/predict/predict_proba methods - will override
# n_jobs set during initialization
clf.fit(X_train, y_train, ray_params=RayParams(num_actors=2))
pred_ray = clf.predict(X_test, ray_params=RayParams(num_actors=2))
print(pred_ray)
Things to keep in mind:
* ``n_jobs`` parameter controls the number of actors spawned.
You can pass a ``RayParams`` object to the
``fit``\ /\ ``predict``\ /\ ``predict_proba`` methods as the ``ray_params`` argument
for greater control over resource allocation. Doing
so will override the value of ``n_jobs`` with the value of
``ray_params.num_actors`` attribute. For more information, refer
to the `Resources <#resources>`_ section below.
* By default ``n_jobs`` is set to ``1``\ , which means the training
will **not** be distributed. Make sure to either set ``n_jobs``
to a higher value or pass a ``RayParams`` object as outlined above
in order to take advantage of LightGBM-Ray's functionality.
* After calling ``fit``\ , additional evaluation results (e.g. training time,
number of rows, callback results) will be available under
``additional_results_`` attribute.
* ``eval_`` arguments are supported, but early stopping is not.
* LightGBM-Ray's scikit-learn API is based on LightGBM 3.2.1.
While we try to support older LightGBM versions, please note that
this library is only fully tested and supported for LightGBM >= 3.2.1.
For more information on the scikit-learn API, refer to the `LightGBM documentation <https://lightgbm.readthedocs.io/en/latest/Python-API.html#scikit-learn-api>`_.
Data loading
------------
Data is passed to LightGBM-Ray via a ``RayDMatrix`` object.
The ``RayDMatrix`` lazy loads data and stores it sharded in the
Ray object store. The Ray LightGBM actors then access these
shards to run their training on.
A ``RayDMatrix`` supports various data and file types, like
Pandas DataFrames, Numpy Arrays, CSV files and Parquet files.
Example loading multiple parquet files:
.. code-block:: python
import glob
from lightgbm_ray import RayDMatrix, RayFileType
# We can also pass a list of files
path = list(sorted(glob.glob("/data/nyc-taxi/*/*/*.parquet")))
# This argument will be passed to `pd.read_parquet()`
columns = [
"passenger_count",
"trip_distance", "pickup_longitude", "pickup_latitude",
"dropoff_longitude", "dropoff_latitude",
"fare_amount", "extra", "mta_tax", "tip_amount",
"tolls_amount", "total_amount"
]
dtrain = RayDMatrix(
path,
label="passenger_count", # Will select this column as the label
columns=columns,
# ignore=["total_amount"], # Optional list of columns to ignore
filetype=RayFileType.PARQUET)
.. _lightgbm-ray-tuning:
Hyperparameter Tuning
---------------------
LightGBM-Ray integrates with :ref:`Ray Tune <tune-main>` to provide distributed hyperparameter tuning for your
distributed LightGBM models. You can run multiple LightGBM-Ray training runs in parallel, each with a different
hyperparameter configuration, and each training run parallelized by itself. All you have to do is move your training
code to a function, and pass the function to ``tune.run``. Internally, ``train`` will detect if ``tune`` is being used and will
automatically report results to tune.
Example using LightGBM-Ray with Ray Tune:
.. code-block:: python
from lightgbm_ray import RayDMatrix, RayParams, train
from sklearn.datasets import load_breast_cancer
num_actors = 2
num_cpus_per_actor = 2
ray_params = RayParams(
num_actors=num_actors, cpus_per_actor=num_cpus_per_actor)
def train_model(config):
train_x, train_y = load_breast_cancer(return_X_y=True)
train_set = RayDMatrix(train_x, train_y)
evals_result = {}
bst = train(
params=config,
dtrain=train_set,
evals_result=evals_result,
valid_sets=[train_set],
valid_names=["train"],
verbose_eval=False,
ray_params=ray_params)
bst.booster_.save_model("model.lgbm")
from ray import tune
# Specify the hyperparameter search space.
config = {
"objective": "binary",
"metric": ["binary_logloss", "binary_error"],
"eta": tune.loguniform(1e-4, 1e-1),
"subsample": tune.uniform(0.5, 1.0),
"max_depth": tune.randint(1, 9)
}
# Make sure to use the `get_tune_resources` method to set the `resources_per_trial`
analysis = tune.run(
train_model,
config=config,
metric="train-binary_error",
mode="min",
num_samples=4,
resources_per_trial=ray_params.get_tune_resources())
print("Best hyperparameters", analysis.best_config)
Also see examples/simple_tune.py for another example.
Fault tolerance
---------------
LightGBM-Ray leverages the stateful Ray actor model to
enable fault tolerant training. Currently, only non-elastic
training is supported.
Non-elastic training (warm restart)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When an actor or node dies, LightGBM-Ray will retain the
state of the remaining actors. In non-elastic training,
the failed actors will be replaced as soon as resources
are available again. Only these actors will reload their
parts of the data. Training will resume once all actors
are ready for training again.
You can configure this mode in the ``RayParams``\ :
.. code-block:: python
from lightgbm_ray import RayParams
ray_params = RayParams(
max_actor_restarts=2, # How often are actors allowed to fail, Default = 0
)
Resources
---------
By default, LightGBM-Ray tries to determine the number of CPUs
available and distributes them evenly across actors.
It is important to note that distributed LightGBM needs at least
two CPUs per actor to function efficiently (without blocking).
Therefore, by default, at least two CPUs will be assigned to each actor,
and an exception will be raised if an actor has less than two CPUs.
It is possible to override this check by setting the
``allow_less_than_two_cpus`` argument to ``True``\ , though it is not
recommended, as it will negatively impact training performance.
In the case of very large clusters or clusters with many different
machine sizes, it makes sense to limit the number of CPUs per actor
by setting the ``cpus_per_actor`` argument. Consider always
setting this explicitly.
The number of LightGBM actors always has to be set manually with
the ``num_actors`` argument.
Multi GPU training
^^^^^^^^^^^^^^^^^^
LightGBM-Ray enables multi GPU training. The LightGBM core backend
will automatically handle communication.
All you have to do is to start one actor per GPU and set LightGBM's
``device_type`` to a GPU-compatible option, eg. ``gpu`` (see LightGBM
documentation for more details.)
For instance, if you have 2 machines with 4 GPUs each, you will want
to start 8 remote actors, and set ``gpus_per_actor=1``. There is usually
no benefit in allocating less (e.g. 0.5) or more than one GPU per actor.
You should divide the CPUs evenly across actors per machine, so if your
machines have 16 CPUs in addition to the 4 GPUs, each actor should have
4 CPUs to use.
.. code-block:: python
from lightgbm_ray import RayParams
ray_params = RayParams(
num_actors=8,
gpus_per_actor=1,
cpus_per_actor=4, # Divide evenly across actors per machine
)
How many remote actors should I use?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This depends on your workload and your cluster setup.
Generally there is no inherent benefit of running more than
one remote actor per node for CPU-only training. This is because
LightGBM core can already leverage multiple CPUs via threading.
However, there are some cases when you should consider starting
more than one actor per node:
* For `multi GPU training <#multi-gpu-training>`_\ , each GPU should have a separate
remote actor. Thus, if your machine has 24 CPUs and 4 GPUs,
you will want to start 4 remote actors with 6 CPUs and 1 GPU
each
* In a **heterogeneous cluster**\ , you might want to find the
`greatest common divisor <https://en.wikipedia.org/wiki/Greatest_common_divisor>`_
for the number of CPUs.
E.g. for a cluster with three nodes of 4, 8, and 12 CPUs, respectively,
you should set the number of actors to 6 and the CPUs per
actor to 4.
Distributed data loading
------------------------
LightGBM-Ray can leverage both centralized and distributed data loading.
In **centralized data loading**\ , the data is partitioned by the head node
and stored in the object store. Each remote actor then retrieves their
partitions by querying the Ray object store. Centralized loading is used
when you pass centralized in-memory dataframes, such as Pandas dataframes
or Numpy arrays, or when you pass a single source file, such as a single CSV
or Parquet file.
.. code-block:: python
from lightgbm_ray import RayDMatrix
# This will use centralized data loading, as only one source file is specified
# `label_col` is a column in the CSV, used as the target label
ray_params = RayDMatrix("./source_file.csv", label="label_col")
In **distributed data loading**\ , each remote actor loads their data directly from
the source (e.g. local hard disk, NFS, HDFS, S3),
without a central bottleneck. The data is still stored in the
object store, but locally to each actor. This mode is used automatically
when loading data from multiple CSV or Parquet files. Please note that
we do not check or enforce partition sizes in this case - it is your job
to make sure the data is evenly distributed across the source files.
.. code-block:: python
from lightgbm_ray import RayDMatrix
# This will use distributed data loading, as four source files are specified
# Please note that you cannot schedule more than four actors in this case.
# `label_col` is a column in the Parquet files, used as the target label
ray_params = RayDMatrix([
"hdfs:///tmp/part1.parquet",
"hdfs:///tmp/part2.parquet",
"hdfs:///tmp/part3.parquet",
"hdfs:///tmp/part4.parquet",
], label="label_col")
Lastly, LightGBM-Ray supports **distributed dataframe** representations, such
as :ref:`Ray Datasets <datasets>`,
`Modin <https://modin.readthedocs.io/en/latest/>`_ and
`Dask dataframes <https://docs.dask.org/en/latest/dataframe.html>`_
(used with :ref:`Dask on Ray <dask-on-ray>`).
Here, LightGBM-Ray will check on which nodes the distributed partitions
are currently located, and will assign partitions to actors in order to
minimize cross-node data transfer. Please note that we also assume here
that partition sizes are uniform.
.. code-block:: python
from lightgbm_ray import RayDMatrix
# This will try to allocate the existing Modin partitions
# to co-located Ray actors. If this is not possible, data will
# be transferred across nodes
ray_params = RayDMatrix(existing_modin_df)
Data sources
^^^^^^^^^^^^
The following data sources can be used with a ``RayDMatrix`` object.
.. list-table::
:header-rows: 1
* - Type
- Centralized loading
- Distributed loading
* - Numpy array
- Yes
- No
* - Pandas dataframe
- Yes
- No
* - Single CSV
- Yes
- No
* - Multi CSV
- Yes
- Yes
* - Single Parquet
- Yes
- No
* - Multi Parquet
- Yes
- Yes
* - :ref:`Ray Dataset <datasets>`
- Yes
- Yes
* - `Petastorm <https://github.com/uber/petastorm>`_
- Yes
- Yes
* - `Dask dataframe <https://docs.dask.org/en/latest/dataframe.html>`_
- Yes
- Yes
* - `Modin dataframe <https://modin.readthedocs.io/en/latest/>`_
- Yes
- Yes
Memory usage
------------
Details coming soon.
**Best practices**
In order to reduce peak memory usage, consider the following
suggestions:
* Store data as ``float32`` or less. More precision is often
not needed, and keeping data in a smaller format will
help reduce peak memory usage for initial data loading.
* Pass the ``dtype`` when loading data from CSV. Otherwise,
floating point values will be loaded as ``np.float64``
per default, increasing peak memory usage by 33%.
Placement Strategies
--------------------
LightGBM-Ray leverages Ray's Placement Group API (https://docs.ray.io/en/master/placement-group.html)
to implement placement strategies for better fault tolerance.
By default, a SPREAD strategy is used for training, which attempts to spread all of the training workers
across the nodes in a cluster on a best-effort basis. This improves fault tolerance since it minimizes the
number of worker failures when a node goes down, but comes at a cost of increased inter-node communication
To disable this strategy, set the ``RXGB_USE_SPREAD_STRATEGY`` environment variable to 0. If disabled, no
particular placement strategy will be used.
When LightGBM-Ray is used with Ray Tune for hyperparameter tuning, a PACK strategy is used. This strategy
attempts to place all workers for each trial on the same node on a best-effort basis. This means that if a node
goes down, it will be less likely to impact multiple trials.
When placement strategies are used, LightGBM-Ray will wait for 100 seconds for the required resources
to become available, and will fail if the required resources cannot be reserved and the cluster cannot autoscale
to increase the number of resources. You can change the ``RXGB_PLACEMENT_GROUP_TIMEOUT_S`` environment variable to modify
how long this timeout should be.
More examples
-------------
For complete end to end examples, please have a look at
the `examples folder <https://github.com/ray-project/lightgbm_ray/tree/master/examples/>`_\ :
* `Simple sklearn breastcancer dataset example <https://github.com/ray-project/lightgbm_ray/blob/main/lightgbm_ray/examples/simple.py>`_ (requires ``sklearn``\ )
* `HIGGS classification example <https://github.com/ray-project/lightgbm_ray/blob/main/lightgbm_ray/examples/higgs.py>`_
(\ `download dataset (2.6 GB) <https://archive.ics.uci.edu/ml/machine-learning-databases/00280/HIGGS.csv.gz>`_\ )
* `HIGGS classification example with Parquet <https://github.com/ray-project/lightgbm_ray/blob/main/lightgbm_ray/examples/higgs_parquet.py>`_ (uses the same dataset)
* `Test data classification <https://github.com/ray-project/lightgbm_ray/blob/main/lightgbm_ray/examples/train_on_test_data.py>`_ (uses a self-generated dataset)
API reference
-------------
.. autoclass:: lightgbm_ray.RayParams
:members:
.. note::
The ``xgboost_ray.RayDMatrix`` class is shared with :ref:`XGBoost-Ray <xgboost-ray>`.
.. autoclass:: xgboost_ray.RayDMatrix
:members:
:noindex:
.. autofunction:: lightgbm_ray.train
.. autofunction:: lightgbm_ray.predict
scikit-learn API
^^^^^^^^^^^^^^^^
.. autoclass:: lightgbm_ray.RayLGBMClassifier
:members:
.. autoclass:: lightgbm_ray.RayLGBMRegressor
:members:


@@ -1,629 +1,8 @@
% This part of the docs is generated from the XGBoost-Ray readme using m2r
% To update:
% - run `m2r /path/to/xgboost_ray/README.md`
% - copy the contents of README.rst here
% - Get rid of the badges in the top
% - Get rid of the references section at the bottom
% - Be sure not to delete the API reference section in the bottom of this file.
% - add `.. _xgboost-ray-tuning:` before the "Hyperparameter Tuning" section
% - Adjust some link targets (e.g. for "Ray Tune") to anonymous references
% by adding a second underscore (use `target <link>`__)
% - Search for `\ **` and delete this from the links (bold links are not supported)
<!--
DO NOT EDIT THIS FILE
(xgboost-ray)=
THIS FILE WILL BE AUTOMATICALLY FILLED WITH THE LATEST VERSION OF XGBOOST-RAY README
DURING DOCS BUILD PROCESS.
# Distributed XGBoost on Ray
XGBoost-Ray is a distributed backend for
[XGBoost](https://xgboost.readthedocs.io/en/latest/), built
on top of
[distributed computing framework Ray](https://ray.io).
XGBoost-Ray
- enables [multi-node](#usage) and [multi-GPU](#multi-gpu-training) training
- integrates seamlessly with distributed [hyperparameter optimization](#hyperparameter-tuning) library [Ray Tune](http://tune.io)
- comes with advanced [fault tolerance handling](#fault-tolerance) mechanisms, and
- supports [distributed dataframes and distributed data loading](#distributed-data-loading)
All releases are tested on large clusters and workloads.
## Installation
You can install the latest XGBoost-Ray release from PIP:
```bash
pip install "xgboost_ray"
```
If you'd like to install the latest master, use this command instead:
```bash
pip install "git+https://github.com/ray-project/xgboost_ray.git#egg=xgboost_ray"
```
## Usage
XGBoost-Ray provides a drop-in replacement for XGBoost's `train`
function. To pass data, instead of using `xgb.DMatrix` you will
have to use `xgboost_ray.RayDMatrix`. You can also use a scikit-learn
interface - see next section.
Just as in original `xgb.train()` function, the
[training parameters](https://xgboost.readthedocs.io/en/stable/parameter.html)
are passed as the `params` dictionary.
Ray-specific distributed training parameters are configured with a
`xgboost_ray.RayParams` object. For instance, you can set
the `num_actors` property to specify how many distributed actors
you would like to use.
Here is a simplified example (which requires `sklearn`):
**Training:**
```python
from xgboost_ray import RayDMatrix, RayParams, train
from sklearn.datasets import load_breast_cancer
train_x, train_y = load_breast_cancer(return_X_y=True)
train_set = RayDMatrix(train_x, train_y)
evals_result = {}
bst = train(
{
"objective": "binary:logistic",
"eval_metric": ["logloss", "error"],
},
train_set,
evals_result=evals_result,
evals=[(train_set, "train")],
verbose_eval=False,
ray_params=RayParams(
num_actors=2, # Number of remote actors
cpus_per_actor=1))
bst.save_model("model.xgb")
print("Final training error: {:.4f}".format(
evals_result["train"]["error"][-1]))
```
**Prediction:**
```python
from xgboost_ray import RayDMatrix, RayParams, predict
from sklearn.datasets import load_breast_cancer
import xgboost as xgb
data, labels = load_breast_cancer(return_X_y=True)
dpred = RayDMatrix(data, labels)
bst = xgb.Booster(model_file="model.xgb")
pred_ray = predict(bst, dpred, ray_params=RayParams(num_actors=2))
print(pred_ray)
```
### scikit-learn API
XGBoost-Ray also features a scikit-learn API fully mirroring pure
XGBoost scikit-learn API, providing a completely drop-in
replacement. The following estimators are available:
- `RayXGBClassifier`
- `RayXGBRegressor`
- `RayXGBRFClassifier`
- `RayXGBRFRegressor`
- `RayXGBRanker`
Example usage of `RayXGBClassifier`:
```python
from xgboost_ray import RayXGBClassifier, RayParams
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
seed = 42
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
X, y, train_size=0.25, random_state=42
)
clf = RayXGBClassifier(
n_jobs=4, # In XGBoost-Ray, n_jobs sets the number of actors
random_state=seed
)
# scikit-learn API will automatically convert the data
# to RayDMatrix format as needed.
# You can also pass X as a RayDMatrix, in which case
# y will be ignored.
clf.fit(X_train, y_train)
pred_ray = clf.predict(X_test)
print(pred_ray)
pred_proba_ray = clf.predict_proba(X_test)
print(pred_proba_ray)
# It is also possible to pass a RayParams object
# to fit/predict/predict_proba methods - will override
# n_jobs set during initialization
clf.fit(X_train, y_train, ray_params=RayParams(num_actors=2))
pred_ray = clf.predict(X_test, ray_params=RayParams(num_actors=2))
print(pred_ray)
```
Things to keep in mind:
- `n_jobs` parameter controls the number of actors spawned.
You can pass a `RayParams` object to the
`fit`/`predict`/`predict_proba` methods as the `ray_params` argument
for greater control over resource allocation. Doing
so will override the value of `n_jobs` with the value of
`ray_params.num_actors` attribute. For more information, refer
to the [Resources](#resources) section below.
- By default `n_jobs` is set to `1`, which means the training
will **not** be distributed. Make sure to either set `n_jobs`
to a higher value or pass a `RayParams` object as outlined above
in order to take advantage of XGBoost-Ray's functionality.
- After calling `fit`, additional evaluation results (e.g. training time,
number of rows, callback results) will be available under
`additional_results_` attribute.
- XGBoost-Ray's scikit-learn API is based on XGBoost 1.4.
While we try to support older XGBoost versions, please note that
this library is only fully tested and supported for XGBoost >= 1.4.
For more information on the scikit-learn API, refer to the [XGBoost documentation](https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn).
## Data loading
Data is passed to XGBoost-Ray via a `RayDMatrix` object.
The `RayDMatrix` lazy loads data and stores it sharded in the
Ray object store. The Ray XGBoost actors then access these
shards to run their training on.
A `RayDMatrix` supports various data and file types, like
Pandas DataFrames, Numpy Arrays, CSV files and Parquet files.
Example loading multiple parquet files:
```python
import glob
from xgboost_ray import RayDMatrix, RayFileType
# We can also pass a list of files
path = list(sorted(glob.glob("/data/nyc-taxi/*/*/*.parquet")))
# This argument will be passed to `pd.read_parquet()`
columns = [
"passenger_count",
"trip_distance", "pickup_longitude", "pickup_latitude",
"dropoff_longitude", "dropoff_latitude",
"fare_amount", "extra", "mta_tax", "tip_amount",
"tolls_amount", "total_amount"
]
dtrain = RayDMatrix(
path,
label="passenger_count", # Will select this column as the label
columns=columns,
# ignore=["total_amount"], # Optional list of columns to ignore
filetype=RayFileType.PARQUET)
```
(xgboost-ray-tuning)=
## Hyperparameter Tuning
XGBoost-Ray integrates with {ref}`Ray Tune <tune-main>` to provide distributed hyperparameter tuning for your
distributed XGBoost models. You can run multiple XGBoost-Ray training runs in parallel, each with a different
hyperparameter configuration, and each training run parallelized by itself. All you have to do is move your training
code to a function, and pass the function to `tune.run`. Internally, `train` will detect if `tune` is being used and will
automatically report results to tune.
Example using XGBoost-Ray with Ray Tune:
```python
from xgboost_ray import RayDMatrix, RayParams, train
from sklearn.datasets import load_breast_cancer
num_actors = 4
num_cpus_per_actor = 1
ray_params = RayParams(
num_actors=num_actors,
cpus_per_actor=num_cpus_per_actor)
def train_model(config):
train_x, train_y = load_breast_cancer(return_X_y=True)
train_set = RayDMatrix(train_x, train_y)
evals_result = {}
bst = train(
params=config,
dtrain=train_set,
evals_result=evals_result,
evals=[(train_set, "train")],
verbose_eval=False,
ray_params=ray_params)
bst.save_model("model.xgb")
from ray import tune
# Specify the hyperparameter search space.
config = {
"tree_method": "approx",
"objective": "binary:logistic",
"eval_metric": ["logloss", "error"],
"eta": tune.loguniform(1e-4, 1e-1),
"subsample": tune.uniform(0.5, 1.0),
"max_depth": tune.randint(1, 9)
}
# Make sure to use the `get_tune_resources` method to set the `resources_per_trial`
analysis = tune.run(
train_model,
config=config,
metric="train-error",
mode="min",
num_samples=4,
resources_per_trial=ray_params.get_tune_resources())
print("Best hyperparameters", analysis.best_config)
```
Also see examples/simple_tune.py for another example.
## Fault tolerance
XGBoost-Ray leverages the stateful Ray actor model to
enable fault tolerant training. There are currently
two modes implemented.
### Non-elastic training (warm restart)
When an actor or node dies, XGBoost-Ray will retain the
state of the remaining actors. In non-elastic training,
the failed actors will be replaced as soon as resources
are available again. Only these actors will reload their
parts of the data. Training will resume once all actors
are ready for training again.
You can set this mode in the `RayParams`:
```python
from xgboost_ray import RayParams
ray_params = RayParams(
elastic_training=False, # Use non-elastic training
max_actor_restarts=2, # How often are actors allowed to fail
)
```
### Elastic training
In elastic training, XGBoost-Ray will continue training
with fewer actors (and on fewer data) when a node or actor
dies. The missing actors are staged in the background,
and are reintegrated into training once they are back and
loaded their data.
This mode will train on fewer data for a period of time,
which can impact accuracy. In practice, we found these
effects to be minor, especially for large shuffled datasets.
The immediate benefit is that training time is reduced
significantly to almost the same level as if no actors died.
Thus, especially when data loading takes a large part of
the total training time, this setting can dramatically speed
up training times for large distributed jobs.
You can configure this mode in the `RayParams`:
```python
from xgboost_ray import RayParams
ray_params = RayParams(
elastic_training=True, # Use elastic training
max_failed_actors=3, # Only allow at most 3 actors to die at the same time
max_actor_restarts=2, # How often are actors allowed to fail
)
```
## Resources
By default, XGBoost-Ray tries to determine the number of CPUs
available and distributes them evenly across actors.
In the case of very large clusters or clusters with many different
machine sizes, it makes sense to limit the number of CPUs per actor
by setting the `cpus_per_actor` argument. Consider always
setting this explicitly.
The number of XGBoost actors always has to be set manually with
the `num_actors` argument.
### Multi GPU training
XGBoost-Ray enables multi GPU training. The XGBoost core backend
will automatically leverage NCCL2 for cross-device communication.
All you have to do is to start one actor per GPU and set XGBoost's
`tree_method` to a GPU-compatible option, eg. `gpu_hist` (see XGBoost
documentation for more details.)
For instance, if you have 2 machines with 4 GPUs each, you will want
to start 8 remote actors, and set `gpus_per_actor=1`. There is usually
no benefit in allocating less (e.g. 0.5) or more than one GPU per actor.
You should divide the CPUs evenly across actors per machine, so if your
machines have 16 CPUs in addition to the 4 GPUs, each actor should have
4 CPUs to use.
```python
from xgboost_ray import RayParams
ray_params = RayParams(
num_actors=8,
gpus_per_actor=1,
cpus_per_actor=4, # Divide evenly across actors per machine
)
```
### How many remote actors should I use?
This depends on your workload and your cluster setup.
Generally there is no inherent benefit of running more than
one remote actor per node for CPU-only training. This is because
XGBoost core can already leverage multiple CPUs via threading.
However, there are some cases when you should consider starting
more than one actor per node:
- For [multi GPU training](#multi-gpu-training), each GPU should have a separate
remote actor. Thus, if your machine has 24 CPUs and 4 GPUs,
you will want to start 4 remote actors with 6 CPUs and 1 GPU
each
- In a **heterogeneous cluster**, you might want to find the
[greatest common divisor](https://en.wikipedia.org/wiki/Greatest_common_divisor)
for the number of CPUs.
E.g. for a cluster with three nodes of 4, 8, and 12 CPUs, respectively,
you should set the number of actors to 6 and the CPUs per
actor to 4.
## Distributed data loading
XGBoost-Ray can leverage both centralized and distributed data loading.
In **centralized data loading**, the data is partitioned by the head node
and stored in the object store. Each remote actor then retrieves their
partitions by querying the Ray object store. Centralized loading is used
when you pass centralized in-memory dataframes, such as Pandas dataframes
or Numpy arrays, or when you pass a single source file, such as a single CSV
or Parquet file.
```python
from xgboost_ray import RayDMatrix
# This will use centralized data loading, as only one source file is specified
# `label_col` is a column in the CSV, used as the target label
ray_params = RayDMatrix("./source_file.csv", label="label_col")
```
In **distributed data loading**, each remote actor loads their data directly from
the source (e.g. local hard disk, NFS, HDFS, S3),
without a central bottleneck. The data is still stored in the
object store, but locally to each actor. This mode is used automatically
when loading data from multiple CSV or Parquet files. Please note that
we do not check or enforce partition sizes in this case - it is your job
to make sure the data is evenly distributed across the source files.
```python
from xgboost_ray import RayDMatrix
# This will use distributed data loading, as four source files are specified
# Please note that you cannot schedule more than four actors in this case.
# `label_col` is a column in the Parquet files, used as the target label
ray_params = RayDMatrix([
"hdfs:///tmp/part1.parquet",
"hdfs:///tmp/part2.parquet",
"hdfs:///tmp/part3.parquet",
"hdfs:///tmp/part4.parquet",
], label="label_col")
```
Lastly, XGBoost-Ray supports **distributed dataframe** representations, such
as {ref}`Ray Datasets <datasets>`,
[Modin](https://modin.readthedocs.io/en/latest/) and
[Dask dataframes](https://docs.dask.org/en/latest/dataframe.html)
(used with {ref}`Dask on Ray <dask-on-ray>`).
Here, XGBoost-Ray will check on which nodes the distributed partitions
are currently located, and will assign partitions to actors in order to
minimize cross-node data transfer. Please note that we also assume here
that partition sizes are uniform.
```python
from xgboost_ray import RayDMatrix
# This will try to allocate the existing Modin partitions
# to co-located Ray actors. If this is not possible, data will
# be transferred across nodes
ray_params = RayDMatrix(existing_modin_df)
```
### Data sources
The following data sources can be used with a ``RayDMatrix`` object.
```{eval-rst}
.. list-table::
:header-rows: 1
* - Type
- Centralized loading
- Distributed loading
* - Numpy array
- Yes
- No
* - Pandas dataframe
- Yes
- No
* - Single CSV
- Yes
- No
* - Multi CSV
- Yes
- Yes
* - Single Parquet
- Yes
- No
* - Multi Parquet
- Yes
- Yes
* - :ref:`Ray Dataset <datasets>`
- Yes
- Yes
* - `Petastorm <https://github.com/uber/petastorm>`_
- Yes
- Yes
* - `Dask dataframe <https://docs.dask.org/en/latest/dataframe.html>`_
- Yes
- Yes
* - `Modin dataframe <https://modin.readthedocs.io/en/latest/>`_
- Yes
- Yes
```
## Memory usage
XGBoost uses a compute-optimized datastructure, the `DMatrix`,
to hold training data. When converting a dataset to a `DMatrix`,
XGBoost creates intermediate copies and ends up
holding a complete copy of the full data. The data will be converted
into the local dataformat (on a 64 bit system these are 64 bit floats.)
Depending on the system and original dataset dtype, this matrix can
thus occupy more memory than the original dataset.
The **peak memory usage** for CPU-based training is at least
**3x** the dataset size (assuming dtype `float32` on a 64bit system)
plus about **400,000 KiB** for other resources,
like operating system requirements and storing of intermediate
results.
**Example**
- Machine type: AWS m5.xlarge (4 vCPUs, 16 GiB RAM)
- Usable RAM: ~15,350,000 KiB
- Dataset: 1,250,000 rows with 1024 features, dtype float32.
Total size: 5,000,000 KiB
- XGBoost DMatrix size: ~10,000,000 KiB
This dataset will fit exactly on this node for training.
Note that the DMatrix size might be lower on a 32 bit system.
**GPUs**
Generally, the same memory requirements exist for GPU-based
training. Additionally, the GPU must have enough memory
to hold the dataset.
In the example above, the GPU must have at least
10,000,000 KiB (about 9.6 GiB) memory. However,
empirically we found that using a `DeviceQuantileDMatrix`
seems to show more peak GPU memory usage, possibly
for intermediate storage when loading data (about 10%).
**Best practices**
In order to reduce peak memory usage, consider the following
suggestions:
- Store data as `float32` or less. More precision is often
not needed, and keeping data in a smaller format will
help reduce peak memory usage for initial data loading.
- Pass the `dtype` when loading data from CSV. Otherwise,
floating point values will be loaded as `np.float64`
per default, increasing peak memory usage by 33%.
## Placement Strategies
XGBoost-Ray leverages Ray's Placement Group API (<https://docs.ray.io/en/master/placement-group.html>)
to implement placement strategies for better fault tolerance.
By default, a SPREAD strategy is used for training, which attempts to spread all of the training workers
across the nodes in a cluster on a best-effort basis. This improves fault tolerance since it minimizes the
number of worker failures when a node goes down, but comes at a cost of increased inter-node communication
To disable this strategy, set the `RXGB_USE_SPREAD_STRATEGY` environment variable to 0. If disabled, no
particular placement strategy will be used.
Note that this strategy is used only when `elastic_training` is not used. If `elastic_training` is set to `True`,
no placement strategy is used.
When XGBoost-Ray is used with Ray Tune for hyperparameter tuning, a PACK strategy is used. This strategy
attempts to place all workers for each trial on the same node on a best-effort basis. This means that if a node
goes down, it will be less likely to impact multiple trials.
When placement strategies are used, XGBoost-Ray will wait for 100 seconds for the required resources
to become available, and will fail if the required resources cannot be reserved and the cluster cannot autoscale
to increase the number of resources. You can change the `RXGB_PLACEMENT_GROUP_TIMEOUT_S` environment variable to modify
how long this timeout should be.
## More examples
For complete end to end examples, please have a look at
the [examples folder](https://github.com/ray-project/xgboost_ray/tree/master/examples/):
- [Simple sklearn breastcancer dataset example](https://github.com/ray-project/xgboost_ray/blob/master/xgboost_ray/examples/simple.py) (requires `sklearn`)
- [HIGGS classification example](https://github.com/ray-project/xgboost_ray/blob/master/xgboost_ray/examples/higgs.py)
([download dataset (2.6 GB)](https://archive.ics.uci.edu/ml/machine-learning-databases/00280/HIGGS.csv.gz))
- [HIGGS classification example with Parquet](https://github.com/ray-project/xgboost_ray/blob/master/xgboost_ray/examples/higgs_parquet.py) (uses the same dataset)
- [Test data classification](https://github.com/ray-project/xgboost_ray/blob/master/xgboost_ray/examples/train_on_test_data.py) (uses a self-generated dataset)
## API reference
```{eval-rst}
.. autoclass:: xgboost_ray.RayParams
:members:
```
```{eval-rst}
.. autoclass:: xgboost_ray.RayDMatrix
:members:
```
```{eval-rst}
.. autofunction:: xgboost_ray.train
```
```{eval-rst}
.. autofunction:: xgboost_ray.predict
```
### scikit-learn API
```{eval-rst}
.. autoclass:: xgboost_ray.RayXGBClassifier
:members:
```
```{eval-rst}
.. autoclass:: xgboost_ray.RayXGBRegressor
:members:
```
```{eval-rst}
.. autoclass:: xgboost_ray.RayXGBRFClassifier
:members:
```
```{eval-rst}
.. autoclass:: xgboost_ray.RayXGBRFRegressor
:members:
```
FOR MORE INFORMATION, SEE doc/source/custom_directives.py::download_and_preprocess_ecosystem_docs
-->