[docs] Switch docs to use rllib train instead of train.py
commit 93a9d32288
parent 9d0bd50e78
6 changed files with 23 additions and 22 deletions
@@ -41,12 +41,12 @@ Example Use

 Ray comes with libraries that accelerate deep learning and reinforcement learning development:

-- `Ray Tune`_: Hyperparameter Optimization Framework
-- `Ray RLlib`_: Scalable Reinforcement Learning
+- `Tune`_: Hyperparameter Optimization Framework
+- `RLlib`_: Scalable Reinforcement Learning
 - `Distributed Training <http://ray.readthedocs.io/en/latest/distributed_sgd.html>`__

-.. _`Ray Tune`: http://ray.readthedocs.io/en/latest/tune.html
-.. _`Ray RLlib`: http://ray.readthedocs.io/en/latest/rllib.html
+.. _`Tune`: http://ray.readthedocs.io/en/latest/tune.html
+.. _`RLlib`: http://ray.readthedocs.io/en/latest/rllib.html

 Installation
 ------------
@@ -29,7 +29,7 @@ You can run the code with

 .. code-block:: bash

-    python/ray/rllib/train.py --env=Pong-ram-v4 --run=A3C --config='{"num_workers": N}'
+    rllib train --env=Pong-ram-v4 --run=A3C --config='{"num_workers": N}'

 Reinforcement Learning
 ----------------------
@@ -18,13 +18,13 @@ on the ``Humanoid-v1`` gym environment.

 .. code-block:: bash

-    python/ray/rllib/train.py --env=Humanoid-v1 --run=ES
+    rllib train --env=Humanoid-v1 --run=ES

 To train a policy on a cluster (e.g., using 900 workers), run the following.

 .. code-block:: bash

-    python ray/python/ray/rllib/train.py \
+    rllib train \
         --env=Humanoid-v1 \
         --run=ES \
         --redis-address=<redis-address> \
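For orientation, the ``--redis-address`` flag is what attaches the driver to an existing cluster; from Python the same thing is a single ``ray.init`` call. A minimal sketch, assuming the ``redis_address`` keyword of this Ray release and keeping the ``<redis-address>`` placeholder from the example above:

.. code-block:: python

    import ray

    # Attach to an already-running Ray cluster instead of starting a local
    # one; this mirrors passing --redis-address to ``rllib train``.
    ray.init(redis_address="<redis-address>")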
@@ -21,7 +21,7 @@ Then you can run the example as follows.

 .. code-block:: bash

-    python/ray/rllib/train.py --env=Pong-ram-v4 --run=PPO
+    rllib train --env=Pong-ram-v4 --run=PPO

 This will train an agent on the ``Pong-ram-v4`` Atari environment. You can also
 try passing in the ``Pong-v0`` environment or the ``CartPole-v0`` environment.
@@ -10,11 +10,11 @@ be trained, checkpointed, or an action computed.

 .. image:: rllib-api.svg

-You can train a simple DQN agent with the following command
+You can train a simple DQN agent with the following command:

 .. code-block:: bash

-    python ray/python/ray/rllib/train.py --run DQN --env CartPole-v0
+    rllib train --run DQN --env CartPole-v0

 By default, the results will be logged to a subdirectory of ``~/ray_results``.
 This subdirectory will contain a file ``params.json`` which contains the
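The same DQN run can also be driven programmatically, which is what the ``Python API`` section referenced further down describes. A rough sketch only, assuming the agent class path and the ``train``/``save`` methods of this RLlib release:

.. code-block:: python

    import ray
    from ray.rllib.agents.dqn import DQNAgent  # assumed import path for this release

    ray.init()
    agent = DQNAgent(env="CartPole-v0")

    # Each train() call runs one training iteration and returns a result dict;
    # results are also written under ~/ray_results, as with ``rllib train``.
    for i in range(10):
        result = agent.train()
        print(i, result["episode_reward_mean"])

    # Write a checkpoint that can later be passed to ``rllib rollout``.
    print("checkpoint saved at", agent.save())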
@@ -26,10 +26,12 @@ training process with TensorBoard by running

     tensorboard --logdir=~/ray_results

-The ``train.py`` script has a number of options you can show by running
+The ``rllib train`` command (same as the ``train.py`` script in the repo) has a number of options you can show by running:

 .. code-block:: bash

+    rllib train --help
+    -or-
     python ray/python/ray/rllib/train.py --help

 The most important options are for choosing the environment
@@ -42,16 +44,16 @@ Evaluating Trained Agents

 In order to save checkpoints from which to evaluate agents,
 set ``--checkpoint-freq`` (number of training iterations between checkpoints)
-when running ``train.py``.
+when running ``rllib train``.


 An example of evaluating a previously trained DQN agent is as follows:

 .. code-block:: bash

-    python ray/python/ray/rllib/rollout.py \
-        ~/ray_results/default/DQN_CartPole-v0_0upjmdgr0/checkpoint_1/checkpoint-1 \
-        --run DQN --env CartPole-v0 --steps 10000
+    rllib rollout \
+        ~/ray_results/default/DQN_CartPole-v0_0upjmdgr0/checkpoint_1/checkpoint-1 \
+        --run DQN --env CartPole-v0 --steps 10000

 The ``rollout.py`` helper script reconstructs a DQN agent from the checkpoint
 located at ``~/ray_results/default/DQN_CartPole-v0_0upjmdgr0/checkpoint_1/checkpoint-1``
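What the rollout helper does can be approximated directly from Python. A hedged sketch, assuming the same agent class path as above and the Gym step/reset API of this era (the checkpoint path is the one used in the example):

.. code-block:: python

    import os

    import gym
    import ray
    from ray.rllib.agents.dqn import DQNAgent  # assumed import path for this release

    ray.init()
    agent = DQNAgent(env="CartPole-v0")
    # Restore the weights written by a previous training run.
    agent.restore(os.path.expanduser(
        "~/ray_results/default/DQN_CartPole-v0_0upjmdgr0/checkpoint_1/checkpoint-1"))

    # Roll out one episode with the restored policy.
    env = gym.make("CartPole-v0")
    obs, done, total_reward = env.reset(), False, 0.0
    while not done:
        action = agent.compute_action(obs)
        obs, reward, done, _ = env.step(action)
        total_reward += reward
    print("episode reward:", total_reward)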
@@ -70,8 +72,7 @@ In an example below, we train A2C by specifying 8 workers through the config flag.

 .. code-block:: bash

-    python ray/python/ray/rllib/train.py --env=PongDeterministic-v4 \
-        --run=A2C --config '{"num_workers": 8}'
+    rllib train --env=PongDeterministic-v4 --run=A2C --config '{"num_workers": 8}'

 Specifying Resources
 ~~~~~~~~~~~~~~~~~~~~
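The JSON passed with ``--config`` is simply an override of the algorithm's default configuration dict, so the same run can be expressed in Python. A small sketch; the ``A2CAgent`` import path is an assumption for this RLlib version:

.. code-block:: python

    import ray
    from ray.rllib.agents.a3c import A2CAgent  # assumed import path for this release

    ray.init()
    # The same override as --config '{"num_workers": 8}' on the command line.
    agent = A2CAgent(env="PongDeterministic-v4", config={"num_workers": 8})
    agent.train()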
@@ -98,11 +99,11 @@ Some good hyperparameters and settings are available in
 (some of them are tuned to run on GPUs). If you find better settings or tune
 an algorithm on a different domain, consider submitting a Pull Request!

-You can run these with the ``train.py`` script as follows:
+You can run these with the ``rllib train`` command as follows:

 .. code-block:: bash

-    python ray/python/ray/rllib/train.py -f /path/to/tuned/example.yaml
+    rllib train -f /path/to/tuned/example.yaml

 Python API
 ----------
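The ``-f`` option essentially hands the parsed YAML experiment spec to Tune. A rough sketch of that flow, assuming ``ray.tune.run_experiments`` and a tuned-example file laid out as a mapping from experiment name to ``run``/``env``/``config`` entries (the file path is the placeholder from the example above):

.. code-block:: python

    import yaml

    import ray
    from ray.tune import run_experiments

    ray.init()

    # A tuned example file maps experiment names to run/env/config entries,
    # mirroring the ``rllib train`` command-line flags.
    with open("/path/to/tuned/example.yaml") as f:
        experiments = yaml.safe_load(f)

    run_experiments(experiments)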
@@ -356,7 +357,7 @@ The ``"monitor": true`` config can be used to save Gym episode videos to the res

 .. code-block:: bash

-    python ray/python/ray/rllib/train.py --env=PongDeterministic-v4 \
+    rllib train --env=PongDeterministic-v4 \
         --run=A2C --config '{"num_workers": 2, "monitor": true}'

     # videos will be saved in the ~/ray_results/<experiment> dir, for example
@@ -372,7 +373,7 @@ You can control the agent log level via the ``"log_level"`` flag. Valid values a

 .. code-block:: bash

-    python ray/python/ray/rllib/train.py --env=PongDeterministic-v4 \
+    rllib train --env=PongDeterministic-v4 \
         --run=A2C --config '{"num_workers": 2, "log_level": "DEBUG"}'

 Stack Traces
@@ -14,7 +14,7 @@ Example usage for training:
     rllib train --run DQN --env CartPole-v0

 Example usage for rollout:
-    rllib rollout /tmp/ray/checkpoint_dir/checkpoint-0 --run DQN
+    rllib rollout /trial_dir/checkpoint_1/checkpoint-1 --run DQN
 """
