From 93a9d32288a0087227d6563b7a9b6dc1e0343a0a Mon Sep 17 00:00:00 2001
From: Eric Liang
Date: Tue, 4 Dec 2018 17:36:06 -0800
Subject: [PATCH] [docs] Switch docs to use rllib train instead of train.py

---
 README.rst                                  |  8 +++---
 doc/source/example-a3c.rst                  |  2 +-
 doc/source/example-evolution-strategies.rst |  4 +--
 doc/source/example-policy-gradient.rst      |  2 +-
 doc/source/rllib-training.rst               | 27 +++++++++++----------
 python/ray/rllib/scripts.py                 |  2 +-
 6 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/README.rst b/README.rst
index 7e50123d9..5fd892f95 100644
--- a/README.rst
+++ b/README.rst
@@ -41,12 +41,12 @@ Example Use

 Ray comes with libraries that accelerate deep learning and reinforcement learning development:

-- `Ray Tune`_: Hyperparameter Optimization Framework
-- `Ray RLlib`_: Scalable Reinforcement Learning
+- `Tune`_: Hyperparameter Optimization Framework
+- `RLlib`_: Scalable Reinforcement Learning
 - `Distributed Training `__

-.. _`Ray Tune`: http://ray.readthedocs.io/en/latest/tune.html
-.. _`Ray RLlib`: http://ray.readthedocs.io/en/latest/rllib.html
+.. _`Tune`: http://ray.readthedocs.io/en/latest/tune.html
+.. _`RLlib`: http://ray.readthedocs.io/en/latest/rllib.html

 Installation
 ------------
diff --git a/doc/source/example-a3c.rst b/doc/source/example-a3c.rst
index 23a6a3e15..47378fce9 100644
--- a/doc/source/example-a3c.rst
+++ b/doc/source/example-a3c.rst
@@ -29,7 +29,7 @@ You can run the code with

 .. code-block:: bash

-    python/ray/rllib/train.py --env=Pong-ram-v4 --run=A3C --config='{"num_workers": N}'
+    rllib train --env=Pong-ram-v4 --run=A3C --config='{"num_workers": N}'

 Reinforcement Learning
 ----------------------
diff --git a/doc/source/example-evolution-strategies.rst b/doc/source/example-evolution-strategies.rst
index 8f613b08d..d048d261f 100644
--- a/doc/source/example-evolution-strategies.rst
+++ b/doc/source/example-evolution-strategies.rst
@@ -18,13 +18,13 @@ on the ``Humanoid-v1`` gym environment.

 .. code-block:: bash

-    python/ray/rllib/train.py --env=Humanoid-v1 --run=ES
+    rllib train --env=Humanoid-v1 --run=ES

 To train a policy on a cluster (e.g., using 900 workers), run the following.

 .. code-block:: bash

-    python ray/python/ray/rllib/train.py \
+    rllib train \
         --env=Humanoid-v1 \
         --run=ES \
         --redis-address= \
diff --git a/doc/source/example-policy-gradient.rst b/doc/source/example-policy-gradient.rst
index 3fccb992a..9b5857504 100644
--- a/doc/source/example-policy-gradient.rst
+++ b/doc/source/example-policy-gradient.rst
@@ -21,7 +21,7 @@ Then you can run the example as follows.

 .. code-block:: bash

-    python/ray/rllib/train.py --env=Pong-ram-v4 --run=PPO
+    rllib train --env=Pong-ram-v4 --run=PPO

 This will train an agent on the ``Pong-ram-v4`` Atari environment. You can also
 try passing in the ``Pong-v0`` environment or the ``CartPole-v0`` environment.
diff --git a/doc/source/rllib-training.rst b/doc/source/rllib-training.rst
index e647b0a27..4b6630090 100644
--- a/doc/source/rllib-training.rst
+++ b/doc/source/rllib-training.rst
@@ -10,11 +10,11 @@ be trained, checkpointed, or an action computed.

 .. image:: rllib-api.svg

-You can train a simple DQN agent with the following command
+You can train a simple DQN agent with the following command:

 .. code-block:: bash

-    python ray/python/ray/rllib/train.py --run DQN --env CartPole-v0
+    rllib train --run DQN --env CartPole-v0

 By default, the results will be logged to a subdirectory of ``~/ray_results``.
 This subdirectory will contain a file ``params.json`` which contains the
@@ -26,10 +26,12 @@ training process with TensorBoard by running

     tensorboard --logdir=~/ray_results

-The ``train.py`` script has a number of options you can show by running
+The ``rllib train`` command (same as the ``train.py`` script in the repo) has a number of options you can show by running:

 .. code-block:: bash

+    rllib train --help
+    -or-
     python ray/python/ray/rllib/train.py --help

 The most important options are for choosing the environment
@@ -42,16 +44,16 @@ Evaluating Trained Agents

 In order to save checkpoints from which to evaluate agents, set
 ``--checkpoint-freq`` (number of training iterations between checkpoints)
-when running ``train.py``.
+when running ``rllib train``.

 An example of evaluating a previously trained DQN agent is as follows:

 .. code-block:: bash

-    python ray/python/ray/rllib/rollout.py \
-        ~/ray_results/default/DQN_CartPole-v0_0upjmdgr0/checkpoint_1/checkpoint-1 \
-        --run DQN --env CartPole-v0 --steps 10000
+    rllib rollout \
+        ~/ray_results/default/DQN_CartPole-v0_0upjmdgr0/checkpoint_1/checkpoint-1 \
+        --run DQN --env CartPole-v0 --steps 10000

 The ``rollout.py`` helper script reconstructs a DQN agent from the checkpoint
 located at ``~/ray_results/default/DQN_CartPole-v0_0upjmdgr0/checkpoint_1/checkpoint-1``
@@ -70,8 +72,7 @@ In an example below, we train A2C by specifying 8 workers through the config fla

 .. code-block:: bash

-    python ray/python/ray/rllib/train.py --env=PongDeterministic-v4 \
-        --run=A2C --config '{"num_workers": 8}'
+    rllib train --env=PongDeterministic-v4 --run=A2C --config '{"num_workers": 8}'

 Specifying Resources
 ~~~~~~~~~~~~~~~~~~~~
@@ -98,11 +99,11 @@ Some good hyperparameters and settings are available in
 (some of them are tuned to run on GPUs). If you find better settings or tune
 an algorithm on a different domain, consider submitting a Pull Request!

-You can run these with the ``train.py`` script as follows:
+You can run these with the ``rllib train`` command as follows:

 .. code-block:: bash

-    python ray/python/ray/rllib/train.py -f /path/to/tuned/example.yaml
+    rllib train -f /path/to/tuned/example.yaml

 Python API
 ----------
@@ -356,7 +357,7 @@ The ``"monitor": true`` config can be used to save Gym episode videos to the res

 .. code-block:: bash

-    python ray/python/ray/rllib/train.py --env=PongDeterministic-v4 \
+    rllib train --env=PongDeterministic-v4 \
         --run=A2C --config '{"num_workers": 2, "monitor": true}'

     # videos will be saved in the ~/ray_results/ dir, for example
@@ -372,7 +373,7 @@ You can control the agent log level via the ``"log_level"`` flag. Valid values a

 .. code-block:: bash

-    python ray/python/ray/rllib/train.py --env=PongDeterministic-v4 \
+    rllib train --env=PongDeterministic-v4 \
         --run=A2C --config '{"num_workers": 2, "log_level": "DEBUG"}'

 Stack Traces
diff --git a/python/ray/rllib/scripts.py b/python/ray/rllib/scripts.py
index cc48b83cf..88d5d5629 100644
--- a/python/ray/rllib/scripts.py
+++ b/python/ray/rllib/scripts.py
@@ -14,7 +14,7 @@ Example usage for training:
     rllib train --run DQN --env CartPole-v0

 Example usage for rollout:
-    rllib rollout /tmp/ray/checkpoint_dir/checkpoint-0 --run DQN
+    rllib rollout /trial_dir/checkpoint_1/checkpoint-1 --run DQN
 """
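
For reference, the two commands below sketch the end-to-end workflow this patch documents: train with periodic checkpointing, then evaluate the saved policy with ``rllib rollout``. This is a minimal example using only flags that already appear in the docs above (``--checkpoint-freq``, ``--run``, ``--env``, ``--steps``); the checkpoint path is illustrative and will differ for each run.

.. code-block:: bash

    # Train DQN on CartPole, writing a checkpoint every 10 training iterations.
    rllib train --run DQN --env CartPole-v0 --checkpoint-freq 10

    # Roll out the saved policy for 10000 steps (substitute your own checkpoint path).
    rllib rollout ~/ray_results/default/DQN_CartPole-v0_0upjmdgr0/checkpoint_1/checkpoint-1 \
        --run DQN --env CartPole-v0 --steps 10000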