From 93a9d32288a0087227d6563b7a9b6dc1e0343a0a Mon Sep 17 00:00:00 2001
From: Eric Liang
Date: Tue, 4 Dec 2018 17:36:06 -0800
Subject: [PATCH] [docs] Switch docs to use rllib train instead of train.py

---
 README.rst                                  |  8 +++---
 doc/source/example-a3c.rst                  |  2 +-
 doc/source/example-evolution-strategies.rst |  4 +--
 doc/source/example-policy-gradient.rst      |  2 +-
 doc/source/rllib-training.rst               | 27 +++++++++++----------
 python/ray/rllib/scripts.py                 |  2 +-
 6 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/README.rst b/README.rst
index 7e50123d9..5fd892f95 100644
--- a/README.rst
+++ b/README.rst
@@ -41,12 +41,12 @@ Example Use

 Ray comes with libraries that accelerate deep learning and reinforcement learning development:

-- `Ray Tune`_: Hyperparameter Optimization Framework
-- `Ray RLlib`_: Scalable Reinforcement Learning
+- `Tune`_: Hyperparameter Optimization Framework
+- `RLlib`_: Scalable Reinforcement Learning
 - `Distributed Training `__

-.. _`Ray Tune`: http://ray.readthedocs.io/en/latest/tune.html
-.. _`Ray RLlib`: http://ray.readthedocs.io/en/latest/rllib.html
+.. _`Tune`: http://ray.readthedocs.io/en/latest/tune.html
+.. _`RLlib`: http://ray.readthedocs.io/en/latest/rllib.html

 Installation
 ------------
diff --git a/doc/source/example-a3c.rst b/doc/source/example-a3c.rst
index 23a6a3e15..47378fce9 100644
--- a/doc/source/example-a3c.rst
+++ b/doc/source/example-a3c.rst
@@ -29,7 +29,7 @@ You can run the code with

 .. code-block:: bash

-    python/ray/rllib/train.py --env=Pong-ram-v4 --run=A3C --config='{"num_workers": N}'
+    rllib train --env=Pong-ram-v4 --run=A3C --config='{"num_workers": N}'

 Reinforcement Learning
 ----------------------
diff --git a/doc/source/example-evolution-strategies.rst b/doc/source/example-evolution-strategies.rst
index 8f613b08d..d048d261f 100644
--- a/doc/source/example-evolution-strategies.rst
+++ b/doc/source/example-evolution-strategies.rst
@@ -18,13 +18,13 @@ on the ``Humanoid-v1`` gym environment.

 .. code-block:: bash

-    python/ray/rllib/train.py --env=Humanoid-v1 --run=ES
+    rllib train --env=Humanoid-v1 --run=ES

 To train a policy on a cluster (e.g., using 900 workers), run the following.

 .. code-block:: bash

-    python ray/python/ray/rllib/train.py \
+    rllib train \
         --env=Humanoid-v1 \
         --run=ES \
         --redis-address= \
diff --git a/doc/source/example-policy-gradient.rst b/doc/source/example-policy-gradient.rst
index 3fccb992a..9b5857504 100644
--- a/doc/source/example-policy-gradient.rst
+++ b/doc/source/example-policy-gradient.rst
@@ -21,7 +21,7 @@ Then you can run the example as follows.

 .. code-block:: bash

-    python/ray/rllib/train.py --env=Pong-ram-v4 --run=PPO
+    rllib train --env=Pong-ram-v4 --run=PPO

 This will train an agent on the ``Pong-ram-v4`` Atari environment. You can also
 try passing in the ``Pong-v0`` environment or the ``CartPole-v0`` environment.
diff --git a/doc/source/rllib-training.rst b/doc/source/rllib-training.rst
index e647b0a27..4b6630090 100644
--- a/doc/source/rllib-training.rst
+++ b/doc/source/rllib-training.rst
@@ -10,11 +10,11 @@ be trained, checkpointed, or an action computed.

 .. image:: rllib-api.svg

-You can train a simple DQN agent with the following command
+You can train a simple DQN agent with the following command:

 .. code-block:: bash

-    python ray/python/ray/rllib/train.py --run DQN --env CartPole-v0
+    rllib train --run DQN --env CartPole-v0

 By default, the results will be logged to a subdirectory of ``~/ray_results``.
 This subdirectory will contain a file ``params.json`` which contains the
@@ -26,10 +26,12 @@ training process with TensorBoard by running

     tensorboard --logdir=~/ray_results

-The ``train.py`` script has a number of options you can show by running
+The ``rllib train`` command (same as the ``train.py`` script in the repo) has a number of options you can show by running:

 .. code-block:: bash

+    rllib train --help
+    -or-
     python ray/python/ray/rllib/train.py --help

 The most important options are for choosing the environment
@@ -42,16 +44,16 @@ Evaluating Trained Agents

 In order to save checkpoints from which to evaluate agents, set
 ``--checkpoint-freq`` (number of training iterations between checkpoints)
-when running ``train.py``.
+when running ``rllib train``.

 An example of evaluating a previously trained DQN agent is as follows:

 .. code-block:: bash

-    python ray/python/ray/rllib/rollout.py \
-        ~/ray_results/default/DQN_CartPole-v0_0upjmdgr0/checkpoint_1/checkpoint-1 \
-        --run DQN --env CartPole-v0 --steps 10000
+    rllib rollout \
+        ~/ray_results/default/DQN_CartPole-v0_0upjmdgr0/checkpoint_1/checkpoint-1 \
+        --run DQN --env CartPole-v0 --steps 10000

 The ``rollout.py`` helper script reconstructs a DQN agent from the checkpoint
 located at ``~/ray_results/default/DQN_CartPole-v0_0upjmdgr0/checkpoint_1/checkpoint-1``
@@ -70,8 +72,7 @@ In an example below, we train A2C by specifying 8 workers through the config fla

 .. code-block:: bash

-    python ray/python/ray/rllib/train.py --env=PongDeterministic-v4 \
-        --run=A2C --config '{"num_workers": 8}'
+    rllib train --env=PongDeterministic-v4 --run=A2C --config '{"num_workers": 8}'

 Specifying Resources
 ~~~~~~~~~~~~~~~~~~~~
@@ -98,11 +99,11 @@ Some good hyperparameters and settings are available in
 (some of them are tuned to run on GPUs). If you find better settings or tune
 an algorithm on a different domain, consider submitting a Pull Request!

-You can run these with the ``train.py`` script as follows:
+You can run these with the ``rllib train`` command as follows:

 .. code-block:: bash

-    python ray/python/ray/rllib/train.py -f /path/to/tuned/example.yaml
+    rllib train -f /path/to/tuned/example.yaml

 Python API
 ----------
@@ -356,7 +357,7 @@ The ``"monitor": true`` config can be used to save Gym episode videos to the res

 .. code-block:: bash

-    python ray/python/ray/rllib/train.py --env=PongDeterministic-v4 \
+    rllib train --env=PongDeterministic-v4 \
         --run=A2C --config '{"num_workers": 2, "monitor": true}'

     # videos will be saved in the ~/ray_results/ dir, for example
@@ -372,7 +373,7 @@ You can control the agent log level via the ``"log_level"`` flag. Valid values a

 .. code-block:: bash

-    python ray/python/ray/rllib/train.py --env=PongDeterministic-v4 \
+    rllib train --env=PongDeterministic-v4 \
         --run=A2C --config '{"num_workers": 2, "log_level": "DEBUG"}'

 Stack Traces
diff --git a/python/ray/rllib/scripts.py b/python/ray/rllib/scripts.py
index cc48b83cf..88d5d5629 100644
--- a/python/ray/rllib/scripts.py
+++ b/python/ray/rllib/scripts.py
@@ -14,7 +14,7 @@ Example usage for training:
     rllib train --run DQN --env CartPole-v0

 Example usage for rollout:
-    rllib rollout /tmp/ray/checkpoint_dir/checkpoint-0 --run DQN
+    rllib rollout /trial_dir/checkpoint_1/checkpoint-1 --run DQN
 """
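
For reference, the two commands below sketch the end-to-end workflow this patch documents: train with periodic checkpointing, then evaluate the saved policy with ``rllib rollout``. This is a minimal example using only flags that already appear in the docs above (``--checkpoint-freq``, ``--run``, ``--env``, ``--steps``); the checkpoint path is illustrative and will differ for each run.

.. code-block:: bash

    # Train DQN on CartPole, writing a checkpoint every 10 training iterations.
    rllib train --run DQN --env CartPole-v0 --checkpoint-freq 10

    # Roll out the saved policy for 10000 steps (substitute your own checkpoint path).
    rllib rollout ~/ray_results/default/DQN_CartPole-v0_0upjmdgr0/checkpoint_1/checkpoint-1 \
        --run DQN --env CartPole-v0 --steps 10000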