
RLlib Training APIs
===================

Getting Started
---------------

At a high level, RLlib provides an ``Agent`` class which
holds a policy for environment interaction. Through the agent interface, you can
train the policy, checkpoint it, or compute an action.

.. image:: rllib-api.svg

You can train a simple DQN agent with the following command:

.. code-block:: bash

    python ray/python/ray/rllib/train.py --run DQN --env CartPole-v0

By default, the results will be logged to a subdirectory of ``~/ray_results``.
This subdirectory will contain a file ``params.json`` with the
hyperparameters, a file ``result.json`` with a training summary
for each episode, and a TensorBoard file that can be used to visualize
the training process by running:

.. code-block:: bash

     tensorboard --logdir=~/ray_results
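
The log files can also be inspected directly from Python. The following is a
minimal sketch, not part of RLlib: it assumes that ``result.json`` stores one
JSON object per line and that each object reports an ``episode_reward_mean``
field. Both are assumptions about the file layout, and the helper names
``load_results`` and ``reward_curve`` are hypothetical.

.. code-block:: python

    import json
    import os


    def load_results(path):
        """Parse a newline-delimited JSON file into a list of dicts."""
        results = []
        with open(os.path.expanduser(path)) as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines
                    results.append(json.loads(line))
        return results


    def reward_curve(results, key="episode_reward_mean"):
        """Return the values of `key` for the entries that report it."""
        return [r[key] for r in results if key in r]

Calling ``reward_curve(load_results(path))`` on the ``result.json`` of a trial
would then give a list that can be plotted or compared across runs.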


The ``train.py`` script has a number of options, which you can list by running:

.. code-block:: bash

    python ray/python/ray/rllib/train.py --help

The most important options are ``--env``, which selects the environment
(any OpenAI gym environment can be used, including ones registered by the user),
and ``--run``, which selects the algorithm
(available options are ``PPO``, ``PG``, ``A3C``, ``ES``, ``DDPG``, ``DDPG2``, ``DQN``, ``APEX``, and ``APEX_DDPG``).

Specifying Parameters
~~~~~~~~~~~~~~~~~~~~~

Each algorithm has specific hyperparameters that can be set with ``--config``. See the
`algorithms documentation <rllib-algorithms.html>`__ for more information.

In the example below, we train A3C with 8 workers by setting ``num_workers`` through the ``--config`` flag:


.. code-block:: bash

    python ray/python/ray/rllib/train.py --env=PongDeterministic-v4 \
        --run=A3C --config '{"num_workers": 8}'
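
Because ``--config`` takes a JSON string, shell quoting becomes error-prone as
the configuration grows. A minimal sketch that builds the same command from a
Python dict follows. The helper ``build_train_command`` is hypothetical, and it
uses only the flags shown on this page.

.. code-block:: python

    import json
    import shlex


    def build_train_command(run, env, config):
        """Return a shell-safe train.py command line for the given settings."""
        args = [
            "python", "ray/python/ray/rllib/train.py",
            "--env", env,
            "--run", run,
            # Serialize the hyperparameters to the JSON expected by --config.
            "--config", json.dumps(config),
        ]
        # Quote each argument so the JSON survives shell parsing.
        return " ".join(shlex.quote(a) for a in args)


    print(build_train_command("A3C", "PongDeterministic-v4", {"num_workers": 8}))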

Evaluating Trained Agents
~~~~~~~~~~~~~~~~~~~~~~~~~

In order to save checkpoints from which to evaluate agents,
set ``--checkpoint-freq`` (number of training iterations between checkpoints)
when running ``train.py``.


An example of evaluating a previously trained DQN agent is as follows:

.. code-block:: bash

    python ray/python/ray/rllib/rollout.py \
          ~/ray_results/default/DQN_CartPole-v0_0upjmdgr0/checkpoint-1 \
          --run DQN --env CartPole-v0

The ``rollout.py`` helper script reconstructs a DQN agent from the checkpoint
located at ``~/ray_results/default/DQN_CartPole-v0_0upjmdgr0/checkpoint-1``
and renders its behavior in the environment specified by ``--env``.
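
When ``--checkpoint-freq`` produces many checkpoints, it can help to locate the
most recent one before calling ``rollout.py``. The sketch below assumes that
checkpoints are files named ``checkpoint-<N>`` inside the trial directory, as in
the path above. The helper ``latest_checkpoint`` is hypothetical and not part of
RLlib.

.. code-block:: python

    import os
    import re

    # Match "checkpoint-<N>" exactly, ignoring any sibling files.
    CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


    def latest_checkpoint(trial_dir):
        """Return the path of the highest-numbered checkpoint, or None."""
        trial_dir = os.path.expanduser(trial_dir)
        best_num, best_path = -1, None
        for name in os.listdir(trial_dir):
            match = CHECKPOINT_RE.match(name)
            if match and int(match.group(1)) > best_num:
                best_num = int(match.group(1))
                best_path = os.path.join(trial_dir, name)
        return best_path

The numeric comparison matters: sorting the names as strings would rank
``checkpoint-2`` above ``checkpoint-10``.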

Tuned Examples
--------------

Some good hyperparameters and settings are available in
`the repository <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tuned_examples>`__
(some of them are tuned to run on GPUs). If you find better settings or tune
an algorithm on a different domain, consider submitting a Pull Request!

Python API
----------

The Python API provides the needed flexibility for applying RLlib to new problems. You will need to use this API if you wish to use custom environments, preprocessors, or models with RLlib.

Here is an example of the basic usage:

.. code-block:: python

    import ray
    import ray.rllib.agents.ppo as ppo

    ray.init()
    config = ppo.DEFAULT_CONFIG.copy()
    agent = ppo.PPOAgent(config=config, env="CartPole-v0")

    # Can optionally call agent.restore(path) to load a checkpoint.

    for i in range(1000):
        # Perform one iteration of training the policy with PPO
        result = agent.train()
        print("result: {}".format(result))

        if i % 100 == 0:
            checkpoint = agent.save()
            print("checkpoint saved at", checkpoint)

All RLlib agents implement the Tune ``Trainable`` API, which means they support incremental training and checkpointing. This makes them easy to use in experiments with Ray Tune.
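
As an illustration, the same PPO run could be handed to Ray Tune instead of
being driven by a manual loop. This is a sketch under assumptions: it relies on
``ray.tune.run_experiments`` accepting a dict of experiment specifications, as
in the Ray release this page documents, so check it against your installed
version. The experiment name ``my_experiment``, the stopping threshold, and the
helper names are arbitrary.

.. code-block:: python

    def experiment_spec():
        """Tune experiment dict: train PPO on CartPole-v0 until reward 200."""
        return {
            "my_experiment": {  # arbitrary experiment name
                "run": "PPO",
                "env": "CartPole-v0",
                # Assumed stopping criterion on the mean episode reward.
                "stop": {"episode_reward_mean": 200},
                "config": {"num_workers": 2},
            },
        }


    def main():
        # Imported here so the spec can be built without Ray installed.
        import ray
        from ray import tune

        ray.init()
        tune.run_experiments(experiment_spec())

Calling ``main()`` starts the experiment. The ``run``, ``env``, and ``config``
entries mirror the ``--run``, ``--env``, and ``--config`` flags of ``train.py``.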

Accessing Global State
~~~~~~~~~~~~~~~~~~~~~~

It is common to need access to an agent's internal state, e.g., to set or get internal weights. In RLlib, an agent's state is replicated across multiple *policy evaluators* (Ray actors) in the cluster. However, you can get and update this state between calls to ``train()`` via ``agent.optimizer.foreach_evaluator()`` or ``agent.optimizer.foreach_evaluator_with_index()``. Both take a lambda function that is called with the evaluator as its argument. Any values the lambda returns are collected and returned as a list.

You can also access just the "master" copy of the agent state through ``agent.optimizer.local_evaluator``, but note that updates here may not be reflected in remote replicas if you have configured ``num_workers > 0``.
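
The calling pattern can be illustrated without a cluster. In the sketch below,
``FakeEvaluator`` and ``FakeOptimizer`` are hypothetical stand-ins that only
mimic the interface described above: a local evaluator, a list of replicas, and
the two ``foreach`` helpers. The ``get_weights`` / ``set_weights`` methods and
the ``remote_evaluators`` attribute are assumed for illustration. With a real
agent, the same lambdas would be passed to ``agent.optimizer``.

.. code-block:: python

    class FakeEvaluator(object):
        """Stand-in for a policy evaluator holding a copy of the weights."""

        def __init__(self, weights):
            self.weights = dict(weights)

        def get_weights(self):
            return dict(self.weights)

        def set_weights(self, weights):
            self.weights = dict(weights)


    class FakeOptimizer(object):
        """Stand-in for agent.optimizer: one local evaluator plus replicas."""

        def __init__(self, num_workers):
            self.local_evaluator = FakeEvaluator({"w": 0.0})
            self.remote_evaluators = [
                FakeEvaluator({"w": 0.0}) for _ in range(num_workers)]

        def _all(self):
            return [self.local_evaluator] + self.remote_evaluators

        def foreach_evaluator(self, func):
            # Apply func to every evaluator and collect the return values.
            return [func(ev) for ev in self._all()]

        def foreach_evaluator_with_index(self, func):
            # Same, but func also receives the evaluator's position.
            return [func(ev, i) for i, ev in enumerate(self._all())]


    optimizer = FakeOptimizer(num_workers=2)

    # Updating only the local copy leaves the replicas stale.
    optimizer.local_evaluator.set_weights({"w": 1.0})
    stale = optimizer.foreach_evaluator(lambda ev: ev.get_weights()["w"])

    # Broadcasting through foreach_evaluator updates every copy.
    new_weights = optimizer.local_evaluator.get_weights()
    optimizer.foreach_evaluator(lambda ev: ev.set_weights(new_weights))
    synced = optimizer.foreach_evaluator(lambda ev: ev.get_weights()["w"])

    indices = optimizer.foreach_evaluator_with_index(lambda ev, i: i)

Here ``stale`` shows the divergence that the note about ``num_workers > 0``
warns of, and ``synced`` shows every copy agreeing after the broadcast.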

REST API
--------

In some cases (e.g., when interacting with an external environment) it makes more sense to interact with RLlib as if it were an independently running service, rather than having RLlib host the simulations itself. This is possible via RLlib's serving env `interface <rllib-envs.html#serving>`__.

.. autoclass:: ray.rllib.utils.policy_client.PolicyClient
    :members:

.. autoclass:: ray.rllib.utils.policy_server.PolicyServer
    :members:

For a full client / server example that you can run, see the example `client script <https://github.com/ray-project/ray/blob/master/python/ray/rllib/examples/serving/cartpole_client.py>`__ and also the corresponding `server script <https://github.com/ray-project/ray/blob/master/python/ray/rllib/examples/serving/cartpole_server.py>`__, here configured to serve a policy for the toy CartPole-v0 environment.