Policy Gradient Methods
=======================

This example shows how to train a reinforcement learning agent with policy gradient methods.
View the `code for this example`_.
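
To make the idea concrete, here is a minimal sketch of a policy gradient update
in plain Python. It is not the PPO implementation linked above: it uses the
simpler REINFORCE rule on a toy two-armed bandit, and the names
``train_bandit``, ``reward_probs``, and the hyperparameter values are
hypothetical, chosen only for illustration.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits into action probabilities."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(reward_probs, steps=2000, learning_rate=0.1, seed=0):
    """Run REINFORCE on a toy bandit (a hypothetical stand-in environment).

    reward_probs[i] is the chance that arm i pays a reward of 1.
    Returns the learned action probabilities.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(reward_probs)
    baseline = 0.0  # running average reward, subtracted to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        # Sample an action from the current policy.
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        advantage = reward - baseline
        # For a softmax policy, d log pi(action) / d logit_i
        # equals 1[i == action] - probs[i].
        for i in range(len(logits)):
            grad_log_prob = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += learning_rate * advantage * grad_log_prob
        baseline += 0.01 * (reward - baseline)
    return softmax(logits)


if __name__ == "__main__":
    print("action probabilities after training:", train_bandit([0.2, 0.8]))
```

The update raises the log-probability of actions that earned more reward than
the running baseline and lowers it otherwise. PPO builds on the same gradient
but adds further machinery that this sketch leaves out.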

.. note::

    For an overview of Ray's reinforcement learning library, see `Ray RLlib <http://ray.readthedocs.io/en/latest/rllib.html>`__.


To run this example, you will need to install `TensorFlow with GPU support`_ (at
least version ``1.0.0``) and a few other dependencies.

.. code-block:: bash

  pip install "gym[atari]"
  pip install tensorflow
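
After installing, you may want to confirm that the installed TensorFlow meets
the ``1.0.0`` minimum mentioned above. The following is a small sketch; the
helper names ``parse_version`` and ``meets_minimum`` are hypothetical and not
part of Ray or TensorFlow. Only ``tf.__version__`` comes from TensorFlow
itself.

```python
def parse_version(version):
    """Turn a version string such as '1.4.0' or '1.4.0-rc1' into a 3-tuple of ints."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as '-rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as '2.1'
    return tuple(parts)


def meets_minimum(installed, minimum="1.0.0"):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed")
    else:
        status = "ok" if meets_minimum(tf.__version__) else "too old"
        print("TensorFlow", tf.__version__, status)
```

The comparison ignores pre-release suffixes, so a release candidate of
``1.0.0`` would be treated as meeting the minimum.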

Then run the example from the root of the Ray repository as follows.

.. code-block:: bash

  python python/ray/rllib/train.py --env=Pong-ram-v4 --run=PPO

This will train an agent on the ``Pong-ram-v4`` Atari environment. You can also
try passing in the ``Pong-v0`` environment or the ``CartPole-v0`` environment.
If you wish to use a different environment, you will need to change a few lines
in ``example.py``.
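
To try the three environments named above one after another, a small driver
script can build the same command for each. This is only a sketch: it assumes
it is run from the root of the Ray repository, uses the current Python
interpreter to launch ``train.py``, and relies only on the ``--env`` and
``--run`` flags shown earlier. The names ``build_command`` and ``main`` are
hypothetical.

```python
import subprocess
import sys

# Path as given in the command above, relative to the repository root (assumption).
TRAIN_SCRIPT = "python/ray/rllib/train.py"
ENVIRONMENTS = ["Pong-ram-v4", "Pong-v0", "CartPole-v0"]


def build_command(env, algorithm="PPO"):
    """Build the argument list for one training run."""
    return [
        sys.executable,
        TRAIN_SCRIPT,
        "--env={}".format(env),
        "--run={}".format(algorithm),
    ]


def main(dry_run=True):
    """Print each command; launch it too when dry_run is False."""
    for env in ENVIRONMENTS:
        command = build_command(env)
        print(" ".join(command))
        if not dry_run:
            # Runs sequentially and raises if a training run exits with an error.
            subprocess.check_call(command)


if __name__ == "__main__":
    main(dry_run=True)
```

By default the script only prints the commands. Call ``main(dry_run=False)`` to
start the training runs.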

Current and historical training progress can be monitored by pointing
TensorBoard to the log output directory as follows.

.. code-block:: bash

  tensorboard --logdir=~/ray_results

Many of the TensorBoard metrics are also printed to the console, but you might
find it easier to visualize and compare runs in the TensorBoard UI.

.. _`TensorFlow with GPU support`: https://www.tensorflow.org/install/
.. _`code for this example`: https://github.com/ray-project/ray/tree/master/python/ray/rllib/ppo