Policy Gradient Methods
=======================

This code shows how to do reinforcement learning with policy gradient methods.
View the `code for this example`_.
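
Policy gradient methods adjust the parameters of a stochastic policy in the
direction that increases expected return. The snippet below is a minimal,
self-contained sketch of that idea (a REINFORCE-style update for a linear
softmax policy on ``CartPole-v0``, using NumPy and gym); it is only an
illustration and is not the implementation used in this example.

.. code-block:: python

  # Minimal REINFORCE-style sketch (illustration only, not this example's code).
  import gym
  import numpy as np

  env = gym.make("CartPole-v0")
  n_obs = env.observation_space.shape[0]  # 4 observation features for CartPole
  n_act = env.action_space.n              # 2 discrete actions
  theta = np.zeros((n_obs, n_act))        # linear softmax policy parameters
  alpha, gamma = 0.01, 0.99               # assumed step size and discount factor

  def policy(obs):
      logits = obs.dot(theta)
      probs = np.exp(logits - logits.max())
      return probs / probs.sum()

  for episode in range(200):
      obs, done, trajectory = env.reset(), False, []
      while not done:
          action = np.random.choice(n_act, p=policy(obs))
          next_obs, reward, done, _ = env.step(action)
          trajectory.append((obs, action, reward))
          obs = next_obs
      # Walk the episode backwards, accumulating discounted returns, and step
      # the parameters along grad log pi(a|s) scaled by the return.
      G = 0.0
      for obs_t, action_t, reward_t in reversed(trajectory):
          G = reward_t + gamma * G
          grad_log = -np.outer(obs_t, policy(obs_t))  # gradient of log softmax
          grad_log[:, action_t] += obs_t
          theta += alpha * G * grad_log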

To run this example, you will need to install `TensorFlow with GPU support`_ (at
least version ``1.0.0``) and a few other dependencies.

.. code-block:: bash

  pip install gym[atari]
  pip install tensorflow

Then install the package as follows.

.. code-block:: bash

  cd ray/examples/policy_gradient/
  python setup.py install

Then you can run the example as follows.

.. code-block:: bash

  python/ray/rllib/policy_gradient/example.py --environment=Pong-ram-v3

This will train an agent on the ``Pong-ram-v3`` Atari environment. You can also
try passing in the ``Pong-v0`` or ``CartPole-v0`` environments. If you wish to
use a different environment, you will need to change a few lines in
``example.py`` (see the sketch below).
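
As a purely hypothetical illustration (the actual variable names and structure
of ``example.py`` may differ, and the edits may involve more than this),
switching environments typically comes down to changing the gym environment id
that is passed to ``gym.make``:

.. code-block:: python

  import gym

  # Substitute any registered gym environment id here, for example
  # "CartPole-v0" instead of "Pong-ram-v3".
  env = gym.make("CartPole-v0")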

Current and historical training progress can be monitored by pointing
TensorBoard to the log output directory as follows.

.. code-block:: bash

  tensorboard --logdir=/tmp/ray

Many of the TensorBoard metrics are also printed to the console, but you might
find it easier to visualize and compare between runs using the TensorBoard UI.

.. _`TensorFlow with GPU support`: https://www.tensorflow.org/install/
.. _`code for this example`: https://github.com/ray-project/ray/tree/master/python/ray/rllib/policy_gradient