Policy Gradient Methods
=======================

This example shows how to do reinforcement learning with policy gradient
methods; the training command below uses Proximal Policy Optimization (PPO).
View the `code for this example`_.

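
To give some intuition for what "policy gradient" means, the following is a minimal, stand-alone sketch of REINFORCE, the simplest policy gradient method, in plain Python. It is not the PPO code used by this example. The toy two-armed bandit, the state-independent softmax policy, and all function names and hyperparameters are illustrative assumptions. Each update raises the log-probability of every action taken in proportion to the discounted return that followed it.

.. code-block:: python

  # Illustrative sketch: REINFORCE on a toy bandit, not RLlib's PPO code.
  import math
  import random


  def softmax(logits):
      """Numerically stable softmax over a list of floats."""
      m = max(logits)
      exps = [math.exp(x - m) for x in logits]
      total = sum(exps)
      return [e / total for e in exps]


  def discounted_returns(rewards, gamma):
      """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
      returns = [0.0] * len(rewards)
      running = 0.0
      for t in reversed(range(len(rewards))):
          running = rewards[t] + gamma * running
          returns[t] = running
      return returns


  def reinforce_update(theta, actions, rewards, lr=0.1, gamma=0.99):
      """One REINFORCE step for a state-independent softmax policy.

      For a softmax policy, d log pi(a) / d theta_k = 1[k == a] - pi(k).
      """
      probs = softmax(theta)  # theta is fixed for the whole episode
      grad = [0.0] * len(theta)
      for a, g in zip(actions, discounted_returns(rewards, gamma)):
          for k in range(len(theta)):
              grad[k] += g * ((1.0 if k == a else 0.0) - probs[k])
      # Gradient ascent on the expected return.
      return [t + lr * d for t, d in zip(theta, grad)]


  def train_bandit(arm_rewards, episodes=1000, lr=0.1, seed=0):
      """Learn to pick the best arm of a deterministic bandit (one-step episodes)."""
      rng = random.Random(seed)
      theta = [0.0] * len(arm_rewards)
      for _ in range(episodes):
          probs = softmax(theta)
          action = rng.choices(range(len(theta)), weights=probs)[0]
          theta = reinforce_update(theta, [action], [arm_rewards[action]], lr=lr)
      return softmax(theta)


  if __name__ == "__main__":
      print(train_bandit([0.0, 1.0]))

Calling ``train_bandit([0.0, 1.0])`` should return a policy that puts most of its probability on the second arm, since only that arm ever produces a positive return. PPO builds on the same idea but constrains how far each update can move the policy.
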
.. note::

    For an overview of Ray's reinforcement learning library, see `Ray RLlib <http://ray.readthedocs.io/en/latest/rllib.html>`__.

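
The training command further down selects Proximal Policy Optimization with ``--run=PPO``. The core of PPO is the clipped surrogate objective from Schulman et al. (2017), which limits how far a single update can move the new policy away from the one that collected the data. Below is a minimal plain-Python sketch of that objective for one batch, not RLlib's implementation; the function name and the ``clip_eps=0.2`` default are illustrative choices (0.2 is a commonly used value). Each sample supplies the ratio between the new and old policy's probability of the action taken, plus an advantage estimate.

.. code-block:: python

  # Illustrative sketch of PPO's clipped surrogate objective (not RLlib code).


  def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
      """Mean clipped surrogate objective (to be maximized) over a batch.

      ratios[i]     = pi_new(a_i | s_i) / pi_old(a_i | s_i)
      advantages[i] = advantage estimate for sample i
      """
      if len(ratios) != len(advantages) or not ratios:
          raise ValueError("need equally sized, non-empty batches")
      total = 0.0
      for ratio, adv in zip(ratios, advantages):
          # Keep the ratio inside [1 - eps, 1 + eps] for the clipped term.
          clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
          # Taking the minimum makes the objective a pessimistic bound.
          total += min(ratio * adv, clipped * adv)
      return total / len(ratios)


  if __name__ == "__main__":
      print(ppo_clip_objective([1.5, 0.9], [1.0, -0.5]))

For example, with ``clip_eps=0.2`` a sample with ratio 1.5 and advantage 1.0 contributes only 1.2, so the optimizer gains nothing from pushing that action's probability even further.
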
To run this example, you will need to install `TensorFlow with GPU support`_
(version ``1.0.0`` or later) and a few other dependencies, such as Gym with its
Atari environments.

.. code-block:: bash

  pip install 'gym[atari]'
  pip install tensorflow

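
Before launching training, you may want to confirm which versions are installed. The following sketch uses only the standard library (``importlib.metadata``, Python 3.8 or later) and assumes the packages were installed under the distribution names ``tensorflow`` and ``gym``, as in the commands above. ``parse_version``, ``meets_minimum``, and ``check_package`` are hypothetical helpers, and the version parsing is deliberately simple (it keeps only the leading numbers of the first three components).

.. code-block:: python

  # Assumption: packages are installed as 'tensorflow' and 'gym'.
  import re
  from importlib import metadata


  def parse_version(text):
      """Turn '1.15.0' or '2.4.0rc1' into a comparable tuple such as (1, 15, 0)."""
      parts = []
      for piece in text.split(".")[:3]:
          match = re.match(r"\d+", piece)
          parts.append(int(match.group()) if match else 0)
      parts += [0] * (3 - len(parts))  # pad short versions like '1.0'
      return tuple(parts)


  def meets_minimum(installed, minimum):
      """True if the installed version is at least the minimum version."""
      return parse_version(installed) >= parse_version(minimum)


  def check_package(name, minimum="0"):
      """Return a one-line status report for an installed distribution."""
      try:
          installed = metadata.version(name)
      except metadata.PackageNotFoundError:
          return f"{name}: not installed"
      if meets_minimum(installed, minimum):
          return f"{name} {installed}: ok"
      return f"{name} {installed}: older than {minimum}"


  if __name__ == "__main__":
      print(check_package("tensorflow", minimum="1.0.0"))
      print(check_package("gym"))

A GPU build of TensorFlow may be registered under a different distribution name on some setups, so a "not installed" report for ``tensorflow`` is worth double-checking with ``pip list``.
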
Then you can run the example as follows.

.. code-block:: bash

  python python/ray/rllib/train.py --env=Pong-ram-v4 --run=PPO

This will train an agent on the ``Pong-ram-v4`` Atari environment. You can also
try passing ``--env=Pong-v0`` or ``--env=CartPole-v0`` instead. If you wish to
use a different environment, you will need to change a few lines in
``example.py``.

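
To try the environments mentioned above one after another, a small driver script can build the same command line for each of them. This is a sketch, assuming it is run from the root of the Ray repository (as the relative script path implies) and that ``python`` is on your ``PATH``. ``build_train_command`` and ``train_all`` are hypothetical helpers that use only the ``--env`` and ``--run`` flags shown here.

.. code-block:: python

  # Hypothetical helpers; only the --env and --run flags come from the docs above.
  import shlex
  import subprocess

  # Script path taken from the command above; run from the Ray repository root.
  TRAIN_SCRIPT = "python/ray/rllib/train.py"


  def build_train_command(env, run="PPO", python="python"):
      """Build the argv list for one training run."""
      return [python, TRAIN_SCRIPT, f"--env={env}", f"--run={run}"]


  def train_all(envs, dry_run=True):
      """Print (and optionally launch) one training command per environment."""
      for env in envs:
          cmd = build_train_command(env)
          print(shlex.join(cmd))
          if not dry_run:
              subprocess.run(cmd, check=True)  # runs block until each finishes


  if __name__ == "__main__":
      train_all(["Pong-ram-v4", "Pong-v0", "CartPole-v0"])

By default the script only prints the commands. Passing ``dry_run=False`` starts the runs sequentially, and each may take a long time to finish.
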
Current and historical training progress can be monitored by pointing
TensorBoard to the log output directory as follows.

.. code-block:: bash

  tensorboard --logdir ~/ray_results

Many of the TensorBoard metrics are also printed to the console, but you may
find it easier to visualize them and compare runs in the TensorBoard UI.

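
If many runs have accumulated under ``~/ray_results``, it can help to point TensorBoard at only the newest one. The sketch below assumes each run writes its logs into its own subdirectory directly under that root, which you should check against your own results directory; ``newest_run_dir`` is a hypothetical helper, not part of Ray.

.. code-block:: python

  # Assumption: one subdirectory per run directly under the results root.
  from pathlib import Path


  def newest_run_dir(root="~/ray_results"):
      """Return the most recently modified subdirectory of root, or None."""
      base = Path(root).expanduser()
      if not base.is_dir():
          return None
      runs = [p for p in base.iterdir() if p.is_dir()]
      if not runs:
          return None
      return max(runs, key=lambda p: p.stat().st_mtime)


  if __name__ == "__main__":
      run = newest_run_dir()
      if run is None:
          print("No runs found under ~/ray_results")
      else:
          print(f"tensorboard --logdir {run}")

Passing the printed path to ``tensorboard --logdir`` restricts the UI to that run; pointing at the parent directory instead keeps all runs available for side-by-side comparison.
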
.. _`TensorFlow with GPU support`: https://www.tensorflow.org/install/
.. _`code for this example`: https://github.com/ray-project/ray/tree/master/python/ray/rllib/ppo