[rllib] Document on traj postprocess (#5532)

* document on traj postprocess

* shorten it
Eric Liang 2019-08-24 20:37:45 -07:00 committed by GitHub
parent d41963c546
commit 7d28bbbdbb
2 changed files with 17 additions and 5 deletions


@@ -2,11 +2,7 @@
## Why are these changes needed?
<!-- Please give a short summary of the problem these changes address. -->
## What do these changes do?
<!-- Please give a short summary of these changes. -->
<!-- Please give a short summary of the change and the problem this solves. -->
## Related issue number


@@ -259,6 +259,11 @@ You can provide callback functions to be called at points during policy evaluati
print("trainer.train() result: {} -> {} episodes".format(
info["trainer"].__name__, info["result"]["episodes_this_iter"]))
def on_postprocess_traj(info):
    episode = info["episode"]
    batch = info["post_batch"]  # note: you can mutate this
    print("postprocessed {} steps".format(batch.count))
ray.init()
analysis = tune.run(
"PG",
@@ -269,14 +274,25 @@ You can provide callback functions to be called at points during policy evaluati
"on_episode_step": tune.function(on_episode_step),
"on_episode_end": tune.function(on_episode_end),
"on_train_result": tune.function(on_train_result),
"on_postprocess_traj": tune.function(on_postprocess_traj),
},
},
)
Visualizing Custom Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~
Custom metrics can be accessed and visualized like any other training result:
.. image:: custom_metric.png
Rewriting Trajectories
~~~~~~~~~~~~~~~~~~~~~~
Note that in the ``on_postprocess_traj`` callback you have full access to the trajectory batch (``post_batch``) and other training state. This can be used to rewrite the trajectory, which has a number of uses, including:
* Backdating rewards to previous time steps (e.g., based on values in ``info``).
* Adding model-based curiosity bonuses to rewards (you can train the model with a `custom model supervised loss <rllib-models.html#supervised-model-losses>`__).
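Reward backdating could be sketched roughly as below. This is a minimal illustration, not RLlib's actual API: the ``backdate_rewards`` helper and the plain dict standing in for the real ``SampleBatch`` object are hypothetical, and the decay scheme is just one possible choice.

```python
# Sketch of reward backdating inside an ``on_postprocess_traj`` callback.
# A plain dict stands in for RLlib's SampleBatch here; the "rewards" key
# mirrors the field a real post_batch would carry.

def backdate_rewards(post_batch, decay=0.5):
    """Shift part of each reward onto the preceding time step.

    For every step t, a ``decay`` fraction of the reward at t+1 is moved
    back to step t, so delayed rewards reach the actions that produced
    them sooner. Mutates ``post_batch`` in place; total reward is kept
    unchanged.
    """
    rewards = post_batch["rewards"]
    # Walk backwards so later rewards propagate through earlier steps.
    for t in range(len(rewards) - 2, -1, -1):
        moved = decay * rewards[t + 1]
        rewards[t] += moved
        rewards[t + 1] -= moved
    return post_batch


def on_postprocess_traj(info):
    batch = info["post_batch"]  # mutating this rewrites the trajectory
    backdate_rewards(batch)
```

For example, a trajectory with rewards ``[0.0, 0.0, 1.0]`` and ``decay=0.5`` becomes ``[0.25, 0.25, 0.5]``: the terminal reward now partly credits the earlier steps while the episode's total reward stays the same.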
Example: Curriculum Learning
~~~~~~~~~~~~~~~~~~~~~~~~~~~~