[rllib] Document on traj postprocess (#5532)

* document on traj postprocess
* shorten it

parent d41963c546
commit 7d28bbbdbb

2 changed files with 17 additions and 5 deletions

.github/PULL_REQUEST_TEMPLATE.md (vendored), 6 changed lines

@@ -2,11 +2,7 @@
## Why are these changes needed?

<!-- Please give a short summary of the problem these changes address. -->

## What do these changes do?

<!-- Please give a short summary of these changes. -->
<!-- Please give a short summary of the change and the problem this solves. -->

## Related issue number

@@ -259,6 +259,11 @@ You can provide callback functions to be called at points during policy evaluation
    print("trainer.train() result: {} -> {} episodes".format(
        info["trainer"].__name__, info["result"]["episodes_this_iter"]))

def on_postprocess_traj(info):
    episode = info["episode"]
    batch = info["post_batch"]  # note: you can mutate this
    print("postprocessed {} steps".format(batch.count))

ray.init()
analysis = tune.run(
    "PG",

@@ -269,14 +274,25 @@ You can provide callback functions to be called at points during policy evaluation
            "on_episode_step": tune.function(on_episode_step),
            "on_episode_end": tune.function(on_episode_end),
            "on_train_result": tune.function(on_train_result),
            "on_postprocess_traj": tune.function(on_postprocess_traj),
        },
    },
)

Visualizing Custom Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~

Custom metrics can be accessed and visualized like any other training result:

.. image:: custom_metric.png
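
As a rough illustration (not part of this commit): a value recorded on the episode via ``episode.custom_metrics`` shows up in each training result under ``custom_metrics``, aggregated with ``_mean``, ``_min``, and ``_max`` suffixes, next to built-in metrics such as ``episode_reward_mean``. The metric name ``pole_angle`` below is a hypothetical example:

.. code-block:: python

    def on_train_result(info):
        result = info["result"]
        # Built-in and custom metrics live side by side in the result dict.
        print("episode_reward_mean:", result["episode_reward_mean"])
        # A value recorded as episode.custom_metrics["pole_angle"] (hypothetical
        # name) is reported here as pole_angle_mean / pole_angle_min / pole_angle_max.
        print("pole_angle_mean:", result["custom_metrics"].get("pole_angle_mean"))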

Rewriting Trajectories
~~~~~~~~~~~~~~~~~~~~~~

Note that in the ``on_postprocess_traj`` callback you have full access to the trajectory batch (``post_batch``) and other training state. This can be used to rewrite the trajectory, which has a number of uses, including the following (a short sketch follows the list):

* Backdating rewards to previous time steps (e.g., based on values in ``info``).
* Adding model-based curiosity bonuses to rewards (you can train the model with a `custom model supervised loss <rllib-models.html#supervised-model-losses>`__).
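
As a minimal sketch of the second use (not part of this commit): a bonus can be added to the rewards in ``post_batch`` in place. A real curiosity bonus would come from a learned model; the observation-norm "novelty" below is only a stand-in, the ``"new_obs"`` field and flat observation vectors are assumptions, and the ``0.01`` scale is arbitrary.

.. code-block:: python

    import numpy as np

    def on_postprocess_traj(info):
        batch = info["post_batch"]  # batch holding one agent's trajectory
        # Stand-in "novelty" signal computed from the next observations
        # (assumes flat observation vectors; a real bonus would use a model).
        bonus = 0.01 * np.linalg.norm(batch["new_obs"], axis=-1)
        # Mutating the batch in place rewrites the trajectory used for training.
        batch["rewards"] = batch["rewards"] + bonus
        # Note: fields already derived from the rewards (e.g. advantages) are
        # not recomputed automatically after this mutation.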

Example: Curriculum Learning
~~~~~~~~~~~~~~~~~~~~~~~~~~~~