The dict merge prevents crashes when Tune computes resource requests for agents and you override only a config subkey. The minimum iteration time keeps iterations from becoming too short, which would incur high per-iteration overhead. This is easy to run into on Ape-X, since its throughput can get very high.
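To illustrate why a recursive merge matters here, the following is a minimal sketch, not the code from this change. The helper `deep_merge` and the config keys are hypothetical; the point is only that overriding one subkey should keep its sibling defaults.

```python
import copy


def deep_merge(base, override):
    """Return a new dict with `override` merged recursively into `base`.

    Hypothetical helper for illustration; not the function used in the commit.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge their subkeys instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults and user override (key names are assumptions).
defaults = {"num_workers": 2, "optimizer": {"num_shards": 4, "debug": False}}
user = {"optimizer": {"debug": True}}

config = deep_merge(defaults, user)
```

With a shallow `dict.update`, the user's `optimizer` dict would replace the default one entirely and `num_shards` would be lost, which is the kind of missing key that can crash a later lookup.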
We should use episode IDs instead of the timestep to determine where sequences are cut: once batches are concatenated, an increasing t does not guarantee that consecutive steps belong to the same episode.
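The idea can be sketched as follows. This is an illustrative toy, not the implementation in this change; `chop_into_sequences` as written here and its arguments are assumptions.

```python
def chop_into_sequences(eps_ids, max_seq_len):
    """Return sequence lengths, cutting when the episode id changes
    or when a sequence reaches max_seq_len.

    Hypothetical sketch for illustration only.
    """
    seq_lens = []
    count = 0
    prev_id = None
    for eps_id in eps_ids:
        if count > 0 and (eps_id != prev_id or count >= max_seq_len):
            seq_lens.append(count)
            count = 0
        count += 1
        prev_id = eps_id
    if count > 0:
        seq_lens.append(count)
    return seq_lens


# Two fragments from different episodes, concatenated into one batch.
# Note that t keeps increasing across the episode boundary.
t = [3, 4, 5, 6, 7, 8, 9, 10]
eps_ids = [7, 7, 7, 9, 9, 9, 9, 9]

seq_lens = chop_into_sequences(eps_ids, max_seq_len=4)
```

A rule that only looks for a break in `t` would treat this batch as a single run, while the episode-id rule cuts at the boundary between episodes 7 and 9.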
* Prevent hasher from running out of memory on large files
* dump out keys
* only print if failed
* remove debugging
* Fix lint error. Revert added newline.
* Raise application level exception for actor methods that can't be executed and failed tasks.
* Retry task forwarding for actor tasks.
* Small cleanups
* Move constant to ray_config.
* Create ForwardTaskOrResubmit method.
* Minor
* Clean up queued tasks for dead actors.
* Some cleanups.
* Linting
* Notify task_dependency_manager_ about failed tasks.
* Manage timer lifetime better.
* Use smart pointers to deallocate the timer.
* Fix
* add comment
Using the actual batch size reduces the risk of mis-accounting. Here, samples were under-counted because policy_evaluator was accidentally doubling the batch size in truncate_episodes mode.
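A toy sketch of the accounting difference, assuming batches are dicts of equal-length columns. The names `count_sampled_steps`, `configured_batch_size`, and the `"obs"` column are hypothetical.

```python
def count_sampled_steps(batches):
    """Count steps from the batches themselves rather than from config."""
    return sum(len(batch["obs"]) for batch in batches)


configured_batch_size = 4

# Assumed scenario: each returned batch is twice the configured size.
batches = [{"obs": list(range(8))}, {"obs": list(range(8))}]

assumed_steps = configured_batch_size * len(batches)  # config-based count
actual_steps = count_sampled_steps(batches)           # measured count
```

Counting from the data stays correct even if an upstream component changes the batch size unexpectedly.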
This adds a simple DQN+PPO example for multi-agent. We don't do anything fancy here; we just sync weights between two separate trainers. This may waste some compute, but it is very simple to set up.
It might be nice to share experience collection between the top-level trainers in the future.
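The weight-syncing pattern described above can be sketched with stand-in classes. `ToyTrainer` and its methods are hypothetical and are not RLlib's trainer API; each trainer holds a copy of both policies but only updates its own.

```python
class ToyTrainer:
    """Hypothetical stand-in for a trainer that learns one of two policies."""

    def __init__(self, own_policy, policy_ids):
        self.own_policy = own_policy
        self.weights = {pid: 0.0 for pid in policy_ids}

    def train(self):
        # Stand-in for a training step: only the trainer's own policy changes.
        self.weights[self.own_policy] += 1.0

    def get_weights(self, policy_ids):
        return {pid: self.weights[pid] for pid in policy_ids}

    def set_weights(self, weights):
        self.weights.update(weights)


policy_ids = ["dqn_policy", "ppo_policy"]
dqn = ToyTrainer("dqn_policy", policy_ids)
ppo = ToyTrainer("ppo_policy", policy_ids)

for _ in range(3):
    dqn.train()
    ppo.train()
    # Swap the freshly trained weights so both trainers stay in sync.
    ppo.set_weights(dqn.get_weights(["dqn_policy"]))
    dqn.set_weights(ppo.get_weights(["ppo_policy"]))
```

Both trainers collect experience for both policies in this arrangement, which is where the wasted compute comes from.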
* Use absolute path to get to thirdparty dir
If this script is executed from a directory other than Ray's directory, the `pushd` will fail. This commit uses the absolute path to the `thirdparty` directory.
* Update setup_thirdparty.sh
Cleanup: TFPolicyGraph now automatically adds loss input entries for state_in_*, so that graph sub-classes don't need to worry about it.
Multi-GPU support:
Allow setting up model tower replicas with existing state input tensors.
Truncate the per-device minibatch slices so that they are always a multiple of max_seq_len.
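The truncation arithmetic can be sketched as below. The function name and parameters are hypothetical, and the even split across devices is an assumption made for illustration.

```python
def truncated_slice_size(batch_size, num_devices, max_seq_len):
    """Largest per-device slice that is a multiple of max_seq_len."""
    per_device = batch_size // num_devices
    return (per_device // max_seq_len) * max_seq_len


slice_size = truncated_slice_size(batch_size=1000, num_devices=3, max_seq_len=20)
```

Here each device would get 333 rows before truncation and 320 after, so every slice holds a whole number of padded sequences.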
* Saving work on parameter server blog post.
* Updates
* Updates to blog post.
* Add notes about tasks and actors.
* Updates
* Add RLlib paper link
* Update intro
* Address comments.
* More fixes.
* Clarify ray.get
* Change date
* Add @ray.remote clarification.
* Update site deployment instructions.
* Minor wording