Sven Mika
92f030331e
[RLlib] Initial code/comment cleanups in preparation for decentralized multi-agent learner. ( #21420 )
2022-01-10 11:22:55 +01:00
Sven Mika
853d10871c
[RLlib] Issue 18499: PGTrainer with training_iteration fn does not support multi-GPU. ( #21376 )
2022-01-05 18:22:33 +01:00
Sven Mika
9e6b871739
[RLlib] Better utils for flattening complex inputs and enable prev-actions for LSTM/attention for complex action spaces. ( #21330 )
2022-01-05 11:29:44 +01:00
Sven Mika
62dbf26394
[RLlib] POC: Run PGTrainer w/o the distr. exec API (Trainer's new training_iteration method). ( #20984 )
2021-12-21 08:39:05 +01:00
Jun Gong
767f78eaf8
[RLlib] Always attach latest eval metrics. ( #21011 )
2021-12-15 11:42:53 +01:00
Sven Mika
db058d0fb3
[RLlib] Rename metrics_smoothing_episodes
into metrics_num_episodes_for_smoothing
for clarity. ( #20983 )
2021-12-11 20:33:35 +01:00
Sven Mika
596c8e2772
[RLlib] Experimental no-flatten option for actions/prev-actions. ( #20918 )
2021-12-11 14:57:58 +01:00
Sven Mika
f814c2af89
[RLlib; Docs] Docs API reference pages: rllib/execution
, rllib/evaluation
, rllib/models
, rllib/offline
. ( #20538 )
2021-12-10 09:41:29 +01:00
Sven Mika
b4790900f5
[RLlib] Sub-class Trainer
(instead of build_trainer()
): All remaining classes; soft-deprecate build_trainer
. ( #20725 )
2021-12-04 22:05:26 +01:00
Sven Mika
60b2219d72
[RLlib] Allow for evaluation to run by timesteps
(alternative to episodes
) and add auto-setting to make sure train doesn't ever have to wait for eval (e.g. long episodes) to finish. ( #20757 )
2021-12-04 13:26:33 +01:00
Sven Mika
9e38f6f613
[RLlib] Trainer sub-class DDPG/TD3/APEX-DDPG (instead of build_trainer
). ( #20636 )
2021-12-01 10:52:12 +01:00
Sven Mika
3d2e27485b
[RLlib] Trainer sub-class DQN/SimpleQ/APEX-DQN/R2D2 (instead of using build_trainer
). ( #20633 )
2021-11-30 18:05:44 +01:00
Sven Mika
c07d8c4c22
[RLlib] Trainer sub-class A2C/A3C (instead of build_trainer
). ( #20635 )
2021-11-24 22:07:13 +01:00
Sven Mika
49cd7ea6f9
[RLlib] Trainer sub-class PPO/DDPPO (instead of build_trainer()
). ( #20571 )
2021-11-23 23:01:05 +01:00
Sven Mika
9d2fe5756c
[RLlib] Trainer sub-class for APPO (instead of using build_trainer()
). ( #20424 )
2021-11-22 22:14:21 +01:00
Artur Niederfahrenhorst
d07e50e957
[RLlib] Replay buffer API (cleanups; docstrings; renames; move into rllib/execution/buffers
dir) ( #20552 )
2021-11-19 11:57:37 +01:00
Sven Mika
56619b955e
[RLlib; Documentation] Some docstring cleanups; Rename RemoteVectorEnv into RemoteBaseEnv for clarity. ( #20250 )
2021-11-17 21:40:16 +01:00
Avnish Narayan
dc17f0a241
Add error messages for missing tf and torch imports ( #20205 )
...
Co-authored-by: Sven Mika <sven@anyscale.io>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-16 16:30:53 -08:00
Kai Fricke
3e6ba5d6d2
Revert "Revert [RLlib] POC: PGTrainer
class that works by sub-classing, not trainer_template.py
." ( #20285 )
...
* Revert "Revert "[RLlib] POC: `PGTrainer` class that works by sub-classing, not `trainer_template.py`. (#20055 )" (#20284 )"
This reverts commit 246787cdd9
.
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-16 12:26:47 +01:00
Kai Fricke
246787cdd9
Revert "[RLlib] POC: PGTrainer
class that works by sub-classing, not trainer_template.py
. ( #20055 )" ( #20284 )
...
This reverts commit 6f85af435f
.
2021-11-12 13:09:43 +00:00
Sven Mika
6f85af435f
[RLlib] POC: PGTrainer
class that works by sub-classing, not trainer_template.py
. ( #20055 )
2021-11-11 12:16:20 +01:00
Sven Mika
ebd56b57db
[RLlib; documentation] "RLlib in 60sec" overhaul. ( #20215 )
2021-11-10 22:20:06 +01:00
Kai Fricke
9c2b8c8501
[tune] Deprecate DurableTrainable ( #19880 )
2021-11-08 20:56:07 +00:00
Sven Mika
eea6b40a3e
[RLlib] Minor cleanups in Trainer
; better tf/tf2 info messages about possible tracing speedups. ( #20109 )
2021-11-08 15:37:27 +01:00
Sven Mika
a931076f59
[RLlib] Tf2 + eager-tracing same speed as framework=tf; Add more test coverage for tf2+tracing. ( #19981 )
2021-11-05 16:10:00 +01:00
Sven Mika
2d24ef0d32
[RLlib] Add all simple learning tests as framework=tf2
. ( #19273 )
...
* Unpin gym and deprecate pendulum v0
Many tests in rllib depended on pendulum v0,
however in gym 0.21, pendulum v0 was deprecated
in favor of pendulum v1. This may change reward
thresholds, so will have to potentially rerun
all of the pendulum v1 benchmarks, or use another
environment in favor. The same applies to frozen
lake v0 and frozen lake v1
Lastly, all of the RLlib tests and Tune tests have
been moved to python 3.7
* fix tune test_sampler::testSampleBoundsAx
* fix re-install ray for py3.7 tests
Co-authored-by: avnishn <avnishn@uw.edu>
2021-11-02 12:10:17 +01:00
Sven Mika
0b308719f8
[RLlib; Docs overhaul] Docstring cleanup: rllib/utils ( #19829 )
2021-11-01 21:46:02 +01:00
Sven Mika
bab9c0f670
[RLlib; Docs overhaul] Redo: Docstring cleanup: Trainer, trainer_template, Callbacks."" ( #19830 )
2021-11-01 21:45:11 +01:00
Sven Mika
9c73871da0
[RLlib; Docs overhaul] Docstring cleanup: Evaluation ( #19783 )
2021-10-29 12:03:56 +02:00
Sven Mika
4a82d3ea6c
Revert "[RLlib; Docs overhaul] Docstring cleanup: Trainer, trainer_template, Callbacks. ( #19758 )" ( #19806 )
...
This reverts commit 80eeb13175
.
2021-10-27 23:30:07 +02:00
Sven Mika
80eeb13175
[RLlib; Docs overhaul] Docstring cleanup: Trainer, trainer_template, Callbacks. ( #19758 )
2021-10-27 19:15:35 +02:00
gjoliver
99a0088233
[RLlib] Unify the way we create local replay buffer for all agents ( #19627 )
...
* [RLlib] Unify the way we create and use LocalReplayBuffer for all the agents.
This change
1. Get rid of the try...except clause when we call execution_plan(),
and get rid of the Deprecation warning as a result.
2. Fix the execution_plan() call in Trainer._try_recover() too.
3. Most importantly, makes it much easier to create and use different types
of local replay buffers for all our agents.
E.g., allow us to easily create a reservoir sampling replay buffer for
APPO agent for Riot in the near future.
* Introduce explicit configuration for replay buffer types.
* Fix is_training key error.
* actually deprecate buffer_size field.
2021-10-26 20:56:02 +02:00
Sven Mika
fd438d5630
[RLlib] Issue 18104: Cannot set remote_worker_envs=True for non local-mode and MultiAgentEnv. ( #19133 )
2021-10-07 22:39:21 +02:00
Sven Mika
828f5d26b7
[RLlib] Custom view requirements (e.g. for prev-n-obs) work with compute_single_action
and compute_actions_from_input_dict
. ( #18921 )
2021-09-30 15:03:37 +02:00
Sven Mika
61a1274619
[RLlib] No Preprocessors (part 2). ( #18468 )
2021-09-23 12:56:45 +02:00
Sven Mika
698b4eeed3
[RLlib] POC: Separate losses for APPO/IMPALA. Enable TFPolicy to handle multiple optimizers/losses (like TorchPolicy). ( #18669 )
2021-09-21 22:00:14 +02:00
Sven Mika
3f89f35e52
[RLlib] Better error messages and hints; + failure-mode tests; ( #18466 )
2021-09-10 16:52:47 +02:00
Sven Mika
8a066474d4
[RLlib] No Preprocessors; preparatory PR #1 ( #18367 )
2021-09-09 08:10:42 +02:00
Sven Mika
1520c3d147
[RLlib] Deepcopy env_ctx for vectorized sub-envs AND add eval-worker-option to Trainer.add_policy()
( #18428 )
2021-09-09 07:10:06 +02:00
Sven Mika
56f142cac1
[RLlib] Add support for evaluation_num_episodes=auto (run eval for as long as the parallel train step takes). ( #18380 )
2021-09-07 08:08:37 +02:00
Sven Mika
ba58f5edb1
[RLlib] Strictly run evaluation_num_episodes
episodes each evaluation run (no matter the other eval config settings). ( #18335 )
2021-09-05 15:37:05 +02:00
Sven Mika
a772c775cd
[RLlib] Set random seed (if provided) to Trainer process as well. ( #18307 )
2021-09-04 11:02:30 +02:00
Sven Mika
82465f9342
[RLlib] Better PolicyServer example (w/ or w/o tune) and add printing out actual listen port address in log-level=INFO. ( #18254 )
2021-08-31 22:03:23 +02:00
Sven Mika
599e589481
[RLlib] Move existing fake multi-GPU learning tests into separate buildkite job. ( #18065 )
2021-08-31 14:56:53 +02:00
Sven Mika
4888d7c9af
[RLlib] Replay buffers: Add config option to store contents in checkpoints. ( #17999 )
2021-08-31 12:21:49 +02:00
Sven Mika
9883505e84
[RLlib] Add [LSTM=True + multi-GPU]-tests to nightly RLlib testing suite (for all algos supporting RNNs, except R2D2, RNNSAC, and DDPPO). ( #18017 )
2021-08-24 21:55:27 +02:00
Sven Mika
8248ba531b
[RLlib] Redo #17410 : Example script: Remote worker envs with inference done on main node. ( #17960 )
2021-08-20 08:02:18 +02:00
Alex Wu
318ba6fae0
Revert "[RLlib] Add example script for how to have n remote (parallel) envs with inference happening on "main" (possibly GPU) node. ( #17410 )" ( #17951 )
...
This reverts commit 8fc16b9a18
.
2021-08-19 07:55:10 -07:00
Sven Mika
8fc16b9a18
[RLlib] Add example script for how to have n remote (parallel) envs with inference happening on "main" (possibly GPU) node. ( #17410 )
2021-08-19 12:14:50 +02:00
Sven Mika
f18213712f
[RLlib] Redo: "fix self play example scripts" PR (17566) ( #17895 )
...
* wip.
* wip.
* wip.
* wip.
* wip.
* wip.
* wip.
* wip.
* wip.
2021-08-17 09:13:35 -07:00