Sven Mika
73f5c4039b
[RLlib] Fix flakey test_a3c, test_maml, test_apex_dqn. ( #19035 )
2021-10-04 13:23:51 +02:00
Jiajun Yao
7588bfd315
[Lint] Add flake8-bugbear ( #19053 )
...
* Add flake8-bugbear
* Add flake8-bugbear
2021-10-03 23:24:11 -07:00
Sven Mika
ed85f59194
[RLlib] Unify all RLlib Trainer.train() -> results[info][learner][policy ID][learner_stats] and add structure tests. ( #18879 )
2021-09-30 16:39:05 +02:00
Sven Mika
828f5d26b7
[RLlib] Custom view requirements (e.g. for prev-n-obs) work with compute_single_action
and compute_actions_from_input_dict
. ( #18921 )
2021-09-30 15:03:37 +02:00
Avnish Narayan
6dc1a6b72f
[RLlib] Raise error for kl penalty ddpo ( #18959 )
...
* [RLlib] Raise error for kl penalty ddpo
DDPPO doesn't support KL penalties like PPO-1.
In order to support KL penalties, DDPPO would need to
become undecentralized, which defeats the purpose of the
algorithm. Users can still tune the entropy coefficient to
control the policy entropy (similar to controlling the KL
penalty.)
* Update rllib/agents/ppo/ddppo.py
Co-authored-by: avnishn <avnishnarayan@gmail.com>
Co-authored-by: Sven Mika <sven@anyscale.io>
2021-09-30 10:56:22 +02:00
Sven Mika
9c9b482661
[RLlib] Allow n-step > 1 and prio. replay for R2D2 and RNNSAC. ( #18939 )
2021-09-29 21:31:34 +02:00
Sven Mika
b99943806e
[RLlib] Add support for IMPALA to handle more than one loss/optimizer (analogous to recent enhancement for APPO). ( #18971 )
2021-09-29 21:30:04 +02:00
Sven Mika
61a1274619
[RLlib] No Preprocessors (part 2). ( #18468 )
2021-09-23 12:56:45 +02:00
Sven Mika
a2a077b874
[RLlib] Faster remote worker space inference (don't infer if not required). ( #18805 )
2021-09-23 10:54:37 +02:00
Sven Mika
698b4eeed3
[RLlib] POC: Separate losses for APPO/IMPALA. Enable TFPolicy to handle multiple optimizers/losses (like TorchPolicy). ( #18669 )
2021-09-21 22:00:14 +02:00
Sven Mika
fd13bac9b3
[RLlib] Add worker
arg (optional) to policy_mapping_fn
. ( #18184 )
2021-09-17 12:07:11 +02:00
Sven Mika
ba1c489b79
[RLlib Testing] Lower --smoke-test
"time_total_s" to make sure it doesn't time out. ( #18670 )
2021-09-16 18:22:23 +02:00
Sven Mika
8a00154038
[RLlib] Bump tf version in ML docker to tf==2.5.0; add tfp to ML-docker. ( #18544 )
2021-09-15 08:46:37 +02:00
Sven Mika
08c09737fa
[RLlib] Fix R2D2 (torch) multi-GPU issue. ( #18550 )
2021-09-14 19:58:10 +02:00
Sven Mika
3803e796ff
[RLlib] Multi-GPU learner thread (IMPALA) error messages/comments/code-cleanup. ( #18540 )
2021-09-13 19:27:53 +02:00
Sven Mika
ea4a22249c
[RLlib] Add simple action-masking example script/env/model (tf and torch). ( #18494 )
2021-09-11 23:08:09 +02:00
Sven Mika
3f89f35e52
[RLlib] Better error messages and hints; + failure-mode tests; ( #18466 )
2021-09-10 16:52:47 +02:00
Sven Mika
8a066474d4
[RLlib] No Preprocessors; preparatory PR #1 ( #18367 )
2021-09-09 08:10:42 +02:00
Sven Mika
1520c3d147
[RLlib] Deepcopy env_ctx for vectorized sub-envs AND add eval-worker-option to Trainer.add_policy()
( #18428 )
2021-09-09 07:10:06 +02:00
gjoliver
808b683f81
[RLlib] Add a unittest for learning rate schedule used with APEX agent. ( #18389 )
2021-09-08 23:29:40 +02:00
Sven Mika
45f60e51a9
[RLlib] DDPPO fixes and benchmarks. ( #18390 )
2021-09-08 19:39:01 +02:00
Sven Mika
56f142cac1
[RLlib] Add support for evaluation_num_episodes=auto (run eval for as long as the parallel train step takes). ( #18380 )
2021-09-07 08:08:37 +02:00
Sven Mika
e3e6ed7aaa
[RLlib] Issues 17844, 18034: Fix n-step > 1 bug. ( #18358 )
2021-09-06 12:14:20 +02:00
Sven Mika
ba58f5edb1
[RLlib] Strictly run evaluation_num_episodes
episodes each evaluation run (no matter the other eval config settings). ( #18335 )
2021-09-05 15:37:05 +02:00
Sven Mika
a772c775cd
[RLlib] Set random seed (if provided) to Trainer process as well. ( #18307 )
2021-09-04 11:02:30 +02:00
Sven Mika
9a8ca6a69d
[RLlib] Fix Atari learning test regressions (2 bugs) and 1 minor attention net bug. ( #18306 )
2021-09-03 13:29:57 +02:00
Sven Mika
82465f9342
[RLlib] Better PolicyServer example (w/ or w/o tune) and add printing out actual listen port address in log-level=INFO. ( #18254 )
2021-08-31 22:03:23 +02:00
Sven Mika
599e589481
[RLlib] Move existing fake multi-GPU learning tests into separate buildkite job. ( #18065 )
2021-08-31 14:56:53 +02:00
Sven Mika
4888d7c9af
[RLlib] Replay buffers: Add config option to store contents in checkpoints. ( #17999 )
2021-08-31 12:21:49 +02:00
Joseph Suarez
8136d2912b
[RLlib] Add policies
arg to callback: on_episode_step
(already exists in all other episode-related callbacks) ( #18119 )
2021-08-27 16:12:19 +02:00
Sven Mika
b6aa8223bc
[RLlib] Fix final_scale
's default value to 0.02 (see OrnsteinUhlenbeck exploration). ( #18070 )
2021-08-25 14:22:09 +02:00
Sven Mika
9883505e84
[RLlib] Add [LSTM=True + multi-GPU]-tests to nightly RLlib testing suite (for all algos supporting RNNs, except R2D2, RNNSAC, and DDPPO). ( #18017 )
2021-08-24 21:55:27 +02:00
Sven Mika
494ddd98c1
[RLlib] Replace "seq_lens" w/ SampleBatch.SEQ_LENS. ( #17928 )
2021-08-21 17:05:48 +02:00
Sven Mika
8248ba531b
[RLlib] Redo #17410 : Example script: Remote worker envs with inference done on main node. ( #17960 )
2021-08-20 08:02:18 +02:00
Alex Wu
318ba6fae0
Revert "[RLlib] Add example script for how to have n remote (parallel) envs with inference happening on "main" (possibly GPU) node. ( #17410 )" ( #17951 )
...
This reverts commit 8fc16b9a18
.
2021-08-19 07:55:10 -07:00
Sven Mika
8fc16b9a18
[RLlib] Add example script for how to have n remote (parallel) envs with inference happening on "main" (possibly GPU) node. ( #17410 )
2021-08-19 12:14:50 +02:00
Kai Fricke
bf3eaa9264
[RLlib] Dreamer fixes and reinstate Dreamer test. ( #17821 )
...
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-08-18 18:47:08 +02:00
Sven Mika
a428f10ebe
[RLlib] Add multi-GPU learning tests to nightly. ( #17778 )
2021-08-18 17:21:01 +02:00
Sven Mika
f18213712f
[RLlib] Redo: "fix self play example scripts" PR (17566) ( #17895 )
...
* wip.
* wip.
* wip.
* wip.
* wip.
* wip.
* wip.
* wip.
* wip.
2021-08-17 09:13:35 -07:00
Thomas Lecat
c02f91fa2d
[RLlib] Ape-X doesn't take the value of prioritized_replay
into account ( #17541 )
2021-08-16 22:18:08 +02:00
Sven Mika
f3bbe4ea44
[RLlib] Test cases/BUILD cleanup; split "everything else" (longest running one rn) tests in 2. ( #17640 )
2021-08-16 22:01:01 +02:00
Sven Mika
c2ea2c01bb
[RLlib] Redo: Add support for multi-GPU to DDPG. ( #17789 )
...
* wip.
* wip.
* wip.
* wip.
* wip.
* wip.
2021-08-13 18:01:24 -07:00
Sven Mika
7f2b3c0824
[RLlib] Issue 17667: CQL-torch + GPU not working (due to simple_optimizer=False; must use simple optimizer!). ( #17742 )
2021-08-11 18:30:21 +02:00
Sven Mika
811d71b368
[RLlib] Issue 17653: Torch multi-GPU (>1) broken for LSTMs. ( #17657 )
2021-08-11 12:44:35 +02:00
Amog Kamsetty
0b8489dcc6
Revert "[RLlib] Add support for multi-GPU to DDPG. ( #17586 )" ( #17707 )
...
This reverts commit 0eb0e0ff58
.
2021-08-10 10:50:21 -07:00
Amog Kamsetty
77f28f1c30
Revert "[RLlib] Fix Trainer.add_policy
for num_workers>0 (self play example scripts). ( #17566 )" ( #17709 )
...
This reverts commit 3b447265d8
.
2021-08-10 10:50:01 -07:00
Sven Mika
3b447265d8
[RLlib] Fix Trainer.add_policy
for num_workers>0 (self play example scripts). ( #17566 )
2021-08-05 11:41:18 -04:00
Sven Mika
0eb0e0ff58
[RLlib] Add support for multi-GPU to DDPG. ( #17586 )
2021-08-05 11:39:51 -04:00
Sven Mika
5107d16ae5
[RLlib] Add @Deprecated decorator to simplify/unify deprecation of classes, methods, functions. ( #17530 )
2021-08-03 18:30:02 -04:00
Sven Mika
924f11cd45
[RLlib] Torch algos use now-framework-agnostic MultiGPUTrainOneStep execution op (~33% speedup for PPO-torch + GPU). ( #17371 )
2021-08-03 11:35:49 -04:00