Sven Mika
9c73871da0
[RLlib; Docs overhaul] Docstring cleanup: Evaluation (#19783)
2021-10-29 12:03:56 +02:00
gjoliver
d81885c1f1
[RLlib] Fix all the CI tests that were broken by is_training and replay buffer changes; re-comment-in the failing RLlib tests (#19809)
* Fix DDPG, since it is based on GenericOffPolicyTrainer.
* Fix QMix, SAC, and MADDPG too.
* Undo QMix change.
* Fix DQN input batch type. Always use SampleBatch.
* apex ddpg should not use replay_buffer_config yet.
* Make eager tf policy use SampleBatch.
* lint
* LINT.
* Re-enable RLlib broken tests to make sure things work ok now.
* fixes.
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-10-28 18:06:47 +02:00
gjoliver
99a0088233
[RLlib] Unify the way we create local replay buffer for all agents (#19627)
* [RLlib] Unify the way we create and use LocalReplayBuffer for all the agents.
This change:
1. Gets rid of the try...except clause when we call execution_plan(),
and of the deprecation warning as a result.
2. Fixes the execution_plan() call in Trainer._try_recover() too.
3. Most importantly, makes it much easier to create and use different types
of local replay buffers for all our agents.
E.g., allows us to easily create a reservoir sampling replay buffer for
the APPO agent for Riot in the near future.
* Introduce explicit configuration for replay buffer types.
* Fix is_training key error.
* actually deprecate buffer_size field.
2021-10-26 20:56:02 +02:00
gjoliver
c3c42278e4
[RLlib] clean up all the SampleBatch['is_training'] deprecation warnings (#19652)
* [RLlib] clean up all the SampleBatch['is_training'] deprecation warnings.
* wip
2021-10-25 09:38:56 +02:00
Sven Mika
1f0646f658
[RLlib] Issue 18418: SAC w/ dict space not working. (#19101)
2021-10-06 09:05:50 +02:00
Sven Mika
b4300dd532
[RLlib] Issue 18812: Torch multi-GPU stats not protected against race conditions. (#18937)
2021-10-04 13:29:00 +02:00
Sven Mika
ed85f59194
[RLlib] Unify all RLlib Trainer.train() -> results[info][learner][policy ID][learner_stats] and add structure tests. (#18879)
2021-09-30 16:39:05 +02:00
Sven Mika
9c9b482661
[RLlib] Allow n-step > 1 and prio. replay for R2D2 and RNNSAC. (#18939)
2021-09-29 21:31:34 +02:00
Sven Mika
ba1c489b79
[RLlib Testing] Lower --smoke-test "time_total_s" to make sure it doesn't time out. (#18670)
2021-09-16 18:22:23 +02:00
Sven Mika
8a00154038
[RLlib] Bump tf version in ML docker to tf==2.5.0; add tfp to ML-docker. (#18544)
2021-09-15 08:46:37 +02:00
Sven Mika
e3e6ed7aaa
[RLlib] Issues 17844, 18034: Fix n-step > 1 bug. (#18358)
2021-09-06 12:14:20 +02:00
Sven Mika
599e589481
[RLlib] Move existing fake multi-GPU learning tests into separate buildkite job. (#18065)
2021-08-31 14:56:53 +02:00
Sven Mika
4888d7c9af
[RLlib] Replay buffers: Add config option to store contents in checkpoints. (#17999)
2021-08-31 12:21:49 +02:00
Sven Mika
494ddd98c1
[RLlib] Replace "seq_lens" w/ SampleBatch.SEQ_LENS. (#17928)
2021-08-21 17:05:48 +02:00
Sven Mika
a428f10ebe
[RLlib] Add multi-GPU learning tests to nightly. (#17778)
2021-08-18 17:21:01 +02:00
Sven Mika
924f11cd45
[RLlib] Torch algos use now-framework-agnostic MultiGPUTrainOneStep execution op (~33% speedup for PPO-torch + GPU). (#17371)
2021-08-03 11:35:49 -04:00
Sven Mika
8a844ff840
[RLlib] Issues: 17397, 17425, 16715, 17174. When on driver, Torch|TFPolicy should not use ray.get_gpu_ids() (b/c no GPUs assigned by ray). (#17444)
2021-08-02 17:29:59 -04:00
Julius Frost
d7a5ec1830
[RLlib] SAC tuple observation space fix (#17356)
2021-07-28 12:39:28 -04:00
Sven Mika
90b21ce27e
[RLlib] De-flake 3 test cases; Fix config.simple_optimizer and SampleBatch.is_training warnings. (#17321)
2021-07-27 14:39:06 -04:00
ddworak94
fba8461663
[RLlib] Add RNN-SAC agent (#16577)
Shoutout to @ddworak94 :)
2021-07-25 10:04:52 -04:00
Julius Frost
0b1b6222bc
[rllib] Add merge_trainer_config arguments to trainer template (#17160)
2021-07-21 15:43:06 -07:00
Sven Mika
5a313ba3d6
[RLlib] Refactor: All tf static graph code should reside inside Policy class. (#17169)
2021-07-20 14:58:13 -04:00
Sven Mika
1fd0eb805e
[RLlib] Redo fix bug normalize vs unsquash actions (original PR made log-likelihood test flakey). (#17014)
2021-07-13 14:01:30 -04:00
Amog Kamsetty
bc33dc7e96
Revert "[RLlib] Fix bug in policy.py: normalize_actions=True has to call unsquash_action, not normalize_action." (#17002)
This reverts commit 7862dd64ea.
2021-07-12 11:09:14 -07:00
Sven Mika
7862dd64ea
[RLlib] Fix bug in policy.py: normalize_actions=True has to call unsquash_action, not normalize_action. (#16774)
2021-07-08 17:31:34 +02:00
Sven Mika
53206dd440
[RLlib] CQL BC loss fixes; PPO/PG/A2|3C action normalization fixes (#16531)
2021-06-30 12:32:11 +02:00
Sven Mika
2900a06dd7
[RLlib] Issue 14503: SAC not allowing custom action distributions. (#16427)
2021-06-18 17:27:29 +02:00
Sven Mika
839fc59224
[RLlib] CQL TensorFlow support (#15841)
2021-05-18 11:10:46 +02:00
Sven Mika
bc09e75b78
[RLlib] Fix 3 flakey test cases. (#15785)
2021-05-16 12:20:33 +02:00
SebastianBo1995
f5be8d8f74
[Rllib] Offline Learning Bug, different shapes (#15132)
2021-04-27 17:18:17 +02:00
Sven Mika
bb8a286cbc
[RLlib] Support native tf.keras.Model (milestone toward obsoleting ModelV2 class). (#14684)
2021-04-27 10:44:54 +02:00
Sven Mika
cecfc3b43b
[RLlib] Multi-GPU support for Torch algorithms. (#14709)
2021-04-16 09:16:24 +02:00
Sven Mika
9c5a0cfd7a
[RLlib] Issue 14385: Policy.compute_actions_from_input_dict does not properly track accessed fields for Policy's view requirements. (#14386)
2021-04-11 18:20:04 +02:00
Raphael CHEN
93d4244d9c
[RLlib] Correctly get bytes size of SampleBatch (#14801)
2021-03-30 19:24:58 +02:00
Sven Mika
4f66309e19
[RLlib] Redo issue 14533 tf enable eager exec (#14984)
2021-03-29 20:07:44 +02:00
SangBin Cho
fa5f961d5e
Revert "[RLlib] Issue 14533: tf.enable_eager_execution() must be called at beginning. (#14737)" (#14918)
This reverts commit 3e389d5812.
2021-03-25 00:42:01 -07:00
astronauti
8874ccec2d
[RLlib] Update sac_tf_policy.py (add tf.cast to float32 for rewards) (#14843)
2021-03-24 16:12:55 +01:00
Sven Mika
3e389d5812
[RLlib] Issue 14533: tf.enable_eager_execution() must be called at beginning. (#14737)
2021-03-24 12:54:27 +01:00
Sven Mika
04bc0a9828
[RLlib] Remove all non-trajectory view API code. (#14860)
2021-03-23 09:50:18 -07:00
Sven Mika
69202c6a7d
[RLlib] Obsolete usage tracking dict via sample batch. (#13065)
2021-03-17 08:18:15 +01:00
Sven Mika
732197e23a
[RLlib] Multi-GPU for tf-DQN/PG/A2C. (#13393)
2021-03-08 15:41:27 +01:00
Sven Mika
ef944bc5f0
[RLlib] Re-enable placement group support for RLlib. (#14384)
2021-03-05 08:16:24 +01:00
Richard Liaw
a2d2275ee1
Revert "[RLlib + Tune] Add placement group support to RLlib. (#14289)" (#14360)
This reverts commit 6cd0cd3bd9.
2021-02-25 14:27:35 -08:00
Sven Mika
6cd0cd3bd9
[RLlib + Tune] Add placement group support to RLlib. (#14289)
2021-02-25 16:01:31 +01:00
Sven Mika
a2f7998026
[RLlib] Issue #13342: Add validate_spaces to MB-MPO. (#14038)
2021-02-11 11:36:53 +01:00
Sven Mika
37c7daa3c0
[RLlib] DDPG: Support simplex action space. (#14011)
2021-02-10 15:10:01 +01:00
Sven Mika
eb0038612f
[RLlib] Extend on_learn_on_batch callback to allow for custom metrics to be added. (#13584)
2021-02-08 15:02:19 +01:00
Sven Mika
52c94b7ee9
[RLlib] Allow SAC to use custom models as Q- or policy nets and deprecate "state-preprocessor" for image spaces. (#13522)
2021-02-02 13:05:58 +01:00
Sven Mika
2e3655e8a9
[RLlib] Issue 9071 A3C w/ RNN not working due to VF assuming no RNN. (#13238)
2021-01-19 14:22:36 +01:00
Sven Mika
56878221ed
[RLlib] Redo: Make TFModelV2 fully modular like TorchModelV2 (soft-deprecate register_variables, unify var names wrt torch). (#13363)
2021-01-14 14:44:33 +01:00