Sven Mika
|
0b3a79ca41
|
[RLlib] Issue 23639: Error in client/server setup when using LSTMs (#23740)
|
2022-04-07 10:16:22 +02:00 |
|
Sven Mika
|
e391b624f0
|
[RLlib] Re-enable (for CI-testing) our two self_play example scripts. (#23742)
|
2022-04-07 08:20:48 +02:00 |
|
Sven Mika
|
434265edd0
|
[RLlib] Examples folder: All training_iteration translations. (#23712)
|
2022-04-05 16:33:50 +02:00 |
|
Steven Morad
|
39841b65b3
|
[RLlib] PPOTorchPolicy: Remove extra call to model.value_function (#23671)
|
2022-04-05 08:40:29 +02:00 |
|
mesjou
|
e725472b5b
|
[RLlib] Fix bug in prisoner's dilemma example. (#23690)
|
2022-04-05 08:36:20 +02:00 |
|
Jiajun Yao
|
5f37231842
|
Remove yapf dependency (#23656)
Yapf has been replaced by black.
|
2022-04-04 21:50:04 -07:00 |
|
Sven Mika
|
0bb82f29b6
|
[RLlib] AlphaStar polishing (fix logger.info bug). (#22281)
|
2022-04-01 09:49:41 +02:00 |
|
Sven Mika
|
2eaa54bd76
|
[RLlib] POC: Config objects instead of dicts (PPO only). (#23491)
|
2022-03-31 18:26:12 +02:00 |
|
simonsays1980
|
9ca9c67bc9
|
[RLlib] Added dtype safeguards to the 'required_model_output_shape()' methods… (#23490)
|
2022-03-31 13:52:00 +02:00 |
|
simonsays1980
|
e4c6e9c3d3
|
[RLlib] Changed the if-block in the example callback to become more readable. (#22900)
|
2022-03-31 09:13:04 +02:00 |
|
simonsays1980
|
d2a3948845
|
[RLlib] Removed the sampler() function in ParallelRollouts() as it is not needed. (#22320)
|
2022-03-31 09:06:30 +02:00 |
|
Artur Niederfahrenhorst
|
9a64bd4e9b
|
[RLlib] Simple-Q uses training iteration fn (instead of execution_plan); ReplayBuffer API for Simple-Q (#22842)
|
2022-03-29 14:44:40 +02:00 |
|
Jun Gong
|
a7e5aa8c6a
|
[RLlib] Delete some unused, confusing logic. (#23513)
|
2022-03-29 13:45:13 +02:00 |
|
Artur Niederfahrenhorst
|
32ad6c6ef1
|
[RLlib] Replay Buffer capacity check (#23523)
|
2022-03-29 12:06:27 +02:00 |
|
Kai Fricke
|
262d6121bb
|
[rllib] Fix error messages and example for dataset writer (#23419)
Currently, the error message and example refer to a field `type` that is actually named `format`.
|
2022-03-28 19:53:12 +01:00 |
|
Sven Mika
|
7cb86acce2
|
[RLlib] trainer_template.py: hard deprecation (error when used). (#23488)
|
2022-03-25 18:25:51 +01:00 |
|
Max Pumperla
|
60054995e6
|
[docs] fix doctests and activate CI (#23418)
|
2022-03-24 17:04:02 -07:00 |
|
Sven Mika
|
22c9c4aa39
|
[RLlib] Slate-Q +GPU torch bug fix. (#23464)
|
2022-03-24 17:39:33 +01:00 |
|
Avnish Narayan
|
5134e0dc12
|
[RLlib] Change type to TensorType for CQL policies. (#23438)
|
2022-03-24 12:32:29 +01:00 |
|
Fabian Witter
|
2547055f38
|
[RLlib] Add support for complex observations in CQL (#23332)
|
2022-03-22 17:04:07 +01:00 |
|
Jun Gong
|
d12977c4fb
|
[RLlib] TF2 Bandit Agent (#22838)
|
2022-03-21 16:55:55 +01:00 |
|
Sven Mika
|
b1cda46681
|
[RLlib] SlateQ (tf GPU + multi-GPU) + Bandit fixes (#23276)
|
2022-03-18 13:45:16 +01:00 |
|
Siyuan (Ryans) Zhuang
|
0c74ecad12
|
[Lint] Cleanup incorrectly formatted strings (Part 1: RLLib). (#23128)
|
2022-03-15 17:34:21 +01:00 |
|
Fabien Couthouis
|
e575ed3350
|
[RLlib] Fix AttributeError with None obs shape + tf in _unpack_obs() utility (#22428)
|
2022-03-15 16:34:31 +01:00 |
|
Jeroen Bédorf
|
bc21a4593d
|
[RLlib] Fix crash when kl_coeff is set to 0 (#23063)
Co-authored-by: Jeroen Bédorf <jeroen@minds.ai>
Co-authored-by: Ishant Mrinal Haloi <mrinal.haloi11@gmail.com>
Co-authored-by: Ishant Mrinal <33053278+n30111@users.noreply.github.com>
|
2022-03-11 12:24:52 -08:00 |
|
simonsays1980
|
8627f44d7f
|
[RLlib] Remove duplicate code block: Config deprecation check for metrics_smoothing_episodes (#22152)
|
2022-03-09 16:51:42 +01:00 |
|
Artur Niederfahrenhorst
|
37d129a965
|
[RLlib] ReplayBuffer API: Test cases. (#22390)
|
2022-03-08 16:54:12 +01:00 |
|
Artur Niederfahrenhorst
|
c0ade5f0b7
|
[RLlib] Issue 22625: MultiAgentBatch.timeslices() does not behave as expected. (#22657)
|
2022-03-08 14:25:48 +01:00 |
|
Jiajun Yao
|
4801e57c77
|
[Test] Add missing tests to bazel BUILD (#22827)
|
2022-03-07 19:54:49 -08:00 |
|
Sven Mika
|
3fe6f3b3eb
|
[RLlib] 2 bug fixes: Bandit registration not working if torch not installed. Env checker for MA envs. (#22821)
|
2022-03-04 19:16:30 +01:00 |
|
Jun Gong
|
e765915ded
|
[RLlib] Make sure SlateQ works with GPU. (#22738)
|
2022-03-04 17:49:51 +01:00 |
|
Kai Fricke
|
84a163a2c4
|
[RLlib] Remove atari rom install script (#22797)
|
2022-03-03 16:55:56 +01:00 |
|
Sven Mika
|
0af100ffae
|
[RLlib] Fix tree.flatten dict ordering bug: flatten_space([obs_space]) should produce the same struct as tree.flatten([obs]). (#22731)
|
2022-03-01 21:24:24 +01:00 |
|
Sven Mika
|
e50bd212a1
|
[RLlib] Disable flaky Pendulum-v1 tests (until further investigation). (#22686)
|
2022-03-01 16:44:17 +01:00 |
|
Daniel
|
8d1f1b0a64
|
[RLlib] Update pettingzoo==1.15.0 supersuit==3.3.3 (#22519)
|
2022-03-01 11:23:27 +01:00 |
|
simonsays1980
|
568cf28dd4
|
[RLlib] Example script custom_metrics_and_callbacks.py should work for batch_mode=complete_episodes. (#22684)
|
2022-03-01 09:00:38 +01:00 |
|
Jun Gong
|
e8be45065e
|
[RLlib] Restore policies on eval_workers as well. (#22641)
|
2022-03-01 08:38:14 +01:00 |
|
Jun Gong
|
22bc451102
|
[RLlib] Fix a memory leak in SimpleReplayBuffer that completely kills sampling throughput (#22678)
|
2022-02-28 09:28:04 +01:00 |
|
Sven Mika
|
7b687e6cd8
|
[RLlib] SlateQ: Add a hard-task learning test to weekly regression suite. (#22544)
|
2022-02-25 21:58:16 +01:00 |
|
Jun Gong
|
a385c9b127
|
[RLlib] Update bandit_envs_recommender_system (#22421)
|
2022-02-24 22:43:41 +01:00 |
|
Sven Mika
|
526fd6b5fb
|
[RLlib] Issue 22444: KL-coeff not stored in persistent policy state. (#22590)
|
2022-02-24 22:05:36 +01:00 |
|
Sven Mika
|
18c269c70e
|
[RLlib] Issue 22539: agent_key not deleted from 2 dicts in simple list collector. (#22587)
|
2022-02-24 11:58:34 +01:00 |
|
Sven Mika
|
8e00537b65
|
[RLlib] SlateQ: framework=tf fixes and SlateQ documentation update (#22543)
|
2022-02-23 13:03:45 +01:00 |
|
Xuehai Pan
|
018ebbf4cb
|
[RLlib] Issue #21671: Handle callbacks and model metrics for TorchPolicy while using multi-GPU optimizers (#21697)
|
2022-02-23 08:30:38 +01:00 |
|
Sven Mika
|
6522935291
|
[RLlib] Slate-Q tf implementation and tests/benchmarks. (#22389)
|
2022-02-22 09:36:44 +01:00 |
|
Jun Gong
|
2b6a0c71d7
|
[RLlib] Add a callback for when the trainer finishes initialization: on_trainer_init. (#22493)
|
2022-02-22 08:18:32 +01:00 |
|
Steven Morad
|
d4571741aa
|
[RLlib] seq_lens should always be torch tensors. (#22398)
|
2022-02-22 08:15:43 +01:00 |
|
JYX
|
49d7ba3738
|
[RLlib] Fix typo in vector_env docstring (#22534)
|
2022-02-22 08:13:50 +01:00 |
|
Daniel
|
308ccfe25c
|
[RLlib] DD-PPO move train_batch_size==-1 check to __init__ (#22521)
|
2022-02-21 11:44:12 +01:00 |
|
Sven Mika
|
c58cd90619
|
[RLlib] Enable Bandits to work in batch mode(s) (vector envs + multiple workers + train_batch_size > 1). (#22465)
|
2022-02-17 22:32:26 +01:00 |
|