Commit graph

1069 commits

Author SHA1 Message Date
Avnish Narayan
5134e0dc12
[RLlib] Change type to tensortype for cql policies. (#23438) 2022-03-24 12:32:29 +01:00
Fabian Witter
2547055f38
[RLlib] Add support for complex observations in CQL (#23332) 2022-03-22 17:04:07 +01:00
Jun Gong
d12977c4fb
[RLlib] TF2 Bandit Agent (#22838) 2022-03-21 16:55:55 +01:00
Sven Mika
b1cda46681
[RLlib] SlateQ (tf GPU + multi-GPU) + Bandit fixes (#23276) 2022-03-18 13:45:16 +01:00
Siyuan (Ryans) Zhuang
0c74ecad12
[Lint] Cleanup incorrectly formatted strings (Part 1: RLLib). (#23128) 2022-03-15 17:34:21 +01:00
Fabien Couthouis
e575ed3350
[RLlib] Fix AttributeError with None obs shape + tf in _unpack_obs() utility (#22428) 2022-03-15 16:34:31 +01:00
Jeroen Bédorf
bc21a4593d
[RLlib] Fix crash when kl_coeff is set to 0 (#23063)
Co-authored-by: Jeroen Bédorf <jeroen@minds.ai>
Co-authored-by: Ishant Mrinal Haloi <mrinal.haloi11@gmail.com>
Co-authored-by: Ishant Mrinal <33053278+n30111@users.noreply.github.com>
2022-03-11 12:24:52 -08:00
simonsays1980
8627f44d7f
[RLlib] Remove duplicate code block: Config deprecation check for metrics_smoothing_episodes (#22152) 2022-03-09 16:51:42 +01:00
Artur Niederfahrenhorst
37d129a965
[RLlib] ReplayBuffer API: Test cases. (#22390) 2022-03-08 16:54:12 +01:00
Artur Niederfahrenhorst
c0ade5f0b7
[RLlib] Issue 22625: MultiAgentBatch.timeslices() does not behave as expected. (#22657) 2022-03-08 14:25:48 +01:00
Jiajun Yao
4801e57c77
[Test] Add missing tests to bazel BUILD (#22827) 2022-03-07 19:54:49 -08:00
Sven Mika
3fe6f3b3eb
[RLlib] 2 bug fixes: Bandit registration not working if torch not installed. Env checker for MA envs. (#22821) 2022-03-04 19:16:30 +01:00
Jun Gong
e765915ded
[RLlib] Make sure SlateQ works with GPU. (#22738) 2022-03-04 17:49:51 +01:00
Kai Fricke
84a163a2c4
[RLlib] Remove atari rom install script (#22797) 2022-03-03 16:55:56 +01:00
Sven Mika
0af100ffae
[RLlib] Fix tree.flatten dict ordering bug: flatten_space([obs_space]) should produce same struct as tree.flatten([obs]). (#22731) 2022-03-01 21:24:24 +01:00
Sven Mika
e50bd212a1
[RLlib] Disable flakey Pendulum-v1 tests (until further investigation). (#22686) 2022-03-01 16:44:17 +01:00
Daniel
8d1f1b0a64
[RLlib] Update pettingzoo==1.15.0 supersuit==3.3.3 (#22519) 2022-03-01 11:23:27 +01:00
simonsays1980
568cf28dd4
[RLlib] Example script custom_metrics_and_callbacks.py should work for batch_mode=complete_episodes. (#22684) 2022-03-01 09:00:38 +01:00
Jun Gong
e8be45065e
[RLlib] Restore policies on eval_workers as well. (#22641) 2022-03-01 08:38:14 +01:00
Jun Gong
22bc451102
[RLlib] Fix a memeory leak in SimpleReplyBuffer that completely kills sampling throughput (#22678) 2022-02-28 09:28:04 +01:00
Sven Mika
7b687e6cd8
[RLlib] SlateQ: Add a hard-task learning test to weekly regression suite. (#22544) 2022-02-25 21:58:16 +01:00
Jun Gong
a385c9b127
[RLlib] Update bandit_envs_recommender_system (#22421) 2022-02-24 22:43:41 +01:00
Sven Mika
526fd6b5fb
[RLlib] Issue 22444: KL-coeff not stored in persistent policy state. (#22590) 2022-02-24 22:05:36 +01:00
Sven Mika
18c269c70e
[RLlib] Issue 22539: agent_key not deleted from 2 dicts in simple list collector. (#22587) 2022-02-24 11:58:34 +01:00
Sven Mika
8e00537b65
[RLlib] SlateQ: framework=tf fixes and SlateQ documentation update (#22543) 2022-02-23 13:03:45 +01:00
Xuehai Pan
018ebbf4cb
[RLlib] Issue #21671: Handle callbacks and model metrics for TorchPolicy while using multi-GPU optimizers (#21697) 2022-02-23 08:30:38 +01:00
Sven Mika
6522935291
[RLlib] Slate-Q tf implementation and tests/benchmarks. (#22389) 2022-02-22 09:36:44 +01:00
Jun Gong
2b6a0c71d7
[RLlib] Add a callback for when trainer finishes initialization: on_trainer_init. (#22493) 2022-02-22 08:18:32 +01:00
Steven Morad
d4571741aa
[RLlib] seq_lens should always be torch tensors. (#22398) 2022-02-22 08:15:43 +01:00
JYX
49d7ba3738
[RLlib] Fix typo in vector_env docstring (#22534) 2022-02-22 08:13:50 +01:00
Daniel
308ccfe25c
[RLlib] DD-PPO move train_batch_size==-1 check to __init__ (#22521) 2022-02-21 11:44:12 +01:00
Sven Mika
c58cd90619
[RLlib] Enable Bandits to work in batches mode(s) (vector envs + multiple workers + train_batch_sizes > 1). (#22465) 2022-02-17 22:32:26 +01:00
Avnish Narayan
740def0a13
[RLlib] Put env-checker on critical path. (#22191) 2022-02-17 14:06:14 +01:00
Sven Mika
5ca6a56e16
[RLlib] Bug fix: eval-workers in offline RL setup have no env, even though eval_config includes env key. (#22350) 2022-02-15 09:32:43 +01:00
Jun Gong
6f5afcbce9
[RLlib] Docs enhancements: Setup-dev instructions; Ray datasets integration. (#22239) 2022-02-15 09:09:24 +01:00
Steven Morad
5d52b599aa
[RLlib] Fix zero gradients for ppo-clipped vf (#22171) 2022-02-15 08:57:18 +01:00
Sven Mika
04a5c72ea3
Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" (#18708) 2022-02-10 13:44:22 +01:00
Balaji Veeramani
abad268549
Comment fmt: off annotations (#21984)
Code formatting is disabled in several modules with the explanation
> [The module] ignores yapf because yapf doesn't allow comments right after code blocks,
but we put comments right after code blocks to prevent large white spaces
in the documentation.

Since we no longer use YAPF, it may be possible to re-enable code formatting on
these modules. I've added "FIXME" comments requesting developers to check
whether code formatter appeasements are still necessary.
2022-02-09 22:12:11 -08:00
Sven Mika
1c791b71d8
[RLlib] Fix Unity3D built-in examples action bounds from -inf/inf to -1.0/1.0. (#22247) 2022-02-10 03:00:30 +01:00
Sven Mika
44d09c2aa5
[RLlib] Filter.clear_buffer() deprecated (use Filter.reset_buffer() instead). (#22246) 2022-02-10 02:58:43 +01:00
Sven Mika
637cacedc9
[RLlib] Discussion 4986: OU Exploration (torch) crashes when restoring from checkpoint. (#22245) 2022-02-10 02:58:09 +01:00
xwjiang2010
fc88b0895e
[tune] fix //rllib:tests/test_placement_groups (#22256) 2022-02-09 14:42:31 -08:00
Alex Wu
b122f093c1
Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test." (#22250)
Reverts ray-project/ray#22126

Breaks rllib:tests/test_io
2022-02-09 09:26:36 -08:00
Artur Niederfahrenhorst
dea3574050
[RLlib] Replay Buffer API (#22114) 2022-02-09 15:04:43 +01:00
Jun Gong
3207f537cc
[RLlib] RecSim Interest evolution environment should use custom video sampler: IEvVideoSampler due to only one cluster being used. (#22211) 2022-02-09 10:29:35 +01:00
Ishant Mrinal
f0d8b6d701
[RLlib] Fix compute_actions() for Trainer due to missing if prev_actions/rewards is not None checks. (#22078) 2022-02-09 09:05:26 +01:00
Balaji Veeramani
31ed9e5d02
[CI] Replace YAPF disables with Black disables (#21982) 2022-02-08 16:29:25 -08:00
Sven Mika
ac3e6ab411
[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test. (#22126) 2022-02-08 19:04:13 +01:00
Sven Mika
c17a44cdfa
Revert "Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni…" (#22153) 2022-02-08 16:43:00 +01:00
Sven Mika
8b678ddd68
[RLlib] Issue 22036: Client should handle concurrent episodes with one being training_enabled=False. (#22076) 2022-02-06 12:35:03 +01:00