Sven Mika
04a5c72ea3
Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" (#18708)
2022-02-10 13:44:22 +01:00
Balaji Veeramani
abad268549
Comment fmt: off annotations (#21984)
Code formatting is disabled in several modules with the explanation:
> [The module] ignores yapf because yapf doesn't allow comments right after code blocks,
> but we put comments right after code blocks to prevent large white spaces
> in the documentation.
Since we no longer use YAPF, it may be possible to re-enable code formatting on
these modules. I've added "FIXME" comments requesting developers to check
whether code formatter appeasements are still necessary.
2022-02-09 22:12:11 -08:00
Sven Mika
1c791b71d8
[RLlib] Fix Unity3D built-in examples action bounds from -inf/inf to -1.0/1.0. (#22247)
2022-02-10 03:00:30 +01:00
Sven Mika
44d09c2aa5
[RLlib] Filter.clear_buffer() deprecated (use Filter.reset_buffer() instead). (#22246)
2022-02-10 02:58:43 +01:00
Sven Mika
637cacedc9
[RLlib] Discussion 4986: OU Exploration (torch) crashes when restoring from checkpoint. (#22245)
2022-02-10 02:58:09 +01:00
xwjiang2010
fc88b0895e
[tune] fix //rllib:tests/test_placement_groups (#22256)
2022-02-09 14:42:31 -08:00
Alex Wu
b122f093c1
Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test." (#22250)
Reverts ray-project/ray#22126
Breaks rllib:tests/test_io
2022-02-09 09:26:36 -08:00
Artur Niederfahrenhorst
dea3574050
[RLlib] Replay Buffer API (#22114)
2022-02-09 15:04:43 +01:00
Jun Gong
3207f537cc
[RLlib] RecSim Interest evolution environment should use custom video sampler: IEvVideoSampler due to only one cluster being used. (#22211)
2022-02-09 10:29:35 +01:00
Ishant Mrinal
f0d8b6d701
[RLlib] Fix compute_actions() for Trainer due to missing if prev_actions/rewards is not None checks. (#22078)
2022-02-09 09:05:26 +01:00
Balaji Veeramani
31ed9e5d02
[CI] Replace YAPF disables with Black disables (#21982)
2022-02-08 16:29:25 -08:00
Sven Mika
ac3e6ab411
[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test. (#22126)
2022-02-08 19:04:13 +01:00
Sven Mika
c17a44cdfa
Revert "Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni…" (#22153)
2022-02-08 16:43:00 +01:00
Sven Mika
8b678ddd68
[RLlib] Issue 22036: Client should handle concurrent episodes with one being training_enabled=False. (#22076)
2022-02-06 12:35:03 +01:00
Sven Mika
f6617506a2
[RLlib] Add on_sub_environment_created to DefaultCallbacks class. (#21893)
2022-02-04 22:22:47 +01:00
Sven Mika
38d75ce058
[RLlib] Cleanup SlateQ algo; add test + add target Q-net (#21827)
2022-02-04 17:01:12 +01:00
Avnish Narayan
0d2ba41e41
[RLlib] [CI] Deflake longer running RLlib learning tests for off policy algorithms. Fix seeding issue in TransformedAction Environments (#21685)
2022-02-04 14:59:56 +01:00
SangBin Cho
a887763b38
Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni… (#22105)
This reverts commit 3f03ef8ba8.
2022-02-04 00:54:50 -08:00
Sven Mika
3f03ef8ba8
[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learning via league-based self-play. (#21356)
2022-02-03 09:32:09 +01:00
Rodrigo de Lazcano
a258f9c692
[RLlib] Neural-MMO keep_per_episode_custom_metrics patch (toward making Neural-MMO RLlib's default massive-multi-agent learning test environment). (#22042)
2022-02-02 17:28:42 +01:00
Jun Gong
9c95b9a5fa
[RLlib] Add an env wrapper so RecSim works with our Bandits agent. (#22028)
2022-02-02 12:15:38 +01:00
Jun Gong
87fe033f7b
[RLlib] Request CPU resources in Trainer.default_resource_request() if using dataset input. (#21948)
2022-02-02 10:20:37 +01:00
Jun Gong
a55258eb9c
[RLlib] Move bandit example scripts into examples folder. (#21949)
2022-02-02 09:20:47 +01:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black (#21975)
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Sven Mika
7fc1683bab
[RLlib] Some more bandit cleanup/tests. (#21932)
2022-01-28 12:03:26 +01:00
Sven Mika
ee41800c16
[RLlib] Preparatory PR for multi-agent, multi-GPU learning agent (alpha-star style) #02. (#21649)
2022-01-27 22:07:05 +01:00
Jun Gong
8ebc50f844
[RLlib] Issue 21334: Fix APPO when kl_loss is enabled. (#21855)
2022-01-27 20:08:58 +01:00
Sven Mika
893536ebd9
[RLlib] Move bandits into main agents folder; Make RecSim adapter more accessible. (#21773)
2022-01-27 13:58:12 +01:00
Sven Mika
371fbb17e4
[RLlib] Make policies_to_train more flexible via callable option. (#20735)
2022-01-27 12:17:34 +01:00
Jun Gong
099c170ab4
[RLlib] Dataset Reader/Writer for RLlib (#21808)
2022-01-26 16:00:46 +01:00
Jun Gong
55f3bcfb2d
[RLlib] Add a logstd term to MARWIL's loss func to encourage exploration. (#21493)
2022-01-26 16:00:17 +01:00
Sven Mika
d5bfb7b7da
[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 (#21652)
2022-01-25 14:16:58 +01:00
Sven Mika
c288b97e5f
[RLlib] Issue 21629: Video recorder env wrapper not working. Added test case. (#21670)
2022-01-24 19:38:21 +01:00
xwjiang2010
9af8f11191
Revert "[docs] Clean up doc structure (first part) (#21667)" (#21763)
This reverts commit 38e46c9fb3.
2022-01-20 15:30:56 -08:00
Max Pumperla
38e46c9fb3
[docs] Clean up doc structure (first part) (#21667)
2022-01-20 16:19:04 +01:00
Sven Mika
c4636c7c05
[RLlib] Issue 21633: SimpleQ should not use a prio. replay buffer. (#21665)
2022-01-20 11:46:25 +01:00
Avnish Narayan
12b087acb8
[RLlib] Base env pre-checker. (#21569)
2022-01-18 16:34:06 +01:00
mickelliu
75078f965d
[RLlib] Fix range() (no keyword args supported!) in torch version of attention_net.py. (#21598)
2022-01-18 16:11:16 +01:00
Vince Jankovics
7dc3de4eed
[RLlib] Fix config mismatch for train_one_step. num_sgd_iter instead of sgd_num_iter. (#21555)
2022-01-18 16:00:27 +01:00
Jun Gong
7517aefe05
[RLlib] Bring back BC and Marwil learning tests. (#21574)
2022-01-14 14:35:32 +01:00
Sven Mika
3ac4daba07
[RLlib] Discussion 4351: Conv2d default filter tests and add default setting for 96x96 image obs space. (#21560)
2022-01-13 18:50:42 +01:00
Avnish Narayan
c0f1202278
[RLlib] MultiAgentEnv pre-checker (#21476)
2022-01-13 11:31:22 +01:00
Sven Mika
90c6b10498
[RLlib] Decentralized multi-agent learning; PR #01 (#21421)
2022-01-13 10:52:55 +01:00
Sven Mika
188324c5c7
[RLlib] Issue 21552: unsquash_action and clip_action (when None) cause wrong actions computed by Trainer.compute_single_action. (#21553)
2022-01-12 18:56:51 +01:00
Matti Picus
ec6a33b736
[tune] fixes to allow tune/tests/test_commands.py to run on windows (#21342)
tune does not run smoothly on Windows. This cleans up some blockers:
- use cross-platform shutil.get_terminal_size instead of Popen(stty)
- somehow Trainer.workers is None at the end of test_commands.py, so the cleanup command was erroring. The error was not fatal, but was printed to the logs.
- if run locally, the log files are all written to the same location, so the rsync-based syncing solution is not needed. This is the real fix for issue #20747
2022-01-11 15:57:20 -08:00
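A minimal sketch of the cross-platform terminal-size query mentioned in the entry above, using only the Python standard library. The helper name terminal_width and the 80x24 fallback are illustrative assumptions, not the code from the PR.

```python
import shutil


def terminal_width(default_columns: int = 80) -> int:
    # shutil.get_terminal_size() first checks the COLUMNS/LINES environment
    # variables, then queries the terminal attached to stdout, and finally
    # returns the fallback. Unlike Popen(["stty", "size"]), it does not rely
    # on a Unix-only tool, so it also works on Windows.
    size = shutil.get_terminal_size(fallback=(default_columns, 24))
    return size.columns
```

When stdout is not a terminal (for example, under a test runner), the fallback width is typically returned, so callers never have to handle a missing stty binary.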
Sven Mika
f94bd99ce4
[RLlib] Issue 21044: Improve error message for "multiagent" dict checks. (#21448)
2022-01-11 19:50:03 +01:00
Sven Mika
92f030331e
[RLlib] Initial code/comment cleanups in preparation for decentralized multi-agent learner. (#21420)
2022-01-10 11:22:55 +01:00
Sven Mika
4eaf70942d
[RLlib] Issue 21297: Ignore PPO KL-loss term completely if kl-coeff == 0.0 to avoid NaN values due to some discrete action probs==0.0 (#21456)
2022-01-10 11:22:40 +01:00
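Why the KL term has to be skipped rather than just multiplied by a zero coefficient: in IEEE floating point, a naive 0.0 * log(0.0) evaluates to NaN, and 0.0 * NaN is still NaN. The sketch below assumes NumPy and a naive categorical KL; categorical_kl and total_loss are hypothetical helpers, not RLlib's PPO loss code.

```python
import numpy as np


def categorical_kl(p: np.ndarray, q: np.ndarray) -> float:
    # Naive KL(p || q) = sum_i p_i * log(p_i / q_i), with no epsilon or masking.
    # If some p_i == 0.0, then log(0.0) = -inf and 0.0 * -inf = NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        return float(np.sum(p * np.log(p / q)))


def total_loss(surrogate: float, kl: float, kl_coeff: float) -> float:
    # Hypothetical combination: drop the KL term entirely when kl_coeff == 0.0,
    # because kl_coeff * kl would still be NaN whenever kl is NaN.
    if kl_coeff == 0.0:
        return surrogate
    return surrogate + kl_coeff * kl


p = np.array([0.0, 1.0])  # one discrete action with probability exactly 0.0
q = np.array([0.5, 0.5])
kl = categorical_kl(p, q)
```

Here kl is NaN, so the naive 1.0 + 0.0 * kl is also NaN, while total_loss(1.0, kl, 0.0) returns 1.0.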
Sven Mika
35af30a446
[RLlib] Issue 21109: Action unsquashing causes inf/NaN actions for unbounded action spaces. (#21110)
2022-01-10 11:20:37 +01:00
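A sketch of the failure mode in plain Python. It assumes a linear rescaling of a squashed action in [-1, 1] to the space bounds [low, high]. unsquash and maybe_unsquash are hypothetical helpers, not RLlib's unsquash_action. With low = -inf and high = inf, the rescaled action is NaN. Skipping the rescaling for non-finite bounds is one possible guard and is shown here only as an assumption.

```python
import math


def unsquash(action: float, low: float, high: float) -> float:
    # Linearly map an action from [-1, 1] to [low, high].
    # With infinite bounds, -inf + inf (or 0.0 * inf) yields NaN.
    return low + (action + 1.0) * (high - low) / 2.0


def maybe_unsquash(action: float, low: float, high: float) -> float:
    # Hypothetical guard: only rescale when both bounds are finite;
    # otherwise return the action unchanged.
    if math.isfinite(low) and math.isfinite(high):
        return unsquash(action, low, high)
    return action
```

The Unity3D entry (#22247) in this log takes the complementary route on the environment side: it declares finite bounds of -1.0/1.0 instead of -inf/inf.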
Sven Mika
b10d5533be
[RLlib] Issue 20920 (partial solution): contrib/MADDPG + pettingzoo coop-pong-v4 not working. (#21452)
2022-01-10 11:19:40 +01:00