hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-10 13:26:39 -04:00

Author	SHA1	Message	Date
Sven Mika	04a5c72ea3	Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" (#18708 )	2022-02-10 13:44:22 +01:00
Balaji Veeramani	abad268549	Comment `fmt: off` annotations (#21984 ) Code formatting is disabled in several modules with the explanation: "[The module] ignores yapf because yapf doesn't allow comments right after code blocks, but we put comments right after code blocks to prevent large white spaces in the documentation." Since we no longer use YAPF, it may be possible to re-enable code formatting on these modules. I've added "FIXME" comments requesting developers to check whether code formatter appeasements are still necessary.	2022-02-09 22:12:11 -08:00
Sven Mika	1c791b71d8	[RLlib] Fix Unity3D built-in examples action bounds from -inf/inf to -1.0/1.0. (#22247 )	2022-02-10 03:00:30 +01:00
Sven Mika	44d09c2aa5	[RLlib] Filter.clear_buffer() deprecated (use Filter.reset_buffer() instead). (#22246 )	2022-02-10 02:58:43 +01:00
Sven Mika	637cacedc9	[RLlib] Discussion 4986: OU Exploration (torch) crashes when restoring from checkpoint. (#22245 )	2022-02-10 02:58:09 +01:00
xwjiang2010	fc88b0895e	[tune] fix //rllib:tests/test_placement_groups (#22256 )	2022-02-09 14:42:31 -08:00
Alex Wu	b122f093c1	Revert "[RLlib] Speedup A3C up to 3x (new `training_iteration` function instead of `execution_plan`) and re-instate Pong learning test." (#22250 ) Reverts ray-project/ray#22126 Breaks rllib:tests/test_io	2022-02-09 09:26:36 -08:00
Artur Niederfahrenhorst	dea3574050	[RLlib] Replay Buffer API (#22114 )	2022-02-09 15:04:43 +01:00
Jun Gong	3207f537cc	[RLlib] RecSim Interest evolution environment should use custom video sampler: `IEvVideoSampler` due to only one cluster being used. (#22211 )	2022-02-09 10:29:35 +01:00
Ishant Mrinal	f0d8b6d701	[RLlib] Fix compute_actions() for Trainer due to missing if prev_actions/rewards is not None checks. (#22078 )	2022-02-09 09:05:26 +01:00
Balaji Veeramani	31ed9e5d02	[CI] Replace YAPF disables with Black disables (#21982 )	2022-02-08 16:29:25 -08:00
Sven Mika	ac3e6ab411	[RLlib] Speedup A3C up to 3x (new `training_iteration` function instead of `execution_plan`) and re-instate Pong learning test. (#22126 )	2022-02-08 19:04:13 +01:00
Sven Mika	c17a44cdfa	Revert "Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni…" (#22153 )	2022-02-08 16:43:00 +01:00
Sven Mika	8b678ddd68	[RLlib] Issue 22036: Client should handle concurrent episodes with one being `training_enabled=False`. (#22076 )	2022-02-06 12:35:03 +01:00
Sven Mika	f6617506a2	[RLlib] Add `on_sub_environment_created` to DefaultCallbacks class. (#21893 )	2022-02-04 22:22:47 +01:00
Sven Mika	38d75ce058	[RLlib] Cleanup SlateQ algo; add test + add target Q-net (#21827 )	2022-02-04 17:01:12 +01:00
Avnish Narayan	0d2ba41e41	[RLlib] [CI] Deflake longer running RLlib learning tests for off policy algorithms. Fix seeding issue in TransformedAction Environments (#21685 )	2022-02-04 14:59:56 +01:00
SangBin Cho	a887763b38	Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni… (#22105 ) This reverts commit `3f03ef8ba8`.	2022-02-04 00:54:50 -08:00
Sven Mika	3f03ef8ba8	[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learning via league-based self-play. (#21356 )	2022-02-03 09:32:09 +01:00
Rodrigo de Lazcano	a258f9c692	[RLlib] Neural-MMO `keep_per_episode_custom_metrics` patch (toward making Neural-MMO RLlib's default massive-multi-agent learning test environment). (#22042 )	2022-02-02 17:28:42 +01:00
Jun Gong	9c95b9a5fa	[RLlib] Add an env wrapper so RecSim works with our Bandits agent. (#22028 )	2022-02-02 12:15:38 +01:00
Jun Gong	87fe033f7b	[RLlib] Request CPU resources in `Trainer.default_resource_request()` if using dataset input. (#21948 )	2022-02-02 10:20:37 +01:00
Jun Gong	a55258eb9c	[RLlib] Move bandit example scripts into examples folder. (#21949 )	2022-02-02 09:20:47 +01:00
Balaji Veeramani	7f1bacc7dc	[CI] Format Python code with Black (#21975 ) See #21316 and #21311 for the motivation behind these changes.	2022-01-29 18:41:57 -08:00
Sven Mika	7fc1683bab	[RLlib] Some more `bandit` cleanup/tests. (#21932 )	2022-01-28 12:03:26 +01:00
Sven Mika	ee41800c16	[RLlib] Preparatory PR for multi-agent, multi-GPU learning agent (alpha-star style) #02 . (#21649 )	2022-01-27 22:07:05 +01:00
Jun Gong	8ebc50f844	[RLlib] Issue 21334: Fix APPO when kl_loss is enabled. (#21855 )	2022-01-27 20:08:58 +01:00
Sven Mika	893536ebd9	[RLlib] Move bandits into main agents folder; Make RecSim adapter more accessible; (#21773 )	2022-01-27 13:58:12 +01:00
Sven Mika	371fbb17e4	[RLlib] Make `policies_to_train` more flexible via callable option. (#20735 )	2022-01-27 12:17:34 +01:00
Jun Gong	099c170ab4	[RLlib] Dataset Reader/Writer for RLlib (#21808 )	2022-01-26 16:00:46 +01:00
Jun Gong	55f3bcfb2d	[RLlib] Add a logstd term to MARWIL's loss func to encourage exploration. (#21493 )	2022-01-26 16:00:17 +01:00
Sven Mika	d5bfb7b7da	[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 (#21652 )	2022-01-25 14:16:58 +01:00
Sven Mika	c288b97e5f	[RLlib] Issue 21629: Video recorder env wrapper not working. Added test case. (#21670 )	2022-01-24 19:38:21 +01:00
xwjiang2010	9af8f11191	Revert "[docs] Clean up doc structure (first part) (#21667 )" (#21763 ) This reverts commit `38e46c9fb3`.	2022-01-20 15:30:56 -08:00
Max Pumperla	38e46c9fb3	[docs] Clean up doc structure (first part) (#21667 )	2022-01-20 16:19:04 +01:00
Sven Mika	c4636c7c05	[RLlib] Issue 21633: SimpleQ should not use a prio. replay buffer. (#21665 )	2022-01-20 11:46:25 +01:00
Avnish Narayan	12b087acb8	[RLlib] Base env pre-checker. (#21569 )	2022-01-18 16:34:06 +01:00
mickelliu	75078f965d	[RLlib] Fix `range()` (no keyword args supported!) in torch version of `attention_net.py`. (#21598 )	2022-01-18 16:11:16 +01:00
Vince Jankovics	7dc3de4eed	[RLlib] Fix config mismatch for train_one_step. num_sgd_iter instead of sgd_num_iter. (#21555 )	2022-01-18 16:00:27 +01:00
Jun Gong	7517aefe05	[RLlib] Bring back BC and Marwil learning tests. (#21574 )	2022-01-14 14:35:32 +01:00
Sven Mika	3ac4daba07	[RLlib] Discussion 4351: Conv2d default filter tests and add default setting for 96x96 image obs space. (#21560 )	2022-01-13 18:50:42 +01:00
Avnish Narayan	c0f1202278	[RLlib] `MultiAgentEnv` pre-checker (#21476 )	2022-01-13 11:31:22 +01:00
Sven Mika	90c6b10498	[RLlib] Decentralized multi-agent learning; PR #01 (#21421 )	2022-01-13 10:52:55 +01:00
Sven Mika	188324c5c7	[RLlib] Issue 21552: `unsquash_action` and `clip_action` (when None) cause wrong actions computed by `Trainer.compute_single_action`. (#21553 )	2022-01-12 18:56:51 +01:00
Matti Picus	ec6a33b736	[tune] fixes to allow tune/tests/test_commands.py to run on windows (#21342 ) Tune does not run smoothly on Windows. This cleans up some blockers: - use the cross-platform `shutil.get_terminal_size` instead of Popen(stty) - somehow Trainer.workers is None at the end of test_commands.py, so the cleanup command was erroring. The error was not fatal, but was printing in the logs. - if run locally, the log files are all written to the same location, so the rsync-based syncing solution is not needed. This is the real fix for issue #20747	2022-01-11 15:57:20 -08:00
Sven Mika	f94bd99ce4	[RLlib] Issue 21044: Improve error message for "multiagent" dict checks. (#21448 )	2022-01-11 19:50:03 +01:00
Sven Mika	92f030331e	[RLlib] Initial code/comment cleanups in preparation for decentralized multi-agent learner. (#21420 )	2022-01-10 11:22:55 +01:00
Sven Mika	4eaf70942d	[RLlib] Issue 21297: Ignore PPO KL-loss term completely if kl-coeff == 0.0 to avoid NaN values due to some discrete action probs==0.0 (#21456 )	2022-01-10 11:22:40 +01:00
Sven Mika	35af30a446	[RLlib] Issue 21109: Action unsquashing causes inf/NaN actions for unbounded action spaces. (#21110 )	2022-01-10 11:20:37 +01:00
Sven Mika	b10d5533be	[RLlib] Issue 20920 (partial solution): contrib/MADDPG + pettingzoo coop-pong-v4 not working. (#21452 )	2022-01-10 11:19:40 +01:00

1183 commits