Sven Mika
7cb86acce2
[RLlib] trainer_template.py: hard deprecation (error when used). ( #23488 )
2022-03-25 18:25:51 +01:00
Jun Gong
d12977c4fb
[RLlib] TF2 Bandit Agent ( #22838 )
2022-03-21 16:55:55 +01:00
Siyuan (Ryans) Zhuang
0c74ecad12
[Lint] Cleanup incorrectly formatted strings (Part 1: RLlib). ( #23128 )
2022-03-15 17:34:21 +01:00
simonsays1980
568cf28dd4
[RLlib] Example script custom_metrics_and_callbacks.py should work for batch_mode=complete_episodes. ( #22684 )
2022-03-01 09:00:38 +01:00
Sven Mika
7b687e6cd8
[RLlib] SlateQ: Add a hard-task learning test to weekly regression suite. ( #22544 )
2022-02-25 21:58:16 +01:00
Jun Gong
a385c9b127
[RLlib] Update bandit_envs_recommender_system ( #22421 )
2022-02-24 22:43:41 +01:00
Sven Mika
6522935291
[RLlib] Slate-Q tf implementation and tests/benchmarks. ( #22389 )
2022-02-22 09:36:44 +01:00
Sven Mika
c58cd90619
[RLlib] Enable Bandits to work in batch mode(s) (vector envs + multiple workers + train_batch_size > 1). ( #22465 )
2022-02-17 22:32:26 +01:00
Avnish Narayan
740def0a13
[RLlib] Put env-checker on critical path. ( #22191 )
2022-02-17 14:06:14 +01:00
Sven Mika
04a5c72ea3
Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" ( #18708 )
2022-02-10 13:44:22 +01:00
Sven Mika
44d09c2aa5
[RLlib] Filter.clear_buffer() deprecated (use Filter.reset_buffer() instead). ( #22246 )
2022-02-10 02:58:43 +01:00
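The clear_buffer() -> reset_buffer() rename above ( #22246 ) is a pure API rename; the following minimal migration sketch assumes ray.rllib.utils.filter.MeanStdFilter and its shape-based constructor, so verify against your installed RLlib version:

```python
# Minimal migration sketch for the clear_buffer() -> reset_buffer() rename.
# Assumes ray.rllib.utils.filter.MeanStdFilter; check your RLlib version.
import numpy as np
from ray.rllib.utils.filter import MeanStdFilter

filt = MeanStdFilter(shape=(4,))          # running mean/std filter for 4-dim observations
filt(np.array([0.1, 0.2, 0.3, 0.4]))      # update the running statistics with one sample

# filt.clear_buffer()                     # deprecated name
filt.reset_buffer()                       # replacement name per #22246
```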
Jun Gong
3207f537cc
[RLlib] RecSim Interest evolution environment should use the custom video sampler IEvVideoSampler, since only one cluster is used. ( #22211 )
2022-02-09 10:29:35 +01:00
Sven Mika
c17a44cdfa
Revert "Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni…" ( #22153 )
2022-02-08 16:43:00 +01:00
Sven Mika
8b678ddd68
[RLlib] Issue 22036: Client should handle concurrent episodes with one being training_enabled=False. ( #22076 )
2022-02-06 12:35:03 +01:00
Sven Mika
38d75ce058
[RLlib] Cleanup SlateQ algo; add test + add target Q-net ( #21827 )
2022-02-04 17:01:12 +01:00
Avnish Narayan
0d2ba41e41
[RLlib] [CI] Deflake longer-running RLlib learning tests for off-policy algorithms. Fix seeding issue in TransformedAction environments ( #21685 )
2022-02-04 14:59:56 +01:00
SangBin Cho
a887763b38
Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni… ( #22105 )
This reverts commit 3f03ef8ba8.
2022-02-04 00:54:50 -08:00
Sven Mika
3f03ef8ba8
[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learning via league-based self-play. ( #21356 )
2022-02-03 09:32:09 +01:00
Jun Gong
9c95b9a5fa
[RLlib] Add an env wrapper so RecSim works with our Bandits agent. ( #22028 )
2022-02-02 12:15:38 +01:00
Jun Gong
a55258eb9c
[RLlib] Move bandit example scripts into examples folder. ( #21949 )
2022-02-02 09:20:47 +01:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black ( #21975 )
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Sven Mika
893536ebd9
[RLlib] Move bandits into main agents folder; make RecSim adapter more accessible. ( #21773 )
2022-01-27 13:58:12 +01:00
Sven Mika
371fbb17e4
[RLlib] Make policies_to_train more flexible via callable option. ( #20735 )
2022-01-27 12:17:34 +01:00
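The callable option for policies_to_train above ( #20735 ) can be sketched roughly as follows; the (policy_id, batch) signature and the policy names are illustrative assumptions, not taken from the PR itself:

```python
# Rough multi-agent config sketch: `policies_to_train` given as a callable.
# Assumed callable signature: (policy_id, train_batch=None) -> bool.
# Policy IDs "learning_policy" / "frozen_opponent" are placeholders.
config = {
    "multiagent": {
        "policies": {"learning_policy", "frozen_opponent"},
        "policy_mapping_fn": lambda agent_id, *args, **kwargs: (
            "learning_policy" if agent_id == "agent_0" else "frozen_opponent"
        ),
        # Only update the learning policy; keep the opponent frozen.
        "policies_to_train": lambda pid, batch=None: pid == "learning_policy",
    },
}
```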
Sven Mika
d5bfb7b7da
[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 ( #21652 )
2022-01-25 14:16:58 +01:00
Sven Mika
c288b97e5f
[RLlib] Issue 21629: Video recorder env wrapper not working. Added test case. ( #21670 )
2022-01-24 19:38:21 +01:00
Avnish Narayan
12b087acb8
[RLlib] Base env pre-checker. ( #21569 )
2022-01-18 16:34:06 +01:00
Jun Gong
7517aefe05
[RLlib] Bring back BC and MARWIL learning tests. ( #21574 )
2022-01-14 14:35:32 +01:00
Sven Mika
f94bd99ce4
[RLlib] Issue 21044: Improve error message for "multiagent" dict checks. ( #21448 )
2022-01-11 19:50:03 +01:00
Avnish Narayan
f7a5fc36eb
[RLlib] Give rnnsac_stateless CartPole a GPU, increase timeout ( #21407 )
Increase test_preprocessors runtimes.
2022-01-06 11:54:19 -08:00
Sven Mika
c01245763e
[RLlib] Revert "Revert "updated pettingzoo wrappers, env versions, urls"" ( #21339 )
2022-01-04 18:30:26 +01:00
Sven Mika
abd3bef63b
[RLlib] QMIX better defaults + added to CI learning tests ( #21332 )
2022-01-04 08:54:41 +01:00
Kai Fricke
489e6945a6
Revert "[RLlib] Updated pettingzoo wrappers, env versions, urls ( #20113 )" ( #21338 )
This reverts commit 327eb84154.
2022-01-03 10:21:25 +00:00
Benjamin Black
327eb84154
[RLlib] Updated pettingzoo wrappers, env versions, urls ( #20113 )
2022-01-02 21:29:09 +01:00
Sven Mika
62dbf26394
[RLlib] POC: Run PGTrainer w/o the distr. exec API (Trainer's new training_iteration method). ( #20984 )
2021-12-21 08:39:05 +01:00
Sven Mika
daa4304a91
[RLlib] Switch off preprocessors by default for PGTrainer. ( #21008 )
2021-12-13 12:04:23 +01:00
Sven Mika
db058d0fb3
[RLlib] Rename metrics_smoothing_episodes into metrics_num_episodes_for_smoothing for clarity. ( #20983 )
2021-12-11 20:33:35 +01:00
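The rename above ( #20983 ) is a pure config-key change; purely as an illustration (the window size of 100 is arbitrary):

```python
# Before #20983 (deprecated key):
config = {"metrics_smoothing_episodes": 100}
# After #20983 (clearer name, same meaning: number of episodes used to smooth reported metrics):
config = {"metrics_num_episodes_for_smoothing": 100}
```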
Sven Mika
596c8e2772
[RLlib] Experimental no-flatten option for actions/prev-actions. ( #20918 )
2021-12-11 14:57:58 +01:00
kk-55
9acf2f954d
[RLlib] Example containing a proposal for computing an adapted (time-dependent) GAE used by the PPO algorithm (via callback on_postprocess_trajectory) ( #20850 )
2021-12-09 14:48:56 +01:00
Ishant Mrinal
2868d1a2cf
[RLlib] Support for RE3 exploration algorithm (for tf) ( #19551 )
2021-12-07 13:26:34 +01:00
Sven Mika
60b2219d72
[RLlib] Allow for evaluation to run by timesteps (alternative to episodes) and add auto-setting to make sure train doesn't ever have to wait for eval (e.g. long episodes) to finish. ( #20757 )
2021-12-04 13:26:33 +01:00
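For the evaluation-by-timesteps change above ( #20757 ), a rough config sketch; the evaluation_duration / evaluation_duration_unit keys and the "auto" option are my reading of the knobs this PR added, so double-check the RLlib docs for your version:

```python
# Assumed config keys for timestep-based (rather than episode-based) evaluation
# that runs in parallel, so training never has to wait on long evaluation episodes.
config = {
    "evaluation_interval": 1,                 # evaluate every training iteration
    "evaluation_num_workers": 1,
    "evaluation_parallel_to_training": True,  # overlap evaluation with training
    "evaluation_duration": 1000,              # how much evaluation to run per round...
    "evaluation_duration_unit": "timesteps",  # ...counted in timesteps, not episodes
    # "evaluation_duration": "auto",          # alternative: size eval to the training time
}
```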
Jun Gong
2317c693cf
[RLlib] Use SampleBatch instead of input dict whenever possible ( #20746 )
2021-12-02 13:11:26 +01:00
Jun Gong
65bd8e29f8
[RLlib] Update a few things to get rid of the remote_vector_env deprecation warning. ( #20753 )
2021-12-02 13:10:44 +01:00
Sven Mika
49cd7ea6f9
[RLlib] Trainer sub-class PPO/DDPPO (instead of build_trainer()). ( #20571 )
2021-11-23 23:01:05 +01:00
Artur Niederfahrenhorst
d07e50e957
[RLlib] Replay buffer API (cleanups; docstrings; renames; move into rllib/execution/buffers dir) ( #20552 )
2021-11-19 11:57:37 +01:00
Sven Mika
7a585fb275
[RLlib; Documentation] RLlib README overhaul. ( #20249 )
2021-11-18 18:08:40 +01:00
Sven Mika
56619b955e
[RLlib; Documentation] Some docstring cleanups; Rename RemoteVectorEnv into RemoteBaseEnv for clarity. ( #20250 )
2021-11-17 21:40:16 +01:00
Sven Mika
f82880eda1
Revert "Revert [RLlib] POC: Deprecate build_policy
(policy template) for torch only; PPOTorchPolicy ( #20061 ) ( #20399 )" ( #20417 )
...
This reverts commit 90dc5460d4
.
2021-11-16 14:49:41 +01:00
Stefan Schneider
2b3d0c691f
[RLlib] Document and extend action mask example. ( #20390 )
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Sven Mika <sven@anyscale.io>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-16 13:20:41 +01:00
Kai Fricke
3e6ba5d6d2
Revert "Revert [RLlib] POC: PGTrainer
class that works by sub-classing, not trainer_template.py
." ( #20285 )
...
* Revert "Revert "[RLlib] POC: `PGTrainer` class that works by sub-classing, not `trainer_template.py`. (#20055 )" (#20284 )"
This reverts commit 246787cdd9
.
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-16 12:26:47 +01:00
Amog Kamsetty
90dc5460d4
Revert "[RLlib] POC: Deprecate build_policy
(policy template) for torch only; PPOTorchPolicy ( #20061 )" ( #20399 )
...
This reverts commit 5b1c8e46e1
.
2021-11-15 16:11:35 -08:00