hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Jun Gong	d5a6d46049	[RLlib] Migrate MAML, MB-MPO, MARWIL, and BC to use Policy sub-classing implementation. (#24914)	2022-05-20 14:10:59 +02:00
kourosh hakhamaneshi	3815e52a61	[RLlib] Agents to algos: DQN w/o Apex and R2D2, DDPG/TD3, SAC, SlateQ, QMIX, PG, Bandits (#24896)	2022-05-19 18:30:42 +02:00
Jun Gong	dea134a472	[RLlib] Clean up Policy mixins. (#24746)	2022-05-17 17:16:08 +02:00
Sven Mika	25001f6d8d	[RLlib] APPO Training iteration fn. (#24545)	2022-05-17 10:31:07 +02:00
Jun Gong	bc3a1d35cf	[RLlib] Introduce new policy base classes. (#24742)	2022-05-13 21:48:30 +02:00
Artur Niederfahrenhorst	bd2fdf4752	[RLlib] Automate sequences in `timeslice_along_seq_lens_with_overlap()`. (#24561)	2022-05-09 11:55:06 +02:00
Daewoo Lee	fee35444ab	[RLlib] Issue 24530: Fix `add_time_dimension` (#24531) Co-authored-by: Daewoo Lee <dwlee@rtst.co.kr>	2022-05-06 15:21:42 +02:00
Edward Oakes	11954e6798	Issue 24143: Fix a few f-strings missing the f. (#24232)	2022-05-02 16:11:33 +02:00
Xuehai Pan	377a522ce2	[RLlib] Fix time dimension shaping for PyTorch RNN models. (#21735)	2022-04-29 10:39:03 +02:00
Ishant Mrinal	0248c60387	[RLlib] Add additional return values to `action_sampler_fn`. (#22721)	2022-04-29 10:34:48 +02:00
Sven Mika	6551922c21	[RLlib] Fix AlphaStar for tf2+tracing; smaller cleanups around avoiding to wrap a TFPolicy `as_eager()` or `with_tracing` more than once. (#24271)	2022-04-28 13:43:21 +02:00
Xuehai Pan	6087eda91b	[RLlib] Issue 21991: Fix `SampleBatch` slicing for `SampleBatch.INFOS` in RNN cases (#22050)	2022-04-25 11:40:24 +02:00
Noon van der Silk	3589c21924	[RLlib] Fix some missing f-strings and a f-string related bug in tf eager policy. (#24148)	2022-04-25 11:25:28 +02:00
Jeroen Bédorf	1263015931	[RLlib] Add support for writing env 'info' dicts to output datasets for TFPolicies (for TorchPolicies, these are part of the view-requirements by default and thus written either way). (#24041)	2022-04-25 11:17:50 +02:00
Sven Mika	9de391b70e	[RLlib] Issue 23897: `add_time_dimension()` causes returned shape to be completely unknown. (#24006)	2022-04-19 17:56:56 +02:00
Sven Mika	de9e143938	[RLlib] Issue 23907: SampleBatch.shuffle does not flush intercepted_values dict (which it should). (#24005)	2022-04-19 17:55:59 +02:00
Kinal Mehta	758e758c32	[rllib] Fix incorrect sequence length for rnn (#23830) Update the torch policy to find the seq_lens using state_batches instead of input_dict. This helps handle the complex inputs to the model when the inbuilt preprocessing API is disabled.	2022-04-12 21:07:18 +01:00
Sven Mika	a8494742a3	[RLlib] Memory leak finding toolset using tracemalloc + CI memory leak tests. (#15412)	2022-04-12 07:50:09 +02:00
Sven Mika	0b3a79ca41	[RLlib] Issue 23639: Error in client/server setup when using LSTMs (#23740)	2022-04-07 10:16:22 +02:00
Max Pumperla	60054995e6	[docs] fix doctests and activate CI (#23418)	2022-03-24 17:04:02 -07:00
Sven Mika	b1cda46681	[RLlib] SlateQ (tf GPU + multi-GPU) + Bandit fixes (#23276)	2022-03-18 13:45:16 +01:00
Siyuan (Ryans) Zhuang	0c74ecad12	[Lint] Cleanup incorrectly formatted strings (Part 1: RLLib). (#23128)	2022-03-15 17:34:21 +01:00
Artur Niederfahrenhorst	c0ade5f0b7	[RLlib] Issue 22625: `MultiAgentBatch.timeslices()` does not behave as expected. (#22657)	2022-03-08 14:25:48 +01:00
Xuehai Pan	018ebbf4cb	[RLlib] Issue #21671: Handle callbacks and model metrics for `TorchPolicy` while using multi-GPU optimizers (#21697)	2022-02-23 08:30:38 +01:00
Steven Morad	d4571741aa	[RLlib] `seq_lens` should always be torch tensors. (#22398)	2022-02-22 08:15:43 +01:00
Sven Mika	04a5c72ea3	Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" (#18708)	2022-02-10 13:44:22 +01:00
Alex Wu	b122f093c1	Revert "[RLlib] Speedup A3C up to 3x (new `training_iteration` function instead of `execution_plan`) and re-instate Pong learning test." (#22250) Reverts ray-project/ray#22126 Breaks rllib:tests/test_io	2022-02-09 09:26:36 -08:00
Sven Mika	ac3e6ab411	[RLlib] Speedup A3C up to 3x (new `training_iteration` function instead of `execution_plan`) and re-instate Pong learning test. (#22126)	2022-02-08 19:04:13 +01:00
Sven Mika	c17a44cdfa	Revert "Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni…" (#22153)	2022-02-08 16:43:00 +01:00
SangBin Cho	a887763b38	Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni… (#22105) This reverts commit `3f03ef8ba8`.	2022-02-04 00:54:50 -08:00
Sven Mika	3f03ef8ba8	[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learning via league-based self-play. (#21356)	2022-02-03 09:32:09 +01:00
Balaji Veeramani	7f1bacc7dc	[CI] Format Python code with Black (#21975) See #21316 and #21311 for the motivation behind these changes.	2022-01-29 18:41:57 -08:00
Sven Mika	ee41800c16	[RLlib] Preparatory PR for multi-agent, multi-GPU learning agent (alpha-star style) #02. (#21649)	2022-01-27 22:07:05 +01:00
Sven Mika	92f030331e	[RLlib] Initial code/comment cleanups in preparation for decentralized multi-agent learner. (#21420)	2022-01-10 11:22:55 +01:00
Sven Mika	3a3d0a4a2b	[RLlib] Issue 21340: SampleBatch __init__ docstring wrong. (#21447)	2022-01-07 15:48:14 +01:00
Sven Mika	9e6b871739	[RLlib] Better utils for flattening complex inputs and enable prev-actions for LSTM/attention for complex action spaces. (#21330)	2022-01-05 11:29:44 +01:00
Sven Mika	62dbf26394	[RLlib] POC: Run PGTrainer w/o the distr. exec API (Trainer's new training_iteration method). (#20984)	2021-12-21 08:39:05 +01:00
brulu	8b77fc0aef	[RLlib] Updating Repeated space. Allowing numpy arrays and adding representation. (#20799)	2021-12-16 08:27:55 +01:00
Sven Mika	daa4304a91	[RLlib] Switch off preprocessors by default for PGTrainer. (#21008)	2021-12-13 12:04:23 +01:00
Sven Mika	596c8e2772	[RLlib] Experimental no-flatten option for actions/prev-actions. (#20918)	2021-12-11 14:57:58 +01:00
Sven Mika	f814c2af89	[RLlib; Docs] Docs API reference pages: `rllib/execution`, `rllib/evaluation`, `rllib/models`, `rllib/offline`. (#20538)	2021-12-10 09:41:29 +01:00
Carlo Grisetti	a8286c55af	[RLLib] Fix deprecated convert_to_non_torch_type (#20751)	2021-12-09 14:42:12 +01:00
Ishant Mrinal	2868d1a2cf	[RLlib] Support for RE3 exploration algorithm (for tf) (#19551)	2021-12-07 13:26:34 +01:00
Jun Gong	2317c693cf	[RLlib] Use SampleBrach instead of input dict whenever possible (#20746)	2021-12-02 13:11:26 +01:00
mvindiola1	8cee0c03bf	[RLlib] Update `max_seq_len` in pad_batch_to_sequences_of_same_size (#20743)	2021-11-30 18:00:07 +01:00
mvindiola1	eadc7669c5	[RLlib] SampleBatch.concat_samples fix incorrect max_seq_len calculation (#20704)	2021-11-29 12:01:40 +01:00
Sven Mika	e37afe0425	[RLlib; Docs] Auto API reference pages overhaul: `rllib/policy` and `rllib/agents` packages. (#20537)	2021-11-25 09:35:19 +01:00
Sven Mika	f82880eda1	Revert "Revert [RLlib] POC: Deprecate `build_policy` (policy template) for torch only; PPOTorchPolicy (#20061) (#20399)" (#20417) This reverts commit `90dc5460d4`.	2021-11-16 14:49:41 +01:00
Kai Fricke	3e6ba5d6d2	Revert "Revert [RLlib] POC: `PGTrainer` class that works by sub-classing, not `trainer_template.py`." (#20285) * Revert "Revert "[RLlib] POC: `PGTrainer` class that works by sub-classing, not `trainer_template.py`. (#20055)" (#20284)" This reverts commit `246787cdd9`. Co-authored-by: sven1977 <svenmika1977@gmail.com>	2021-11-16 12:26:47 +01:00
Amog Kamsetty	90dc5460d4	Revert "[RLlib] POC: Deprecate `build_policy` (policy template) for torch only; PPOTorchPolicy (#20061)" (#20399) This reverts commit `5b1c8e46e1`.	2021-11-15 16:11:35 -08:00

309 commits