hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 02:21:39 -05:00

Author	SHA1	Message	Date
Ishant Mrinal	57244aeee3	[RLlib] Make DQN update_target use only trainable variables. (#25226 )	2022-07-15 09:17:06 +02:00
Jun Gong	b383d987d1	[RLlib] Fix a bunch of issues related to connectors. (#26510 )	2022-07-13 18:55:20 +02:00
Jun Gong	0c469e490e	[RLlib] Checkpoint and restore connectors. (#26253 )	2022-07-09 01:06:24 -07:00
Steven Morad	0bc465f687	[RLlib] Fix docstring and add unit tests for rnn sequencing. (#26197 )	2022-07-06 14:32:57 +02:00
Jun Gong	d83bbda281	[RLlib] Save serialized PolicySpec. Extract `num_gpus` related logics into a util function. (#25954 )	2022-06-30 11:38:21 +02:00
Jun Gong	52bb8e47d4	[RLlib] EnvRunnerV2 and EpisodeV2 that support Connectors. (#25922 )	2022-06-30 08:44:10 +02:00
Charles Sun	70f94e6d63	[RLlib] Migrating DDPG to PolicyV2. (#26054 )	2022-06-28 15:52:56 +02:00
Sven Mika	3d6df50258	[RLlib] Fix `get_num_samples_loaded_into_buffer` in TorchPolicyV2. (#25956 )	2022-06-22 13:11:41 +02:00
Eric Liang	43aa2299e6	[api] Annotate as public / move ray-core APIs to _private and add enforcement rule (#25695 ) Enable checking of the ray core module, excluding serve, workflows, and tune, in ./ci/lint/check_api_annotations.py. This required moving many files to ray._private and associated fixes.	2022-06-21 15:13:29 -07:00
Avnish Narayan	d859b84058	[RLlib] Add compute log likelihoods test for CRR. (#25905 )	2022-06-21 16:06:10 +02:00
Artur Niederfahrenhorst	e10876604d	[RLlib] Include SampleBatch.T column in all collected batches. (#25926 )	2022-06-21 13:20:22 +02:00
Sven Mika	96693055bd	[RLlib] More Trainer -> Algorithm renaming cleanups. (#25869 )	2022-06-20 15:54:00 +02:00
Sven Mika	d90c6cfbd6	[RLlib] SimpleQ PolicyV2 (sub-classing). (#25871 )	2022-06-17 20:12:16 +02:00
Avnish Narayan	393cf4d8f7	[RLlib] Fix `action_sampler_fn` call in `TorchPolicyV2` (`obs_batch` instead of `input_dict` arg). (#25877 )	2022-06-17 08:39:39 +02:00
kourosh hakhamaneshi	f597e21ac8	[RLlib] Fix sample batch concat samples. (#25572 )	2022-06-14 12:47:29 +02:00
Sven Mika	130b7eeaba	[RLlib] `Trainer` to `Algorithm` renaming. (#25539 )	2022-06-11 15:10:39 +02:00
Artur Niederfahrenhorst	94d6c212df	[RLlib] Replay Buffer API documentation. (#24683 )	2022-06-10 16:47:51 +02:00
Artur Niederfahrenhorst	7495e9c89c	[RLlib] Dreamer Policy sub-classing schema. (#25585 )	2022-06-09 17:14:15 +02:00
Artur Niederfahrenhorst	5133978adc	[RLlib] PG policy subclassing conversion. (#25288 )	2022-06-06 13:07:47 +02:00
kourosh hakhamaneshi	d49d0efbaf	[RLlib] Bug fix: when on GPU, sample_batch.to_device() only converts the device and does not convert float64 to float32. (#25460 )	2022-06-06 12:43:11 +02:00
Sven Mika	b5bc2b93c3	[RLlib] Move all remaining algos into `algorithms` directory. (#25366 )	2022-06-04 07:35:24 +02:00
Yi Cheng	fd0f967d2e	Revert "[RLlib] Move (A/DD)?PPO and IMPALA algos to `algorithms` dir and rename policy and trainer classes. (#25346 )" (#25420 ) This reverts commit `e4ceae19ef`. Reverts #25346 linux://python/ray/tests:test_client_library_integration never fail before this PR. In the CI of the reverted PR, it also fails (https://buildkite.com/ray-project/ray-builders-pr/builds/34079#01812442-c541-4145-af22-2a012655c128). So high likely it's because of this PR. And test output failure seems related as well (https://buildkite.com/ray-project/ray-builders-branch/builds/7923#018125c2-4812-4ead-a42f-7fddb344105b)	2022-06-02 20:38:44 -07:00
Sven Mika	e4ceae19ef	[RLlib] Move (A/DD)?PPO and IMPALA algos to `algorithms` dir and rename policy and trainer classes. (#25346 )	2022-06-02 16:47:05 +02:00
Artur Niederfahrenhorst	71a8a443ce	[RLlib] Fix Policy global timesteps being off by init sample batch size. (#25349 )	2022-06-02 10:19:21 +02:00
Eric Liang	905258dbc1	Clean up docstyle in python modules and add LINT rule (#25272 )	2022-06-01 11:27:54 -07:00
Sven Mika	d95009a3ac	[RLlib] Vectorized envs: Gracefully handle sub-environments failing by restarting them (if configured so). (#24967 )	2022-05-28 10:50:03 +02:00
Sven Mika	ab6c3027e5	[RLlib] A2/3C policy sub-classing schema. (#25078 )	2022-05-28 09:54:47 +02:00
kourosh hakhamaneshi	9684ea3af6	[RLlib] Fix TorchPolicyV2 bug. (#25203 )	2022-05-26 20:49:26 +02:00
Jun Gong	eaf9c941ae	[RLlib] Migrate PPO Impala and APPO policies to use sub-classing implementation. (#25117 )	2022-05-25 14:38:03 +02:00
Eric Liang	4963dfaae0	[api] Add API stability annotations for all RLlib symbols and add to LINT (#25060 )	2022-05-24 22:14:25 -07:00
Jun Gong	93ff0beb4e	[RLlib] Introduce utils to serialize gym Spaces (and thus ViewRequirements). (#25007 )	2022-05-24 21:12:20 +02:00
Steven Morad	501d932449	[RLlib] SAC, RNNSAC, and CQL TrainerConfig objects (#25059 )	2022-05-22 19:58:47 +02:00
Jun Gong	d5a6d46049	[RLlib] Migrate MAML, MB-MPO, MARWIL, and BC to use Policy sub-classing implementation. (#24914 )	2022-05-20 14:10:59 +02:00
kourosh hakhamaneshi	3815e52a61	[RLlib] Agents to algos: DQN w/o Apex and R2D2, DDPG/TD3, SAC, SlateQ, QMIX, PG, Bandits (#24896 )	2022-05-19 18:30:42 +02:00
Jun Gong	dea134a472	[RLlib] Clean up Policy mixins. (#24746 )	2022-05-17 17:16:08 +02:00
Sven Mika	25001f6d8d	[RLlib] APPO Training iteration fn. (#24545 )	2022-05-17 10:31:07 +02:00
Jun Gong	bc3a1d35cf	[RLlib] Introduce new policy base classes. (#24742 )	2022-05-13 21:48:30 +02:00
Artur Niederfahrenhorst	bd2fdf4752	[RLlib] Automate sequences in `timeslice_along_seq_lens_with_overlap()`. (#24561 )	2022-05-09 11:55:06 +02:00
Daewoo Lee	fee35444ab	[RLlib] Issue 24530: Fix `add_time_dimension` (#24531 ) Co-authored-by: Daewoo Lee <dwlee@rtst.co.kr>	2022-05-06 15:21:42 +02:00
Edward Oakes	11954e6798	Issue 24143: Fix a few f-strings missing the f. (#24232 )	2022-05-02 16:11:33 +02:00
Xuehai Pan	377a522ce2	[RLlib] Fix time dimension shaping for PyTorch RNN models. (#21735 )	2022-04-29 10:39:03 +02:00
Ishant Mrinal	0248c60387	[RLlib] Add additional return values to `action_sampler_fn`. (#22721 )	2022-04-29 10:34:48 +02:00
Sven Mika	6551922c21	[RLlib] Fix AlphaStar for tf2+tracing; smaller cleanups around avoiding to wrap a TFPolicy `as_eager()` or `with_tracing` more than once. (#24271 )	2022-04-28 13:43:21 +02:00
Xuehai Pan	6087eda91b	[RLlib] Issue 21991: Fix `SampleBatch` slicing for `SampleBatch.INFOS` in RNN cases (#22050 )	2022-04-25 11:40:24 +02:00
Noon van der Silk	3589c21924	[RLlib] Fix some missing f-strings and a f-string related bug in tf eager policy. (#24148 )	2022-04-25 11:25:28 +02:00
Jeroen Bédorf	1263015931	[RLlib] Add support for writing env 'info' dicts to output datasets for TFPolicies (for TorchPolicies, these are part of the view-requirements by default and thus written either way). (#24041 )	2022-04-25 11:17:50 +02:00
Sven Mika	9de391b70e	[RLlib] Issue 23897: `add_time_dimension()` causes returned shape to be completely unknown. (#24006 )	2022-04-19 17:56:56 +02:00
Sven Mika	de9e143938	[RLlib] Issue 23907: SampleBatch.shuffle does not flush intercepted_values dict (which it should). (#24005 )	2022-04-19 17:55:59 +02:00
Kinal Mehta	758e758c32	[rllib] Fix incorrect sequence length for rnn (#23830 ) Update the torch policy to find the seq_lens using state_batches instead of input_dict. This helps handle the complex inputs to the model when the inbuilt preprocessing API is disabled.	2022-04-12 21:07:18 +01:00
Sven Mika	a8494742a3	[RLlib] Memory leak finding toolset using tracemalloc + CI memory leak tests. (#15412 )	2022-04-12 07:50:09 +02:00

341 commits