hiro/ray - Forgejo


mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Sven Mika	eb54236d13	[RLlib] DD-PPO training iteration fn (#23906 )	2022-04-19 17:55:26 +02:00
Jun Gong	d3c69ebdb6	[RLlib] Make sure unsquash_action moves user action to proper range (#23941 )	2022-04-18 18:55:57 +02:00
Artur Niederfahrenhorst	e57ce7efd6	[RLlib] Replay Buffer API and Training Iteration Fn for DQN. (#23420 )	2022-04-18 12:20:12 +02:00
Sven Mika	92781c603e	[RLlib] A2C `training_iteration` method implementation (`_disable_execution_plan_api=True`) (#23735 )	2022-04-15 18:36:13 +02:00
kourosh hakhamaneshi	c38a29573f	[RLlib] Removed deprecated code with error=True (#23916 )	2022-04-15 13:51:12 +02:00
Kai Fricke	65d9a410f7	[ci] Clean up ci/ directory (refactor ci/travis) (#23866 ) Clean up the ci/ directory. This means getting rid of the travis/ path completely and moving the files into sensible subdirectories. Details: - Moves everything under ci/travis into subdirectories, e.g. ci/build, ci/lint, etc. - Minor adjustments to some scripts (variable renames) - Removes the outdated (unused) asan tests	2022-04-13 18:11:30 +01:00
Kinal Mehta	758e758c32	[rllib] Fix incorrect sequence length for rnn (#23830 ) Update the torch policy to find the seq_lens using state_batches instead of input_dict. This helps handle the complex inputs to the model when the inbuilt preprocessing API is disabled.	2022-04-12 21:07:18 +01:00
Sven Mika	a8494742a3	[RLlib] Memory leak finding toolset using tracemalloc + CI memory leak tests. (#15412 )	2022-04-12 07:50:09 +02:00
Jun Gong	500cf7dcef	[RLlib] Run test_policy_client_server_setup.sh tests on different ports. (#23787 )	2022-04-11 22:07:07 +02:00
Jun Gong	c61910487f	[RLlib] Fix typo in docstring of PGTorchPolicy (#23818 )	2022-04-11 19:31:45 +02:00
Sven Mika	a3d4fc74a6	[RLlib] MARWIL: Move to training_iteration API. (#23798 )	2022-04-11 19:28:32 +02:00
Steven Morad	00922817b6	[RLlib] Rewrite PPO to use training_iteration + enable DD-PPO for Win32. (#23673 )	2022-04-11 08:39:10 +02:00
Eric Liang	1ff874e8e8	[spelling] Add linter rule for mis-capitalizations of RLLib -> RLlib (#23817 )	2022-04-10 16:12:53 -07:00
Kai Fricke	8c2e471265	[AIR] Add RLTrainer interface, implementation, and examples (#23465 ) This PR adds an RLTrainer to Ray AIR. It works for both offline and online use cases. In offline training, it will leverage the datasets key of the Trainer API to specify a dataset reader input, used e.g. in Behavioral Cloning (BC). In online training, it is a wrapper around the rllib trainables making use of the parameter layering enabled by the Trainer API.	2022-04-08 17:16:42 -07:00
Sven Mika	c82f6c62c8	[RLlib] Make RolloutWorkers (optionally) recoverable after failure. (#23739 )	2022-04-08 15:33:28 +02:00
Sven Mika	4d285a00a4	[RLlib] Issue 23689: tf Initializer has hard-coded float32 dtype. (#23741 )	2022-04-07 21:35:02 +02:00
Artur Niederfahrenhorst	02a50f02b7	[RLlib] ReplayBuffer: `_hit_counts` working again. (#23586 )	2022-04-07 10:56:25 +02:00
Sven Mika	0b3a79ca41	[RLlib] Issue 23639: Error in client/server setup when using LSTMs (#23740 )	2022-04-07 10:16:22 +02:00
Sven Mika	e391b624f0	[RLlib] Re-enable (for CI-testing) our two self_play example scripts. (#23742 )	2022-04-07 08:20:48 +02:00
Sven Mika	434265edd0	[RLlib] Examples folder: All `training_iteration` translations. (#23712 )	2022-04-05 16:33:50 +02:00
Steven Morad	39841b65b3	[RLlib] PPOTorchPolicy: Remove extra call to `model.value_function` (#23671 )	2022-04-05 08:40:29 +02:00
mesjou	e725472b5b	[RLlib] Fix bug in prisoner's dilemma example. (#23690 )	2022-04-05 08:36:20 +02:00
Jiajun Yao	5f37231842	Remove yapf dependency (#23656 ) Yapf has been replaced by black.	2022-04-04 21:50:04 -07:00
Sven Mika	0bb82f29b6	[RLlib] AlphaStar polishing (fix logger.info bug). (#22281 )	2022-04-01 09:49:41 +02:00
Sven Mika	2eaa54bd76	[RLlib] POC: Config objects instead of dicts (PPO only). (#23491 )	2022-03-31 18:26:12 +02:00
simonsays1980	9ca9c67bc9	[RLlib] Added dtype safeguards to the 'required_model_output_shape()' methods… (#23490 )	2022-03-31 13:52:00 +02:00
simonsays1980	e4c6e9c3d3	[RLlib] Changed the if-block in the example callback to make it more readable. (#22900 )	2022-03-31 09:13:04 +02:00
simonsays1980	d2a3948845	[RLlib] Removed the `sampler()` function in the ParallelRollouts() as it is not needed. (#22320 )	2022-03-31 09:06:30 +02:00
Artur Niederfahrenhorst	9a64bd4e9b	[RLlib] Simple-Q uses training iteration fn (instead of execution_plan); ReplayBuffer API for Simple-Q (#22842 )	2022-03-29 14:44:40 +02:00
Jun Gong	a7e5aa8c6a	[RLlib] Delete some unused, confusing logic. (#23513 )	2022-03-29 13:45:13 +02:00
Artur Niederfahrenhorst	32ad6c6ef1	[RLlib] Replay Buffer capacity check (#23523 )	2022-03-29 12:06:27 +02:00
Kai Fricke	262d6121bb	[rllib] Fix error messages and example for dataset writer (#23419 ) Currently, the error message and example refer to a field `type` that is actually named `format`.	2022-03-28 19:53:12 +01:00
Sven Mika	7cb86acce2	[RLlib] trainer_template.py: hard deprecation (error when used). (#23488 )	2022-03-25 18:25:51 +01:00
Max Pumperla	60054995e6	[docs] fix doctests and activate CI (#23418 )	2022-03-24 17:04:02 -07:00
Sven Mika	22c9c4aa39	[RLlib] Slate-Q + GPU torch bug fix. (#23464 )	2022-03-24 17:39:33 +01:00
Avnish Narayan	5134e0dc12	[RLlib] Change type to tensortype for cql policies. (#23438 )	2022-03-24 12:32:29 +01:00
Fabian Witter	2547055f38	[RLlib] Add support for complex observations in CQL (#23332 )	2022-03-22 17:04:07 +01:00
Jun Gong	d12977c4fb	[RLlib] TF2 Bandit Agent (#22838 )	2022-03-21 16:55:55 +01:00
Sven Mika	b1cda46681	[RLlib] SlateQ (tf GPU + multi-GPU) + Bandit fixes (#23276 )	2022-03-18 13:45:16 +01:00
Siyuan (Ryans) Zhuang	0c74ecad12	[Lint] Cleanup incorrectly formatted strings (Part 1: RLLib). (#23128 )	2022-03-15 17:34:21 +01:00
Fabien Couthouis	e575ed3350	[RLlib] Fix AttributeError with None obs shape + tf in `_unpack_obs()` utility (#22428 )	2022-03-15 16:34:31 +01:00
Jeroen Bédorf	bc21a4593d	[RLlib] Fix crash when kl_coeff is set to 0 (#23063 ) Co-authored-by: Jeroen Bédorf <jeroen@minds.ai> Co-authored-by: Ishant Mrinal Haloi <mrinal.haloi11@gmail.com> Co-authored-by: Ishant Mrinal <33053278+n30111@users.noreply.github.com>	2022-03-11 12:24:52 -08:00
simonsays1980	8627f44d7f	[RLlib] Remove duplicate code block: Config deprecation check for `metrics_smoothing_episodes` (#22152 )	2022-03-09 16:51:42 +01:00
Artur Niederfahrenhorst	37d129a965	[RLlib] ReplayBuffer API: Test cases. (#22390 )	2022-03-08 16:54:12 +01:00
Artur Niederfahrenhorst	c0ade5f0b7	[RLlib] Issue 22625: `MultiAgentBatch.timeslices()` does not behave as expected. (#22657 )	2022-03-08 14:25:48 +01:00
Jiajun Yao	4801e57c77	[Test] Add missing tests to bazel BUILD (#22827 )	2022-03-07 19:54:49 -08:00
Sven Mika	3fe6f3b3eb	[RLlib] 2 bug fixes: Bandit registration not working if torch not installed. Env checker for MA envs. (#22821 )	2022-03-04 19:16:30 +01:00
Jun Gong	e765915ded	[RLlib] Make sure SlateQ works with GPU. (#22738 )	2022-03-04 17:49:51 +01:00
Kai Fricke	84a163a2c4	[RLlib] Remove atari rom install script (#22797 )	2022-03-03 16:55:56 +01:00
Sven Mika	0af100ffae	[RLlib] Fix tree.flatten dict ordering bug: `flatten_space([obs_space])` should produce same struct as `tree.flatten([obs])`. (#22731 )	2022-03-01 21:24:24 +01:00


1154 commits