hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Avnish Narayan	6e68b6bef9	[RLlib] DD-PPO training iteration fn. (#24118 ) We had unreported merge conflicts with DDPPO. This PR closes and combines #24092, #24035, #24030 and #23096 Co-authored-by: sven1977 <svenmika1977@gmail.com>	2022-04-22 15:22:14 -07:00
xwjiang2010	d7da0d706e	[rllib] Only conditionally import JaxCategorical in catalog.py (#24086 ) * Experiment with less imports in catalog.py * lint	2022-04-22 14:51:35 -07:00
Avnish Narayan	3bf907bcf8	[RLlib] Don't modify environments via the env checker utilities. (#24083 )	2022-04-22 18:39:47 +02:00
Kai Fricke	9f7170e444	Revert "Revert revert #23906 [RLlib] DD-PPO training iteration function implementation. (#24035 )" (#24103 ) This reverts commit `a337fd994e`.	2022-04-22 09:58:58 +01:00
jon-chuang	e6a458a31e	[CI] Create zip of ray `session_latest/logs` dir on test failure and upload to buildkite via `/artifact-mount` (#23783 ) Creates a zip of session_latest dir with test name and timestamp upon python test failure. Writes to dir specified by env var `RAY_TEST_FAILURE_LOGS_DIR`. Noop if env var does not exist. Downstream consumer (e.g. CI) can upload all created artifacts in this dir. Thereby, PR submitters can more easily debug their CI failures, especially if they can't repro locally. Limitations: - a conftest.py file importing the main ray conftest.py needs to be present in same dir as test. This presents a challenge for e.g. dashboard tests which are highly scattered	2022-04-22 09:48:53 +01:00
Grzegorz Rypeść	dfb9689701	[RLlib] Issue 21489: Unity3D env lacks group rewards (#24016 ).	2022-04-21 18:49:52 +02:00
Avnish Narayan	a337fd994e	Revert revert #23906 [RLlib] DD-PPO training iteration function implementation. (#24035 )	2022-04-21 17:37:49 +02:00
Sven Mika	14dd7aac13	[RLlib] Issue 22943: PettingZoo parallel should not use env checking (for now). (#24025 )	2022-04-21 11:20:54 +02:00
Avnish Narayan	477b9d22d2	[RLlib][Training iteration fn] APEX conversion (#22937 )	2022-04-20 17:56:18 +02:00
Avnish Narayan	0ddbce6518	Revert "[RLlib] DD-PPO training iteration fn (#23906 )" (#24030 ) The DDPPO LR scheduler test is broken because the learner_info_dictionary that is returned by the training iteration function does not consistently return a learner info for every training iteration, but the test expects that it does. We'll need to fix the test then re-merge Reverts #23906	2022-04-19 16:43:57 -07:00
Avnish Narayan	55f6896142	[RLlib] Issue 24014: Change occurrences of randint to integers in RLlib (#24019 )	2022-04-19 22:15:14 +02:00
Sven Mika	9de391b70e	[RLlib] Issue 23897: `add_time_dimension()` causes returned shape to be completely unknown. (#24006 )	2022-04-19 17:56:56 +02:00
Sven Mika	de9e143938	[RLlib] Issue 23907: SampleBatch.shuffle does not flush intercepted_values dict (which it should). (#24005 )	2022-04-19 17:55:59 +02:00
Sven Mika	eb54236d13	[RLlib] DD-PPO training iteration fn (#23906 )	2022-04-19 17:55:26 +02:00
Jun Gong	d3c69ebdb6	[RLlib] Make sure unsquash_action moves user action to proper range (#23941 )	2022-04-18 18:55:57 +02:00
Artur Niederfahrenhorst	e57ce7efd6	[RLlib] Replay Buffer API and Training Iteration Fn for DQN. (#23420 )	2022-04-18 12:20:12 +02:00
Sven Mika	92781c603e	[RLlib] A2C `training_iteration` method implementation (`_disable_execution_plan_api=True`) (#23735 )	2022-04-15 18:36:13 +02:00
kourosh hakhamaneshi	c38a29573f	[RLlib] Removed deprecated code with error=True (#23916 )	2022-04-15 13:51:12 +02:00
Kai Fricke	65d9a410f7	[ci] Clean up ci/ directory (refactor ci/travis) (#23866 ) Clean up the ci/ directory. This means getting rid of the travis/ path completely and moving the files into sensible subdirectories. Details: - Moves everything under ci/travis into subdirectories, e.g. ci/build, ci/lint, etc. - Minor adjustments to some scripts (variable renames) - Removes the outdated (unused) asan tests	2022-04-13 18:11:30 +01:00
Kinal Mehta	758e758c32	[rllib] Fix incorrect sequence length for rnn (#23830 ) Update the torch policy to find the seq_lens using state_batches instead of input_dict. This helps handle the complex inputs to the model when the inbuilt preprocessing API is disabled.	2022-04-12 21:07:18 +01:00
Sven Mika	a8494742a3	[RLlib] Memory leak finding toolset using tracemalloc + CI memory leak tests. (#15412 )	2022-04-12 07:50:09 +02:00
Jun Gong	500cf7dcef	[RLlib] Run test_policy_client_server_setup.sh tests on different ports. (#23787 )	2022-04-11 22:07:07 +02:00
Jun Gong	c61910487f	[RLlib] Fix typo in docstring of PGTorchPolicy (#23818 )	2022-04-11 19:31:45 +02:00
Sven Mika	a3d4fc74a6	[RLlib] MARWIL: Move to training_iteration API. (#23798 )	2022-04-11 19:28:32 +02:00
Steven Morad	00922817b6	[RLlib] Rewrite PPO to use training_iteration + enable DD-PPO for Win32. (#23673 )	2022-04-11 08:39:10 +02:00
Eric Liang	1ff874e8e8	[spelling] Add linter rule for mis-capitalizations of RLLib -> RLlib (#23817 )	2022-04-10 16:12:53 -07:00
Kai Fricke	8c2e471265	[AIR] Add RLTrainer interface, implementation, and examples (#23465 ) This PR adds a RLTrainer to Ray AIR. It works for both offline and online use cases. In offline training, it will leverage the datasets key of the Trainer API to specify a dataset reader input, used e.g. in Behavioral Cloning (BC). In online training, it is a wrapper around the rllib trainables making use of the parameter layering enabled by the Trainer API.	2022-04-08 17:16:42 -07:00
Sven Mika	c82f6c62c8	[RLlib] Make RolloutWorkers (optionally) recoverable after failure. (#23739 )	2022-04-08 15:33:28 +02:00
Sven Mika	4d285a00a4	[RLlib] Issue 23689: tf Initializer has hard-coded float32 dtype. (#23741 )	2022-04-07 21:35:02 +02:00
Artur Niederfahrenhorst	02a50f02b7	[RLlib] RepayBuffer: `_hit_counts` working again. (#23586 )	2022-04-07 10:56:25 +02:00
Sven Mika	0b3a79ca41	[RLlib] Issue 23639: Error in client/server setup when using LSTMs (#23740 )	2022-04-07 10:16:22 +02:00
Sven Mika	e391b624f0	[RLlib] Re-enable (for CI-testing) our two self_play example scripts. (#23742 )	2022-04-07 08:20:48 +02:00
Sven Mika	434265edd0	[RLlib] Examples folder: All `training_iteration` translations. (#23712 )	2022-04-05 16:33:50 +02:00
Steven Morad	39841b65b3	[RLlib] PPOTorchPolicy: Remove extra call to `model.value_function` (#23671 )	2022-04-05 08:40:29 +02:00
mesjou	e725472b5b	[RLlib] Fix bug in prisoners dillemma example. (#23690 )	2022-04-05 08:36:20 +02:00
Jiajun Yao	5f37231842	Remove yapf dependency (#23656 ) Yapf has been replaced by black.	2022-04-04 21:50:04 -07:00
Sven Mika	0bb82f29b6	[RLlib] AlphaStar polishing (fix logger.info bug). (#22281 )	2022-04-01 09:49:41 +02:00
Sven Mika	2eaa54bd76	[RLlib] POC: Config objects instead of dicts (PPO only). (#23491 )	2022-03-31 18:26:12 +02:00
simonsays1980	9ca9c67bc9	[RLlib] Added dtype safeguards to the 'required_model_output_shape()' methods… (#23490 )	2022-03-31 13:52:00 +02:00
simonsays1980	e4c6e9c3d3	[RLlib] Changed the if-block in the example callback to become more readable. (#22900 )	2022-03-31 09:13:04 +02:00
simonsays1980	d2a3948845	[RLlib] Removed the `sampler()` function in the ParallelRollouts() as it is no needed. (#22320 )	2022-03-31 09:06:30 +02:00
Artur Niederfahrenhorst	9a64bd4e9b	[RLlib] Simple-Q uses training iteration fn (instead of execution_plan); ReplayBuffer API for Simple-Q (#22842 )	2022-03-29 14:44:40 +02:00
Jun Gong	a7e5aa8c6a	[RLlib] Delete some unused confusing logics. (#23513 )	2022-03-29 13:45:13 +02:00
Artur Niederfahrenhorst	32ad6c6ef1	[RLlib] Replay Buffer capacity check (#23523 )	2022-03-29 12:06:27 +02:00
Kai Fricke	262d6121bb	[rllib] Fix error messages and example for dataset writer (#23419 ) Currently the error message and example refer to a field type that is actually format.	2022-03-28 19:53:12 +01:00
Sven Mika	7cb86acce2	[RLlib] trainer_template.py: hard deprecation (error when used). (#23488 )	2022-03-25 18:25:51 +01:00
Max Pumperla	60054995e6	[docs] fix doctests and activate CI (#23418 )	2022-03-24 17:04:02 -07:00
Sven Mika	22c9c4aa39	[RLlib] Slate-Q +GPU torch bug fix. (#23464 )	2022-03-24 17:39:33 +01:00
Avnish Narayan	5134e0dc12	[RLlib] Change type to tensortype for cql policies. (#23438 )	2022-03-24 12:32:29 +01:00
Fabian Witter	2547055f38	[RLlib] Add support for complex observations in CQL (#23332 )	2022-03-22 17:04:07 +01:00

1117 commits