Yi Cheng
fd0f967d2e
Revert "[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms
dir and rename policy and trainer classes. ( #25346 )" ( #25420 )
...
This reverts commit e4ceae19ef.
Reverts #25346
linux://python/ray/tests:test_client_library_integration never failed before this PR.
In the CI run of the reverted PR, it also fails (https://buildkite.com/ray-project/ray-builders-pr/builds/34079#01812442-c541-4145-af22-2a012655c128), so it is highly likely that this PR is the cause.
The test output of the failure seems related as well (https://buildkite.com/ray-project/ray-builders-branch/builds/7923#018125c2-4812-4ead-a42f-7fddb344105b).
2022-06-02 20:38:44 -07:00
Sven Mika
e4ceae19ef
[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms dir and rename policy and trainer classes. ( #25346 )
2022-06-02 16:47:05 +02:00
Eric Liang
905258dbc1
Clean up docstyle in python modules and add LINT rule ( #25272 )
2022-06-01 11:27:54 -07:00
Avnish Narayan
eaed256d68
[RLlib] Async parallel execution manager. ( #24423 )
2022-05-25 17:54:08 +02:00
Sven Mika
44773e810b
[RLlib] DD-PPO Config objects. ( #25028 )
2022-05-22 13:05:24 +02:00
Sven Mika
f54557073e
[RLlib] Remove execution_plan API code no longer needed. ( #24501 )
2022-05-06 12:29:53 +02:00
Avnish Narayan
6e68b6bef9
[RLlib] DD-PPO training iteration fn. ( #24118 )
...
We had unreported merge conflicts with DDPPO. This PR closes and combines #24092, #24035, #24030, and #23096.
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2022-04-22 15:22:14 -07:00
Kai Fricke
9f7170e444
Revert "Revert revert #23906 [RLlib] DD-PPO training iteration function implementation. ( #24035 )" ( #24103 )
...
This reverts commit a337fd994e.
2022-04-22 09:58:58 +01:00
Avnish Narayan
a337fd994e
Revert revert #23906 [RLlib] DD-PPO training iteration function implementation. ( #24035 )
2022-04-21 17:37:49 +02:00
Avnish Narayan
0ddbce6518
Revert "[RLlib] DD-PPO training iteration fn ( #23906 )" ( #24030 )
...
The DD-PPO LR scheduler test is broken because the learner info dictionary returned by the training iteration function does not consistently contain learner info for every training iteration, but the test expects that it does.
We'll need to fix the test and then re-merge.
Reverts #23906
2022-04-19 16:43:57 -07:00
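As context for the flaky check described above, a minimal sketch, assuming the unified result structure results[info][learner][policy ID][learner_stats] noted further down this log; the guard and key names are illustrative, not the test's actual code:

```python
# Hypothetical sketch: learner stats may be absent on some iterations,
# so a robust LR-schedule check must guard before indexing.
result = trainer.train()  # "trainer" is an assumed, already-built DD-PPO trainer
learner_info = result["info"]["learner"]
if "default_policy" in learner_info:  # not guaranteed every iteration
    cur_lr = learner_info["default_policy"]["learner_stats"]["cur_lr"]
```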
Sven Mika
eb54236d13
[RLlib] DD-PPO training iteration fn ( #23906 )
2022-04-19 17:55:26 +02:00
Sven Mika
a3d4fc74a6
[RLlib] MARWIL: Move to training_iteration API. ( #23798 )
2022-04-11 19:28:32 +02:00
Steven Morad
00922817b6
[RLlib] Rewrite PPO to use training_iteration + enable DD-PPO for Win32. ( #23673 )
2022-04-11 08:39:10 +02:00
Sven Mika
7cb86acce2
[RLlib] trainer_template.py: hard deprecation (error when used). ( #23488 )
2022-03-25 18:25:51 +01:00
Siyuan (Ryans) Zhuang
0c74ecad12
[Lint] Cleanup incorrectly formatted strings (Part 1: RLLib). ( #23128 )
2022-03-15 17:34:21 +01:00
Daniel
308ccfe25c
[RLlib] DD-PPO move train_batch_size==-1 check to __init__ ( #22521 )
2022-02-21 11:44:12 +01:00
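For reference, a hedged config sketch of the invariant that check enforces, assuming DD-PPO's usual semantics: each worker learns on its own batches, so the global train_batch_size must keep its -1 sentinel; key names follow RLlib's config style, values are illustrative:

```python
config = {
    "train_batch_size": -1,           # required sentinel; DD-PPO learns per worker
    "rollout_fragment_length": 100,   # per-worker batch size is set here instead
    "sgd_minibatch_size": 32,
}
```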
Balaji Veeramani
31ed9e5d02
[CI] Replace YAPF disables with Black disables ( #21982 )
2022-02-08 16:29:25 -08:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black ( #21975 )
...
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Sven Mika
49cd7ea6f9
[RLlib] Trainer sub-class PPO/DDPPO (instead of build_trainer()). ( #20571 )
2021-11-23 23:01:05 +01:00
gjoliver
99a0088233
[RLlib] Unify the way we create local replay buffer for all agents ( #19627 )
...
* [RLlib] Unify the way we create and use LocalReplayBuffer for all the agents.
This change:
1. Gets rid of the try...except clause around our execution_plan() calls,
and of the resulting deprecation warning.
2. Fixes the execution_plan() call in Trainer._try_recover() too.
3. Most importantly, makes it much easier to create and use different types
of local replay buffers for all our agents,
e.g., allowing us to easily create a reservoir-sampling replay buffer for
the APPO agent for Riot in the near future.
* Introduce explicit configuration for replay buffer types.
* Fix is_training key error.
* Actually deprecate the buffer_size field.
2021-10-26 20:56:02 +02:00
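A hedged sketch of what "explicit configuration for replay buffer types" can look like; the replay_buffer_config key follows the pattern RLlib later standardized on and is an assumption here, not necessarily this PR's exact API:

```python
config = {
    "replay_buffer_config": {
        "type": "LocalReplayBuffer",  # e.g., swap in a reservoir-sampling buffer
        "capacity": 50000,            # illustrative capacity
    },
}
```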
gjoliver
9226f9bddc
[RLlib] Report timesteps_this_iter to Tune, so it can track/checkpoint/restore total timesteps trained. ( #19264 )
...
* Report timesteps_this_iter to Tune, so it can track/checkpoint/restore
total timesteps trained.
* Trigger Build
* lint
2021-10-12 16:03:41 +02:00
Sven Mika
ed85f59194
[RLlib] Unify all RLlib Trainer.train() -> results[info][learner][policy ID][learner_stats] and add structure tests. ( #18879 )
2021-09-30 16:39:05 +02:00
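The title above spells out the unified structure; a minimal access sketch, with the policy ID and stat names being the usual defaults (assumed here, not taken from the PR):

```python
# Read per-policy learner stats from the unified result dict.
result = trainer.train()
stats = result["info"]["learner"]["default_policy"]["learner_stats"]
print(stats.get("cur_lr"), stats.get("policy_loss"))
```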
Avnish Narayan
6dc1a6b72f
[RLlib] Raise error for KL penalty in DDPPO ( #18959 )
...
* [RLlib] Raise error for KL penalty in DDPPO.
DDPPO doesn't support KL penalties like PPO-1.
In order to support KL penalties, DDPPO would need to
become undecentralized, which defeats the purpose of the
algorithm. Users can still tune the entropy coefficient to
control the policy entropy (similar to controlling the KL
penalty).
* Update rllib/agents/ppo/ddppo.py
Co-authored-by: avnishn <avnishnarayan@gmail.com>
Co-authored-by: Sven Mika <sven@anyscale.io>
2021-09-30 10:56:22 +02:00
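A minimal sketch of the suggested workaround, assuming the agents-era DDPPOTrainer import path and standard PPO config keys; values are illustrative:

```python
from ray.rllib.agents.ppo import DDPPOTrainer

config = {
    "env": "CartPole-v0",
    "num_workers": 2,
    "kl_coeff": 0.0,        # must stay 0.0; DD-PPO raises an error otherwise
    "entropy_coeff": 0.01,  # tune this instead of the (unsupported) KL penalty
}
trainer = DDPPOTrainer(config=config)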
Sven Mika
45f60e51a9
[RLlib] DDPPO fixes and benchmarks. ( #18390 )
2021-09-08 19:39:01 +02:00
Sven Mika
c7563a32ed
[RLlib] DD-PPO not supported on Win (add meaningful error message). ( #15631 )
2021-05-04 19:26:17 +02:00
Sven Mika
dab241dcc6
[RLlib] Fix inconsistency wrt batch size in SampleCollector (traj. view API). Makes DD-PPO work with traj. view API. ( #12063 )
2020-11-19 19:01:14 +01:00
Sven Mika
62c7ab5182
[RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C (tf and torch). ( #11747 )
2020-11-12 16:27:34 +01:00
Philsik Chang
ede9347127
[rllib] Add torch_distributed_backend flag for DDPPO ( #11362 ) ( #11425 )
2020-10-21 18:30:42 -07:00
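A short sketch of the new flag; the backend choices are the usual torch.distributed ones, and the exact accepted values are an assumption here:

```python
config = {
    "torch_distributed_backend": "gloo",  # e.g., "nccl" for GPU workers
}
```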
Sven Mika
36bda8432b
[RLlib] Trajectory view API: Simple List Collector (on by default for PPO); LSTM-agnostic ( #11056 )
2020-10-01 16:57:10 +02:00
Sven Mika
805dad3bc4
[RLlib] SAC algo cleanup. ( #10825 )
2020-09-20 11:27:02 +02:00
Sven Mika
ef18893fb5
[RLlib] PPO, APPO, and DD-PPO code cleanup. ( #10420 )
2020-09-02 14:03:01 +02:00
Sven Mika
d14b501692
[RLlib] First attempt at cleaning up algo code in RLlib: PG. ( #10115 )
2020-08-20 17:05:57 +02:00
Chua Cheow Huan
ea51e94729
[rllib] Learning rate schedule for DDPPO. ( #10006 )
...
* Get shared metrics, increment counter & set global vars for remote workers.
* Add unit test to test lr_schedule for DDPPO.
* Broadcast the local set of global vars to remote workers instead of independently setting the global vars on each rollout worker.
2020-08-15 00:51:45 -07:00
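A hedged sketch of an LR schedule in RLlib's piecewise [timestep, learning rate] format; the breakpoints are illustrative only:

```python
config = {
    "lr_schedule": [
        [0, 5e-4],          # start at 5e-4
        [1_000_000, 1e-5],  # anneal to 1e-5 by 1M timesteps
    ],
}
```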
Sven Mika
2746fc0476
[RLlib] Auto-framework, retire use_pytorch in favor of framework=... ( #8520 )
2020-05-27 16:19:13 +02:00
Eric Liang
9a83908c46
[rllib] Deprecate policy optimizers ( #8345 )
2020-05-21 10:16:18 -07:00
Eric Liang
9d012626e5
[rllib] Distributed exec workflow for impala ( #8321 )
2020-05-11 20:24:43 -07:00
Eric Liang
baadbdf8d4
[rllib] Execute PPO using training workflow ( #8206 )
...
* wip
* add kl
* kl
* works now
* doc update
* reorg
* add ddppo
* add stats
* fix fetch
* comment
* fix learner stat regression
* test fixes
* fix test
2020-04-30 01:18:09 -07:00
Eric Liang
dd70720578
[rllib] Rename sample_batch_size => rollout_fragment_length ( #7503 )
...
* bulk rename
* deprecation warn
* update doc
* update fig
* line length
* rename
* make pytest compatible
* fix test
* fi sys
* rename
* wip
* fix more
* lint
* update svg
* comments
* lint
* fix use of batch steps
2020-03-14 12:05:04 -07:00
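A before/after sketch of the rename, with the old key shown commented out for contrast:

```python
config = {
    # "sample_batch_size": 200,       # deprecated name (kept as a warning alias)
    "rollout_fragment_length": 200,   # new name, same meaning
}
```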
Eric Liang
026f6884b5
[rllib] Add Decentralized DDPPO trainer and documentation ( #7088 )
2020-02-10 15:28:27 -08:00