hiro/ray - Forgejo


mirror of https://github.com/vale981/ray synced 2025-03-06 18:41:40 -05:00

Author	SHA1	Message	Date
Avnish Narayan	6dc1a6b72f	[RLlib] Raise error for kl penalty ddpo (#18959) * [RLlib] Raise error for kl penalty ddpo DDPPO doesn't support KL penalties like PPO-1. In order to support KL penalties, DDPPO would need to become undecentralized, which defeats the purpose of the algorithm. Users can still tune the entropy coefficient to control the policy entropy (similar to controlling the KL penalty.) * Update rllib/agents/ppo/ddppo.py Co-authored-by: avnishn <avnishnarayan@gmail.com> Co-authored-by: Sven Mika <sven@anyscale.io>	2021-09-30 10:56:22 +02:00
Sven Mika	45f60e51a9	[RLlib] DDPPO fixes and benchmarks. (#18390)	2021-09-08 19:39:01 +02:00
Sven Mika	c7563a32ed	[RLlib] DD-PPO not supported on Win (add meaningful error message). (#15631)	2021-05-04 19:26:17 +02:00
Sven Mika	dab241dcc6	[RLlib] Fix inconsistency wrt batch size in SampleCollector (traj. view API). Makes DD-PPO work with traj. view API. (#12063)	2020-11-19 19:01:14 +01:00
Sven Mika	62c7ab5182	[RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C (tf and torch). (#11747)	2020-11-12 16:27:34 +01:00
Philsik Chang	ede9347127	[rllib] Add torch_distributed_backend flag for DDPPO (#11362) (#11425)	2020-10-21 18:30:42 -07:00
Sven Mika	36bda8432b	[RLlib] Trajectory view API: Simple List Collector (on by default for PPO); LSTM-agnostic (#11056)	2020-10-01 16:57:10 +02:00
Sven Mika	805dad3bc4	[RLlib] SAC algo cleanup. (#10825)	2020-09-20 11:27:02 +02:00
Sven Mika	ef18893fb5	[RLlib] PPO, APPO, and DD-PPO code cleanup. (#10420)	2020-09-02 14:03:01 +02:00
Sven Mika	d14b501692	[RLlib] First attempt at cleaning up algo code in RLlib: PG. (#10115)	2020-08-20 17:05:57 +02:00
Chua Cheow Huan	ea51e94729	[rllib] Learning rate schedule for DDPPO. (#10006) * Get shared metrics, increment counter & set global vars for remote workers. * Add unit test to test lr_schedule for DDPPO. * Broadcast the local set of global vars to remote workers instead of independently setting the global vars on each rollout worker.	2020-08-15 00:51:45 -07:00
Sven Mika	2746fc0476	[RLlib] Auto-framework, retire `use_pytorch` in favor of `framework=...` (#8520)	2020-05-27 16:19:13 +02:00
Eric Liang	9a83908c46	[rllib] Deprecate policy optimizers (#8345)	2020-05-21 10:16:18 -07:00
Eric Liang	9d012626e5	[rllib] Distributed exec workflow for impala (#8321)	2020-05-11 20:24:43 -07:00
Eric Liang	baadbdf8d4	[rllib] Execute PPO using training workflow (#8206) * wip * add kl * kl * works now * doc update * reorg * add ddppo * add stats * fix fetch * comment * fix learner stat regression * test fixes * fix test	2020-04-30 01:18:09 -07:00
Eric Liang	dd70720578	[rllib] Rename sample_batch_size => rollout_fragment_length (#7503) * bulk rename * deprecation warn * update doc * update fig * line length * rename * make pytest comptaible * fix test * fi sys * rename * wip * fix more * lint * update svg * comments * lint * fix use of batch steps	2020-03-14 12:05:04 -07:00
Eric Liang	026f6884b5	[rllib] Add Decentralized DDPPO trainer and documentation (#7088)	2020-02-10 15:28:27 -08:00

17 commits