| Name | Last commit message | Last commit date |
| --- | --- | --- |
| a2c | [RLlib] Unflake some CI-tests. (#25313) | 2022-06-03 14:51:50 +02:00 |
| a3c | [RLlib] A2C + A3C move to algorithms folder and re-name into A2C/A3C (from ...Trainer). (#25314) | 2022-06-01 09:29:16 +02:00 |
| alpha_star | [RLlib] Retry agents -> algorithms. with proper doc changes this time. (#24797) | 2022-05-16 09:45:32 +02:00 |
| alpha_zero | [RLlib] AlphaZero uses training_iteration API. (#24507) | 2022-05-18 09:58:25 +02:00 |
| apex_ddpg | [RLlib] Deprecation: Replace remaining evaluation_num_episodes with evaluation_duration. (#26000) | 2022-06-23 19:11:29 +02:00 |
| apex_dqn | [RLlib] Trainer.training_iteration -> Trainer.training_step; Iterations vs reportings: Clarification of terms. (#25076) | 2022-06-10 17:09:18 +02:00 |
| appo | [RLlib] IMPALA/APPO multi-agent mix-in-buffer fixes (plus MA learning tests). (#25848) | 2022-06-17 14:10:36 +02:00 |
| ars | [RLlib] Make sure torch and tf behave the same wrt conv2d nets. (#8785) | 2020-06-20 00:05:19 +02:00 |
| bandits | [RLlib] SlateQ (tf GPU + multi-GPU) + Bandit fixes (#23276) | 2022-03-18 13:45:16 +01:00 |
| bc | [RLlib] Move all remaining algos into algorithms directory. (#25366) | 2022-06-04 07:35:24 +02:00 |
| cql | [RLlib] Deprecation: Replace remaining evaluation_num_episodes with evaluation_duration. (#26000) | 2022-06-23 19:11:29 +02:00 |
| crr | [RLlib] Added expectation advantage_type option to CRR. (#26142) | 2022-06-28 15:40:09 +02:00 |
| ddpg | [RLlib] Deprecation: Replace remaining evaluation_num_episodes with evaluation_duration. (#26000) | 2022-06-23 19:11:29 +02:00 |
| ddppo | [RLlib] Move all remaining algos into algorithms directory. (#25366) | 2022-06-04 07:35:24 +02:00 |
| dqn | [RLlib] Trainer.training_iteration -> Trainer.training_step; Iterations vs reportings: Clarification of terms. (#25076) | 2022-06-10 17:09:18 +02:00 |
| dreamer | [RLlib] Dreamer (#10172) | 2020-08-26 13:24:05 +02:00 |
| es | [RLlib] 2 RLlib Flaky Tests (#14930) | 2021-03-30 19:21:13 +02:00 |
| impala | [RLlib] IMPALA/APPO multi-agent mix-in-buffer fixes (plus MA learning tests). (#25848) | 2022-06-17 14:10:36 +02:00 |
| maddpg | [RLlib] MADDPG: Move into main algorithms folder and add proper unit and learning tests. (#24579) | 2022-05-24 12:53:53 +02:00 |
| maml | [RLLib] MAML extension for all models except RNNs (#11337) | 2020-11-12 16:51:40 -08:00 |
| marwil | [RLlib] Move all remaining algos into algorithms directory. (#25366) | 2022-06-04 07:35:24 +02:00 |
| mbmpo | MBMPO Cartpole (#11832) | 2020-11-12 10:30:41 -08:00 |
| pg | [RLlib] Algorithm step() fixes: evaluation should NOT be part of timed training_step loop. (#25924) | 2022-06-20 19:53:47 +02:00 |
| ppo | [RLlib] Deprecation: Replace remaining evaluation_num_episodes with evaluation_duration. (#26000) | 2022-06-23 19:11:29 +02:00 |
| qmix | [RLlib] QMIX better defaults + added to CI learning tests (#21332) | 2022-01-04 08:54:41 +01:00 |
| r2d2 | [RLlib] Better default values for training_intensity and target_network_update_freq for R2D2. (#25510) | 2022-06-07 10:29:56 +02:00 |
| sac | [RLlib] Trainer.training_iteration -> Trainer.training_step; Iterations vs reportings: Clarification of terms. (#25076) | 2022-06-10 17:09:18 +02:00 |
| simple_q | [RLlib] Move all remaining algos into algorithms directory. (#25366) | 2022-06-04 07:35:24 +02:00 |
| slateq | [RLlib] SlateQ (tf GPU + multi-GPU) + Bandit fixes (#23276) | 2022-03-18 13:45:16 +01:00 |
| td3 | [RLlib] Deprecation: Replace remaining evaluation_num_episodes with evaluation_duration. (#26000) | 2022-06-23 19:11:29 +02:00 |
| cleanup_experiment.py | [CI] Format Python code with Black (#21975) | 2022-01-29 18:41:57 -08:00 |
| compact-regression-test.yaml | [RLlib] Trainer.training_iteration -> Trainer.training_step; Iterations vs reportings: Clarification of terms. (#25076) | 2022-06-10 17:09:18 +02:00 |
| create_plots.py | [RLlib] Benchmark and regression test yaml cleanup and restructuring. (#8414) | 2020-05-26 11:10:27 +02:00 |
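The folders above hold RLlib's tuned-example experiment YAMLs, one folder per algorithm. Below is a minimal sketch of how such a YAML is typically launched with Ray Tune; the file name `cartpole-ppo.yaml` is an assumption used only for illustration, so substitute any YAML that actually exists in the chosen folder.

```python
# Sketch: run one tuned-example YAML via Ray Tune.
# The path below is assumed for illustration; pick any YAML from the listed folders.
import yaml

import ray
from ray import tune

with open("rllib/tuned_examples/ppo/cartpole-ppo.yaml") as f:
    # Each tuned-example YAML maps an experiment name to its run/env/config/stop settings.
    experiments = yaml.safe_load(f)

ray.init()
tune.run_experiments(experiments)
ray.shutdown()
```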