hiro/ray - Forgejo: Beyond coding. We Forge.

hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-05 10:01:43 -05:00

Author	SHA1	Message	Date
Artur Niederfahrenhorst	0dceddb912	[RLlib] Move learning_starts logic from buffers into `training_step()`. (#26032 )	2022-08-11 13:07:30 +02:00
Avnish Narayan	a322ac463c	[RLlib] Make JSONReader default, users will have to use the DatasetReader for any speedups. (#26541 )	2022-07-14 17:19:38 +02:00
Avnish Narayan	1243ed62bf	[RLlib] Make Dataset reader default reader and enable CRR to use dataset (#26304 ) Co-authored-by: avnish <avnish@avnishs-MBP.local.meter>	2022-07-08 12:43:35 -07:00
Sven Mika	be1042429d	[RLlib] Deprecation: Replace remaining `evaluation_num_episodes` with `evaluation_duration`. (#26000 )	2022-06-23 19:11:29 +02:00
Sven Mika	7c39aa5fac	[RLlib] Trainer.training_iteration -> Trainer.training_step; Iterations vs reportings: Clarification of terms. (#25076 )	2022-06-10 17:09:18 +02:00
Steven Morad	501d932449	[RLlib] SAC, RNNSAC, and CQL TrainerConfig objects (#25059 )	2022-05-22 19:58:47 +02:00
Artur Niederfahrenhorst	fb2915d26a	[RLlib] Replay Buffer API and Ape-X. (#24506 )	2022-05-17 13:43:49 +02:00
Artur Niederfahrenhorst	8d906f9bf8	[RLlib] SAC with new Replay Buffer API. (#24156 )	2022-05-09 14:33:02 +02:00
Sven Mika	3052193c9e	[RLlib] Fix CQL getting stuck when deprecated `timesteps_per_iteration` is used (use `min_train_timesteps_per_reporting` instead). (#24345 ) Fix CQL getting stuck when deprecated timesteps_per_iteration is used (use min_train_timesteps_per_reporting instead). CQL does not perform sampling timesteps and the deprecated timesteps_per_iteration is automatically translated into the new min_sample_timesteps_per_reporting, but should be translated (only for CQL and other purely offline RL algos) into min_train_timesteps_per_reporting. If timesteps_per_iteration, CQL lever leaves the first iteration as it thinks it's not done yet (sample timesteps always remain at 0).	2022-04-29 21:02:34 +01:00
gjoliver	e7f9e8ceec	[RLlib] Report total_train_steps correctly for offline agents like CQL. (#20541 ) * Fix trainer timestep reporting for offline agents like CQL. * wip. * extend timesteps_total to 200K for learning_tests_pendulum_cql test Co-authored-by: sven1977 <svenmika1977@gmail.com>	2021-11-22 21:46:45 +01:00
Avnish Narayan	026bf01071	[RLlib] Upgrade gym version to 0.21 and deprecate pendulum-v0. (#19535 ) * Fix QMix, SAC, and MADDPA too. * Unpin gym and deprecate pendulum v0 Many tests in rllib depended on pendulum v0, however in gym 0.21, pendulum v0 was deprecated in favor of pendulum v1. This may change reward thresholds, so will have to potentially rerun all of the pendulum v1 benchmarks, or use another environment in favor. The same applies to frozen lake v0 and frozen lake v1 Lastly, all of the RLlib tests and have been moved to python 3.7 * Add gym installation based on python version. Pin python<= 3.6 to gym 0.19 due to install issues with atari roms in gym 0.20 * Reformatting * Fixing tests * Move atari-py install conditional to req.txt * migrate to new ale install method * Fix QMix, SAC, and MADDPA too. * Unpin gym and deprecate pendulum v0 Many tests in rllib depended on pendulum v0, however in gym 0.21, pendulum v0 was deprecated in favor of pendulum v1. This may change reward thresholds, so will have to potentially rerun all of the pendulum v1 benchmarks, or use another environment in favor. The same applies to frozen lake v0 and frozen lake v1 Lastly, all of the RLlib tests and have been moved to python 3.7 * Add gym installation based on python version. Pin python<= 3.6 to gym 0.19 due to install issues with atari roms in gym 0.20 Move atari-py install conditional to req.txt migrate to new ale install method Make parametric_actions_cartpole return float32 actions/obs Adding type conversions if obs/actions don't match space Add utils to make elements match gym space dtypes Co-authored-by: Jun Gong <jungong@anyscale.com> Co-authored-by: sven1977 <svenmika1977@gmail.com>	2021-11-03 16:24:00 +01:00
Sven Mika	cabaa3b3c6	[RLlib Testing] Add A3C/APPO/BC/DDPPO/MARWIL/CQL/ES/ARS/TD3 to weekly learning tests. (#18381 )	2021-09-07 11:48:41 +02:00
Sven Mika	90b21ce27e	[RLlib] De-flake 3 test cases; Fix `config.simple_optimizer` and `SampleBatch.is_training` warnings. (#17321 )	2021-07-27 14:39:06 -04:00
Sven Mika	53206dd440	[RLlib] CQL BC loss fixes; PPO/PG/A2\|3C action normalization fixes (#16531 )	2021-06-30 12:32:11 +02:00
Sven Mika	839fc59224	[RLlib] CQL TensorFlow support (#15841 )	2021-05-18 11:10:46 +02:00
Michael Luo	4cbe13cdfd	[RLlib] CQL loss fn fixes, MuJoCo + Pendulum benchmarks, offline-RL example script w/ json file. (#15603 ) Co-authored-by: Sven Mika <sven@anyscale.io> Co-authored-by: sven1977 <svenmika1977@gmail.com>	2021-05-04 19:06:19 +02:00
Michael Luo	ec2c10309b	[RLlib] CQL for HalfCheetah-Random-v0 + Hopper-Random-v0 + CQL Bug Fixes (#14243 )	2021-02-22 17:30:18 +01:00
Michael Luo	587f207c2f	[RLlib] Support for D4RL + Semi-working CQL Benchmark (#13550 )	2021-01-21 16:43:55 +01:00
Michael Luo	42cd414e5b	[RLlib] New Offline RL Algorithm: CQL (based on SAC) (#13118 )	2020-12-30 10:11:57 -05:00

19 commits