Commit graph

153 commits

Author SHA1 Message Date
Steven Morad
39841b65b3
[RLlib] PPOTorchPolicy: Remove extra call to model.value_function (#23671) 2022-04-05 08:40:29 +02:00
Sven Mika
2eaa54bd76
[RLlib] POC: Config objects instead of dicts (PPO only). (#23491) 2022-03-31 18:26:12 +02:00
Sven Mika
7cb86acce2
[RLlib] trainer_template.py: hard deprecation (error when used). (#23488) 2022-03-25 18:25:51 +01:00
Siyuan (Ryans) Zhuang
0c74ecad12
[Lint] Cleanup incorrectly formatted strings (Part 1: RLLib). (#23128) 2022-03-15 17:34:21 +01:00
Jeroen Bédorf
bc21a4593d
[RLlib] Fix crash when kl_coeff is set to 0 (#23063)
Co-authored-by: Jeroen Bédorf <jeroen@minds.ai>
Co-authored-by: Ishant Mrinal Haloi <mrinal.haloi11@gmail.com>
Co-authored-by: Ishant Mrinal <33053278+n30111@users.noreply.github.com>
2022-03-11 12:24:52 -08:00
Sven Mika
526fd6b5fb
[RLlib] Issue 22444: KL-coeff not stored in persistent policy state. (#22590) 2022-02-24 22:05:36 +01:00
Daniel
308ccfe25c
[RLlib] DD-PPO move train_batch_size==-1 check to __init__ (#22521) 2022-02-21 11:44:12 +01:00
Steven Morad
5d52b599aa
[RLlib] Fix zero gradients for ppo-clipped vf (#22171) 2022-02-15 08:57:18 +01:00
Balaji Veeramani
31ed9e5d02
[CI] Replace YAPF disables with Black disables (#21982) 2022-02-08 16:29:25 -08:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black (#21975)
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Jun Gong
8ebc50f844
[RLlib] Issue 21334: Fix APPO when kl_loss is enabled. (#21855) 2022-01-27 20:08:58 +01:00
Sven Mika
371fbb17e4
[RLlib] Make policies_to_train more flexible via callable option. (#20735) 2022-01-27 12:17:34 +01:00
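The callable option for policies_to_train mentioned in the commit above can be pictured with a minimal sketch; the multiagent config keys and the (policy_id, batch) signature are assumptions for illustration and may differ from the actual PR.

```python
# Minimal sketch (assumed keys and signature, not verbatim from the PR):
# instead of a fixed list of policy IDs, a callable decides per policy
# (and optionally per sample batch) whether that policy should be trained.
config = {
    "multiagent": {
        "policies": {"learner_0", "learner_1", "frozen_opponent"},
        "policies_to_train": lambda policy_id, batch=None: policy_id.startswith("learner_"),
    },
}
```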
Sven Mika
d5bfb7b7da
[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 (#21652) 2022-01-25 14:16:58 +01:00
Sven Mika
92f030331e
[RLlib] Initial code/comment cleanups in preparation for decentralized multi-agent learner. (#21420) 2022-01-10 11:22:55 +01:00
Sven Mika
4eaf70942d
[RLlib] Issue 21297: Ignore PPO KL-loss term completely if kl-coeff == 0.0 to avoid NaN values due to some discrete action probs==0.0 (#21456) 2022-01-10 11:22:40 +01:00
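To see why a zero action probability breaks the KL penalty, here is a small, self-contained PyTorch illustration (not RLlib's actual loss code): the naive categorical KL evaluates 0 * log(0), which is NaN in floating point, so skipping the term when the coefficient is 0.0 keeps the NaN out of the total loss.

```python
import torch

# Naive categorical KL: sum_i p_i * (log p_i - log q_i).
# If some p_i == 0, log p_i is -inf and 0 * (-inf) evaluates to NaN.
p = torch.tensor([0.0, 1.0])
q = torch.tensor([0.5, 0.5])
kl = torch.sum(p * (torch.log(p) - torch.log(q)))
print(kl)  # tensor(nan)

# Guarding the term when its coefficient is zero avoids the problem
# (kl_coeff and policy_loss are stand-in names for illustration).
kl_coeff = 0.0
policy_loss = torch.tensor(0.1)
total_loss = policy_loss + (kl_coeff * kl if kl_coeff > 0.0 else 0.0)
```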
Sven Mika
853d10871c
[RLlib] Issue 18499: PGTrainer with training_iteration fn does not support multi-GPU. (#21376) 2022-01-05 18:22:33 +01:00
Sven Mika
49cd7ea6f9
[RLlib] Trainer sub-class PPO/DDPPO (instead of build_trainer()). (#20571) 2021-11-23 23:01:05 +01:00
Sven Mika
9d2fe5756c
[RLlib] Trainer sub-class for APPO (instead of using build_trainer()). (#20424) 2021-11-22 22:14:21 +01:00
Sven Mika
f82880eda1
Revert "Revert [RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy (#20061) (#20399)" (#20417)
This reverts commit 90dc5460d4.
2021-11-16 14:49:41 +01:00
Amog Kamsetty
90dc5460d4
Revert "[RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy (#20061)" (#20399)
This reverts commit 5b1c8e46e1.
2021-11-15 16:11:35 -08:00
Sven Mika
5b1c8e46e1
[RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy (#20061) 2021-11-15 10:41:54 +01:00
Sven Mika
a931076f59
[RLlib] Tf2 + eager-tracing same speed as framework=tf; Add more test coverage for tf2+tracing. (#19981) 2021-11-05 16:10:00 +01:00
Avnish Narayan
026bf01071
[RLlib] Upgrade gym version to 0.21 and deprecate pendulum-v0. (#19535)
* Fix QMix, SAC, and MADDPG too.

* Unpin gym and deprecate pendulum v0

Many tests in RLlib depended on Pendulum-v0; however, in
gym 0.21, Pendulum-v0 was deprecated in favor of
Pendulum-v1. This may change reward thresholds, so we may
have to rerun all of the Pendulum-v1 benchmarks or switch
to another environment. The same applies to FrozenLake-v0
and FrozenLake-v1.

Lastly, all of the RLlib tests have been moved to Python 3.7.

* Add gym installation based on python version.

Pin Python <= 3.6 to gym 0.19 due to install
issues with Atari ROMs in gym 0.20.

* Reformatting

* Fixing tests

* Move atari-py install conditional to req.txt

* migrate to new ale install method

* Make parametric_actions_cartpole return float32 actions/obs.

* Add type conversions if obs/actions don't match the space.

* Add utils to make elements match gym space dtypes (see the sketch after this entry).

Co-authored-by: Jun Gong <jungong@anyscale.com>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-03 16:24:00 +01:00
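The utils for matching gym space dtypes mentioned in the entry above can be sketched as follows; the helper name and structure are illustrative and may not match the actual RLlib utility.

```python
import numpy as np
import gym

def convert_element_to_space_type(element, space):
    # Illustrative helper (name assumed): cast an element to the dtype of the
    # space it belongs to, e.g. float64 actions vs. a float32 Box action space.
    if isinstance(space, gym.spaces.Box) and isinstance(element, np.ndarray):
        if element.dtype != space.dtype:
            return element.astype(space.dtype)
    return element

space = gym.spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
action = np.array([0.5, -0.25])  # defaults to float64
print(convert_element_to_space_type(action, space).dtype)  # float32
```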
Sven Mika
e6ae08f416
[RLlib] Optionally don't drop last ts in v-trace calculations (APPO and IMPALA). (#19601) 2021-11-03 10:01:34 +01:00
Sven Mika
cf21c634a3
[RLlib] Fix deprecated warning for torch_ops.py (soft-replaced by torch_utils.py). (#19982) 2021-11-03 10:00:46 +01:00
gjoliver
9385b6c1be
[RLlib] Make a few LRSchedule and EntropyCoeffSchedule tests more reliable. (#19934) 2021-11-02 16:52:56 +01:00
Sven Mika
0b308719f8
[RLlib; Docs overhaul] Docstring cleanup: rllib/utils (#19829) 2021-11-01 21:46:02 +01:00
Sven Mika
9c73871da0
[RLlib; Docs overhaul] Docstring cleanup: Evaluation (#19783) 2021-10-29 12:03:56 +02:00
gjoliver
99a0088233
[RLlib] Unify the way we create local replay buffer for all agents (#19627)
* [RLlib] Unify the way we create and use LocalReplayBuffer for all the agents.

This change:
1. Gets rid of the try...except clause when we call execution_plan(),
   and of the resulting deprecation warning.
2. Fixes the execution_plan() call in Trainer._try_recover() as well.
3. Most importantly, makes it much easier to create and use different types
   of local replay buffers for all our agents, e.g., it allows us to easily
   create a reservoir-sampling replay buffer for the APPO agent for Riot
   in the near future.
* Introduce explicit configuration for replay buffer types (see the sketch after this entry).
* Fix is_training key error.
* Actually deprecate the buffer_size field.
2021-10-26 20:56:02 +02:00
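A rough sketch of what the explicit replay-buffer configuration from this entry might look like; the key and class names below are assumptions for illustration, not the final RLlib API.

```python
# Assumed keys: the buffer class and its capacity are selected through the
# trainer config instead of being hard-coded, and the deprecated buffer_size
# field is replaced by capacity.
config = {
    "replay_buffer_config": {
        "type": "LocalReplayBuffer",  # could be swapped, e.g. for a reservoir-sampling buffer
        "capacity": 50_000,
    },
}
```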
Sven Mika
b213565783
[RLlib] Fix failing test cases: Soft-deprecate ModelV2.from_batch (in favor of ModelV2.__call__). (#19693) 2021-10-25 15:00:00 +02:00
gjoliver
89fbfc00f8
[RLlib] Some minor cleanups (buffer buffer_size -> capacity and others). (#19623) 2021-10-25 09:42:39 +02:00
gjoliver
44a4e42172
[rllib] Add entropy_coeff_schedule support for APPO. (#19544)
* Add entropy_coeff_schedule support for APPO.

* lint
2021-10-20 14:18:01 -07:00
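RLlib schedules of this kind are typically given as piecewise [timestep, value] endpoints; a hedged example of what enabling an entropy-coefficient schedule for APPO might look like (values are illustrative).

```python
# Illustrative values only: decay the entropy coefficient from 0.01 to 0.0
# over the first 1M timesteps, with interpolation in between.
appo_config = {
    "entropy_coeff": 0.01,
    "entropy_coeff_schedule": [
        [0, 0.01],
        [1_000_000, 0.0],
    ],
}
```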
gjoliver
9226f9bddc
[RLlib] Report timesteps_this_iter to Tune, so it can track/checkpoint/restore total timesteps trained. (#19264)
* Report timesteps_this_iter to Tune, so it can track/checkpoint/restore
total timesteps trained.

* Trigger Build

* lint
2021-10-12 16:03:41 +02:00
Sven Mika
bd2d2079d2
[RLlib] Support >1 loss terms and optimizers for framework=tf2 (already supported for framework=[tf|torch]) (#19269) 2021-10-10 12:19:47 +02:00
Sven Mika
d439fd7f17
[RLlib] TF2/eager memory leak fixes. (#19198) 2021-10-09 00:11:53 +02:00
Sven Mika
b4300dd532
[RLlib] Issue 18812: Torch multi-GPU stats not protected against race conditions. (#18937) 2021-10-04 13:29:00 +02:00
Sven Mika
ed85f59194
[RLlib] Unify all RLlib Trainer.train() -> results[info][learner][policy ID][learner_stats] and add structure tests. (#18879) 2021-09-30 16:39:05 +02:00
Avnish Narayan
6dc1a6b72f
[RLlib] Raise error for kl penalty ddppo (#18959)
* [RLlib] Raise error for kl penalty ddppo

DDPPO doesn't support KL penalties like PPO-1.
In order to support KL penalties, DDPPO would need to
centralize its updates, which defeats the purpose of the
algorithm. Users can still tune the entropy coefficient to
control the policy entropy (similar to controlling the KL
penalty); see the sketch after this entry.

* Update rllib/agents/ppo/ddppo.py

Co-authored-by: avnishn <avnishnarayan@gmail.com>
Co-authored-by: Sven Mika <sven@anyscale.io>
2021-09-30 10:56:22 +02:00
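As noted in the commit body above, the practical workaround is to keep the KL penalty disabled and regularize the policy through the entropy bonus instead; a hedged config sketch with illustrative values.

```python
# DDPPO cannot use the adaptive KL penalty, so kl_coeff stays at 0.0 and the
# entropy coefficient is tuned instead (values are illustrative).
ddppo_config = {
    "kl_coeff": 0.0,
    "entropy_coeff": 0.01,
}
```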
Sven Mika
698b4eeed3
[RLlib] POC: Separate losses for APPO/IMPALA. Enable TFPolicy to handle multiple optimizers/losses (like TorchPolicy). (#18669) 2021-09-21 22:00:14 +02:00
Sven Mika
45f60e51a9
[RLlib] DDPPO fixes and benchmarks. (#18390) 2021-09-08 19:39:01 +02:00
Sven Mika
599e589481
[RLlib] Move existing fake multi-GPU learning tests into separate buildkite job. (#18065) 2021-08-31 14:56:53 +02:00
Sven Mika
494ddd98c1
[RLlib] Replace "seq_lens" w/ SampleBatch.SEQ_LENS. (#17928) 2021-08-21 17:05:48 +02:00
Sven Mika
a428f10ebe
[RLlib] Add multi-GPU learning tests to nightly. (#17778) 2021-08-18 17:21:01 +02:00
Sven Mika
f3bbe4ea44
[RLlib] Test cases/BUILD cleanup; split "everything else" (longest running one rn) tests in 2. (#17640) 2021-08-16 22:01:01 +02:00
Sven Mika
5107d16ae5
[RLlib] Add @Deprecated decorator to simplify/unify deprecation of classes, methods, functions. (#17530) 2021-08-03 18:30:02 -04:00
Sven Mika
924f11cd45
[RLlib] Torch algos use now-framework-agnostic MultiGPUTrainOneStep execution op (~33% speedup for PPO-torch + GPU). (#17371) 2021-08-03 11:35:49 -04:00
Sven Mika
5a313ba3d6
[RLlib] Refactor: All tf static graph code should reside inside Policy class. (#17169) 2021-07-20 14:58:13 -04:00
Sven Mika
649580d735
[RLlib] Redo simplify multi agent config dict: Reverted b/c seemed to break test_typing (non RLlib test). (#17046) 2021-07-15 05:51:24 -04:00
Amog Kamsetty
38b5b6d24c
Revert "[RLlib] Simplify multiagent config (automatically infer class/spaces/config). (#16565)" (#17036)
This reverts commit e4123fff27.
2021-07-13 09:57:15 -07:00
Sven Mika
e4123fff27
[RLlib] Simplify multiagent config (automatically infer class/spaces/config). (#16565) 2021-07-13 06:38:14 -04:00