Commit graph

467 commits

gjoliver
d81885c1f1
[RLlib] Fix all the CI tests that were broken by is_training and replay buffer changes; re-comment-in the failing RLlib tests (#19809)
* Fix DDPG, since it is based on GenericOffPolicyTrainer.

* Fix QMix, SAC, and MADDPG too.

* Undo QMix change.

* Fix DQN input batch type. Always use SampleBatch.

* APEX-DDPG should not use replay_buffer_config yet.

* Make the eager TF policy use SampleBatch.

* lint

* LINT.

* Re-enable RLlib broken tests to make sure things work ok now.

* fixes.

Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-10-28 18:06:47 +02:00
Sven Mika
4a82d3ea6c
Revert "[RLlib; Docs overhaul] Docstring cleanup: Trainer, trainer_template, Callbacks. (#19758)" (#19806)
This reverts commit 80eeb13175.
2021-10-27 23:30:07 +02:00
Sven Mika
80eeb13175
[RLlib; Docs overhaul] Docstring cleanup: Trainer, trainer_template, Callbacks. (#19758) 2021-10-27 19:15:35 +02:00
gjoliver
99a0088233
[RLlib] Unify the way we create local replay buffer for all agents (#19627)
* [RLlib] Unify the way we create and use LocalReplayBuffer for all the agents.

This change:
1. Gets rid of the try...except clause around the execution_plan() call,
   and with it the deprecation warning it produced.
2. Fixes the execution_plan() call in Trainer._try_recover() too.
3. Most importantly, makes it much easier to create and use different types
   of local replay buffers for all our agents, e.g., allowing us to easily
   create a reservoir-sampling replay buffer for the APPO agent for Riot
   in the near future.
* Introduce explicit configuration for replay buffer types.
* Fix is_training key error.
* Actually deprecate the buffer_size field.
2021-10-26 20:56:02 +02:00
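
For context, a minimal sketch of the explicit replay-buffer configuration this commit introduces, as it might look on a DQN-style agent (the exact keys, `type` and `capacity`, are assumed from this and the neighboring commits):

```python
from ray.rllib.agents.dqn import DQNTrainer  # Trainer-based API of this era

config = {
    "env": "CartPole-v0",
    # Explicit replay-buffer configuration; replaces the deprecated
    # top-level `buffer_size` field (keys assumed from the commit messages).
    "replay_buffer_config": {
        "type": "MultiAgentReplayBuffer",
        "capacity": 50_000,
    },
}
trainer = DQNTrainer(config=config)
```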
Sven Mika
b213565783
[RLlib] Fix failing test cases: Soft-deprecate ModelV2.from_batch (in favor of ModelV2.__call__). (#19693) 2021-10-25 15:00:00 +02:00
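
The soft-deprecation above amounts to calling the model directly instead of going through `from_batch()`; a hedged before/after sketch (`model` is any ModelV2 instance and `train_batch` a SampleBatch, both assumed to exist in policy code):

```python
# Before (soft-deprecated):
model_out, state = model.from_batch(train_batch)

# After: ModelV2.__call__ takes the SampleBatch directly.
model_out, state = model(train_batch)
```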
gjoliver
89fbfc00f8
[RLlib] Some minor cleanups (replay buffer's buffer_size -> capacity, and others). (#19623) 2021-10-25 09:42:39 +02:00
roireshef
9b0352f363
[RLlib] Added LearningRateSchedule and EntropyCoeffSchedule to TF and Torch versions of A3C and PPO (#19276) 2021-10-25 09:39:35 +02:00
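
Both schedules are configured as piecewise [timestep, value] lists that RLlib interpolates linearly; a sketch for PPO (values are illustrative only):

```python
config = {
    "env": "CartPole-v0",
    # Anneal the learning rate from 1e-3 to 1e-5 over 1M timesteps.
    "lr_schedule": [[0, 1e-3], [1_000_000, 1e-5]],
    # Anneal the entropy coefficient from 0.01 to 0 over the same horizon.
    "entropy_coeff_schedule": [[0, 0.01], [1_000_000, 0.0]],
}
```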
gjoliver
c3c42278e4
[RLlib] clean up all the SampleBatch['is_training'] deprecation warnings (#19652)
* [RLlib] clean up all the SampleBatch['is_training'] deprecation warnings.

* wip
2021-10-25 09:38:56 +02:00
gjoliver
44a4e42172
[rllib] Add entropy_coeff_schedule support for APPO. (#19544)
* Add entropy_coeff_schedule support for APPO.

* lint
2021-10-20 14:18:01 -07:00
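
With this change, APPO accepts the same piecewise schedule format; a minimal, illustrative config:

```python
from ray.rllib.agents.ppo import APPOTrainer

trainer = APPOTrainer(config={
    "env": "CartPole-v0",
    # Decay the entropy coefficient to 0 over 500k timesteps (illustrative values).
    "entropy_coeff_schedule": [[0, 0.01], [500_000, 0.0]],
})
```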
gjoliver
9226f9bddc
[RLlib] Report timesteps_this_iter to Tune, so it can track/checkpoint/restore total timesteps trained. (#19264)
* Report timesteps_this_iter to Tune, so it can track/checkpoint/restore
total timesteps trained.

* Trigger Build

* lint
2021-10-12 16:03:41 +02:00
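
The mechanism here is the standard Tune result key: any trainable that returns `timesteps_this_iter` lets Tune accumulate `timesteps_total` across checkpoint/restore cycles. A toy sketch (the training logic is a placeholder):

```python
from ray import tune

class MyTrainable(tune.Trainable):
    def step(self):
        # ... run one training iteration (placeholder) ...
        return {
            "episode_reward_mean": 0.0,  # placeholder metric
            # Tune sums this into `timesteps_total` and can
            # track/checkpoint/restore it across runs.
            "timesteps_this_iter": 1000,
        }
```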
Sven Mika
bd2d2079d2
[RLlib] Support >1 loss terms and optimizers for framework=tf2 (already supported for framework=[tf|torch]) (#19269) 2021-10-10 12:19:47 +02:00
Sven Mika
d439fd7f17
[RLlib] TF2/eager memory leak fixes. (#19198) 2021-10-09 00:11:53 +02:00
Sven Mika
fd438d5630
[RLlib] Issue 18104: Cannot set remote_worker_envs=True for non local-mode and MultiAgentEnv. (#19133) 2021-10-07 22:39:21 +02:00
Sven Mika
1f0646f658
[RLlib] Issue 18418: SAC w/ dict space not working. (#19101) 2021-10-06 09:05:50 +02:00
Sven Mika
b4300dd532
[RLlib] Issue 18812: Torch multi-GPU stats not protected against race conditions. (#18937) 2021-10-04 13:29:00 +02:00
Sven Mika
73f5c4039b
[RLlib] Fix flaky test_a3c, test_maml, test_apex_dqn. (#19035) 2021-10-04 13:23:51 +02:00
Jiajun Yao
7588bfd315
[Lint] Add flake8-bugbear (#19053)
* Add flake8-bugbear

* Add flake8-bugbear
2021-10-03 23:24:11 -07:00
Sven Mika
ed85f59194
[RLlib] Unify all RLlib Trainer.train() -> results[info][learner][policy ID][learner_stats] and add structure tests. (#18879) 2021-09-30 16:39:05 +02:00
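
After this unification, learner stats for every algorithm live under the same nesting; a sketch of reading them (the policy ID is assumed to be the single-agent default):

```python
results = trainer.train()
# Unified structure: results["info"]["learner"][<policy_id>]["learner_stats"]
learner_stats = results["info"]["learner"]["default_policy"]["learner_stats"]
print(learner_stats.get("policy_loss"), learner_stats.get("entropy"))
```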
Sven Mika
828f5d26b7
[RLlib] Custom view requirements (e.g. for prev-n-obs) work with compute_single_action and compute_actions_from_input_dict. (#18921) 2021-09-30 15:03:37 +02:00
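
A custom view requirement of this kind is declared on the model; a minimal torch sketch, with the `prev_n_obs` key and shift string assumed from RLlib's trajectory-view example models (value_function omitted for brevity):

```python
import torch.nn as nn
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.policy.view_requirement import ViewRequirement

class PrevNObsModel(TorchModelV2, nn.Module):
    """Toy model that requests the last 3 observations as extra input."""

    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)
        self.fc = nn.Linear(3 * obs_space.shape[0], num_outputs)
        # Custom view requirement: feed obs at shifts -2..0 as "prev_n_obs".
        self.view_requirements["prev_n_obs"] = ViewRequirement(
            data_col="obs", shift="-2:0", space=obs_space)

    def forward(self, input_dict, state, seq_lens):
        obs = input_dict["prev_n_obs"].flatten(1)  # [B, 3, D] -> [B, 3*D]
        return self.fc(obs), state
```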
Avnish Narayan
6dc1a6b72f
[RLlib] Raise error for KL penalty in DDPPO (#18959)
* [RLlib] Raise error for KL penalty in DDPPO

DDPPO doesn't support KL penalties the way PPO-1 does. Supporting them
would require DDPPO to become centralized, which defeats the purpose of
the algorithm. Users can still tune the entropy coefficient to control
the policy entropy (similar to controlling the KL penalty).

* Update rllib/agents/ppo/ddppo.py

Co-authored-by: avnishn <avnishnarayan@gmail.com>
Co-authored-by: Sven Mika <sven@anyscale.io>
2021-09-30 10:56:22 +02:00
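
In config terms, that means keeping the KL penalty disabled for DDPPO and steering exploration via entropy instead; an illustrative sketch:

```python
from ray.rllib.agents.ppo import DDPPOTrainer

trainer = DDPPOTrainer(config={
    "env": "CartPole-v0",
    "framework": "torch",  # DDPPO is torch-only
    "num_workers": 2,
    # DDPPO raises an error for any KL-penalty settings (it is decentralized).
    "kl_coeff": 0.0,
    # Tune this instead to control policy entropy.
    "entropy_coeff": 0.01,
})
```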
Sven Mika
9c9b482661
[RLlib] Allow n-step > 1 and prio. replay for R2D2 and RNNSAC. (#18939) 2021-09-29 21:31:34 +02:00
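
Config-wise this unlocks the usual off-policy knobs for the recurrent agents; an illustrative R2D2 snippet (values arbitrary):

```python
from ray.rllib.agents.dqn import R2D2Trainer

trainer = R2D2Trainer(config={
    "env": "CartPole-v0",
    "n_step": 3,                 # n-step returns > 1 now allowed for R2D2/RNNSAC
    "prioritized_replay": True,  # prioritized replay also supported now
    "model": {"use_lstm": True}, # R2D2 requires a recurrent model
})
```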
Sven Mika
b99943806e
[RLlib] Add support for IMPALA to handle more than one loss/optimizer (analogous to recent enhancement for APPO). (#18971) 2021-09-29 21:30:04 +02:00
Sven Mika
61a1274619
[RLlib] No Preprocessors (part 2). (#18468) 2021-09-23 12:56:45 +02:00
Sven Mika
a2a077b874
[RLlib] Faster remote worker space inference (don't infer if not required). (#18805) 2021-09-23 10:54:37 +02:00
Sven Mika
698b4eeed3
[RLlib] POC: Separate losses for APPO/IMPALA. Enable TFPolicy to handle multiple optimizers/losses (like TorchPolicy). (#18669) 2021-09-21 22:00:14 +02:00
Sven Mika
fd13bac9b3
[RLlib] Add worker arg (optional) to policy_mapping_fn. (#18184) 2021-09-17 12:07:11 +02:00
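
With this change, the mapping function can take an optional `worker` argument (the calling RolloutWorker) in addition to `agent_id` and `episode`; a toy multi-agent sketch:

```python
from ray.rllib.policy.policy import PolicySpec

def policy_mapping_fn(agent_id, episode, worker, **kwargs):
    # Toy rule: route agents on even-indexed workers to "policy_0".
    return "policy_0" if worker.worker_index % 2 == 0 else "policy_1"

config = {
    "multiagent": {
        "policies": {"policy_0": PolicySpec(), "policy_1": PolicySpec()},
        "policy_mapping_fn": policy_mapping_fn,
    },
}
```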
Sven Mika
ba1c489b79
[RLlib Testing] Lower --smoke-test "time_total_s" to make sure it doesn't time out. (#18670) 2021-09-16 18:22:23 +02:00
Sven Mika
8a00154038
[RLlib] Bump tf version in ML docker to tf==2.5.0; add tfp to ML-docker. (#18544) 2021-09-15 08:46:37 +02:00
Sven Mika
08c09737fa
[RLlib] Fix R2D2 (torch) multi-GPU issue. (#18550) 2021-09-14 19:58:10 +02:00
Sven Mika
3803e796ff
[RLlib] Multi-GPU learner thread (IMPALA) error messages/comments/code-cleanup. (#18540) 2021-09-13 19:27:53 +02:00
Sven Mika
ea4a22249c
[RLlib] Add simple action-masking example script/env/model (tf and torch). (#18494) 2021-09-11 23:08:09 +02:00
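
The usual masking trick, independent of framework, is to push the logits of invalid actions toward -inf before sampling; a small stand-alone torch sketch:

```python
import torch

def mask_logits(logits: torch.Tensor, action_mask: torch.Tensor) -> torch.Tensor:
    """Make invalid actions (mask == 0 in a 0/1 float tensor) unselectable."""
    # log(1) = 0 leaves valid actions untouched; log(0) = -inf kills invalid
    # ones. Clamp to a large finite negative to keep softmax NaN-free.
    inf_mask = torch.clamp(torch.log(action_mask), min=-1e10)
    return logits + inf_mask
```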
Sven Mika
3f89f35e52
[RLlib] Better error messages and hints; + failure-mode tests; (#18466) 2021-09-10 16:52:47 +02:00
Sven Mika
8a066474d4
[RLlib] No Preprocessors; preparatory PR #1 (#18367) 2021-09-09 08:10:42 +02:00
Sven Mika
1520c3d147
[RLlib] Deepcopy env_ctx for vectorized sub-envs AND add eval-worker-option to Trainer.add_policy() (#18428) 2021-09-09 07:10:06 +02:00
gjoliver
808b683f81
[RLlib] Add a unittest for learning rate schedule used with APEX agent. (#18389) 2021-09-08 23:29:40 +02:00
Sven Mika
45f60e51a9
[RLlib] DDPPO fixes and benchmarks. (#18390) 2021-09-08 19:39:01 +02:00
Sven Mika
56f142cac1
[RLlib] Add support for evaluation_num_episodes=auto (run eval for as long as the parallel train step takes). (#18380) 2021-09-07 08:08:37 +02:00
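
An illustrative config for the new setting; "auto" only makes sense together with parallel evaluation:

```python
config = {
    "evaluation_interval": 1,
    # Run evaluation concurrently with training ...
    "evaluation_parallel_to_training": True,
    # ... and for however long the parallel train step takes.
    "evaluation_num_episodes": "auto",
}
```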
Sven Mika
e3e6ed7aaa
[RLlib] Issues 17844, 18034: Fix n-step > 1 bug. (#18358) 2021-09-06 12:14:20 +02:00
Sven Mika
ba58f5edb1
[RLlib] Strictly run evaluation_num_episodes episodes each evaluation run (no matter the other eval config settings). (#18335) 2021-09-05 15:37:05 +02:00
Sven Mika
a772c775cd
[RLlib] Set random seed (if provided) to Trainer process as well. (#18307) 2021-09-04 11:02:30 +02:00
Sven Mika
9a8ca6a69d
[RLlib] Fix Atari learning test regressions (2 bugs) and 1 minor attention net bug. (#18306) 2021-09-03 13:29:57 +02:00
Sven Mika
82465f9342
[RLlib] Better PolicyServer example (w/ or w/o tune); print out the actual listen port address at log-level=INFO. (#18254) 2021-08-31 22:03:23 +02:00
Sven Mika
599e589481
[RLlib] Move existing fake multi-GPU learning tests into separate buildkite job. (#18065) 2021-08-31 14:56:53 +02:00
Sven Mika
4888d7c9af
[RLlib] Replay buffers: Add config option to store contents in checkpoints. (#17999) 2021-08-31 12:21:49 +02:00
Joseph Suarez
8136d2912b
[RLlib] Add policies arg to callback: on_episode_step (already exists in all other episode-related callbacks) (#18119) 2021-08-27 16:12:19 +02:00
Sven Mika
b6aa8223bc
[RLlib] Fix final_scale's default value to 0.02 (see OrnsteinUhlenbeck exploration). (#18070) 2021-08-25 14:22:09 +02:00
Sven Mika
9883505e84
[RLlib] Add [LSTM=True + multi-GPU]-tests to nightly RLlib testing suite (for all algos supporting RNNs, except R2D2, RNNSAC, and DDPPO). (#18017) 2021-08-24 21:55:27 +02:00
Sven Mika
494ddd98c1
[RLlib] Replace "seq_lens" w/ SampleBatch.SEQ_LENS. (#17928) 2021-08-21 17:05:48 +02:00
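
The constant simply replaces the magic string; a tiny sketch:

```python
from ray.rllib.policy.sample_batch import SampleBatch

batch = SampleBatch({"obs": [[0.0], [1.0]], SampleBatch.SEQ_LENS: [2]})

# Before: batch["seq_lens"]  (magic string, now discouraged)
# After:
seq_lens = batch[SampleBatch.SEQ_LENS]
```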
Sven Mika
8248ba531b
[RLlib] Redo #17410: Example script: Remote worker envs with inference done on main node. (#17960) 2021-08-20 08:02:18 +02:00
Alex Wu
318ba6fae0
Revert "[RLlib] Add example script for how to have n remote (parallel) envs with inference happening on "main" (possibly GPU) node. (#17410)" (#17951)
This reverts commit 8fc16b9a18.
2021-08-19 07:55:10 -07:00