hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Sven Mika	73f5c4039b	[RLlib] Fix flakey test_a3c, test_maml, test_apex_dqn. (#19035)	2021-10-04 13:23:51 +02:00
Jiajun Yao	7588bfd315	[Lint] Add flake8-bugbear (#19053) * Add flake8-bugbear * Add flake8-bugbear	2021-10-03 23:24:11 -07:00
Sven Mika	ed85f59194	[RLlib] Unify all RLlib Trainer.train() -> results[info][learner][policy ID][learner_stats] and add structure tests. (#18879)	2021-09-30 16:39:05 +02:00
Sven Mika	828f5d26b7	[RLlib] Custom view requirements (e.g. for prev-n-obs) work with `compute_single_action` and `compute_actions_from_input_dict`. (#18921)	2021-09-30 15:03:37 +02:00
Avnish Narayan	6dc1a6b72f	[RLlib] Raise error for kl penalty ddpo (#18959) * [RLlib] Raise error for kl penalty ddpo DDPPO doesn't support KL penalties like PPO-1. In order to support KL penalties, DDPPO would need to become undecentralized, which defeats the purpose of the algorithm. Users can still tune the entropy coefficient to control the policy entropy (similar to controlling the KL penalty.) * Update rllib/agents/ppo/ddppo.py Co-authored-by: avnishn <avnishnarayan@gmail.com> Co-authored-by: Sven Mika <sven@anyscale.io>	2021-09-30 10:56:22 +02:00
Sven Mika	9c9b482661	[RLlib] Allow n-step > 1 and prio. replay for R2D2 and RNNSAC. (#18939)	2021-09-29 21:31:34 +02:00
Sven Mika	b99943806e	[RLlib] Add support for IMPALA to handle more than one loss/optimizer (analogous to recent enhancement for APPO). (#18971)	2021-09-29 21:30:04 +02:00
Sven Mika	61a1274619	[RLlib] No Preprocessors (part 2). (#18468)	2021-09-23 12:56:45 +02:00
Sven Mika	a2a077b874	[RLlib] Faster remote worker space inference (don't infer if not required). (#18805)	2021-09-23 10:54:37 +02:00
Sven Mika	698b4eeed3	[RLlib] POC: Separate losses for APPO/IMPALA. Enable TFPolicy to handle multiple optimizers/losses (like TorchPolicy). (#18669)	2021-09-21 22:00:14 +02:00
Sven Mika	fd13bac9b3	[RLlib] Add `worker` arg (optional) to `policy_mapping_fn`. (#18184)	2021-09-17 12:07:11 +02:00
Sven Mika	ba1c489b79	[RLlib Testing] Lower `--smoke-test` "time_total_s" to make sure it doesn't time out. (#18670)	2021-09-16 18:22:23 +02:00
Sven Mika	8a00154038	[RLlib] Bump tf version in ML docker to tf==2.5.0; add tfp to ML-docker. (#18544)	2021-09-15 08:46:37 +02:00
Sven Mika	08c09737fa	[RLlib] Fix R2D2 (torch) multi-GPU issue. (#18550)	2021-09-14 19:58:10 +02:00
Sven Mika	3803e796ff	[RLlib] Multi-GPU learner thread (IMPALA) error messages/comments/code-cleanup. (#18540)	2021-09-13 19:27:53 +02:00
Sven Mika	ea4a22249c	[RLlib] Add simple action-masking example script/env/model (tf and torch). (#18494)	2021-09-11 23:08:09 +02:00
Sven Mika	3f89f35e52	[RLlib] Better error messages and hints; + failure-mode tests; (#18466)	2021-09-10 16:52:47 +02:00
Sven Mika	8a066474d4	[RLlib] No Preprocessors; preparatory PR #1 (#18367)	2021-09-09 08:10:42 +02:00
Sven Mika	1520c3d147	[RLlib] Deepcopy env_ctx for vectorized sub-envs AND add eval-worker-option to `Trainer.add_policy()` (#18428)	2021-09-09 07:10:06 +02:00
gjoliver	808b683f81	[RLlib] Add a unittest for learning rate schedule used with APEX agent. (#18389)	2021-09-08 23:29:40 +02:00
Sven Mika	45f60e51a9	[RLlib] DDPPO fixes and benchmarks. (#18390)	2021-09-08 19:39:01 +02:00
Sven Mika	56f142cac1	[RLlib] Add support for evaluation_num_episodes=auto (run eval for as long as the parallel train step takes). (#18380)	2021-09-07 08:08:37 +02:00
Sven Mika	e3e6ed7aaa	[RLlib] Issues 17844, 18034: Fix n-step > 1 bug. (#18358)	2021-09-06 12:14:20 +02:00
Sven Mika	ba58f5edb1	[RLlib] Strictly run `evaluation_num_episodes` episodes each evaluation run (no matter the other eval config settings). (#18335)	2021-09-05 15:37:05 +02:00
Sven Mika	a772c775cd	[RLlib] Set random seed (if provided) to Trainer process as well. (#18307)	2021-09-04 11:02:30 +02:00
Sven Mika	9a8ca6a69d	[RLlib] Fix Atari learning test regressions (2 bugs) and 1 minor attention net bug. (#18306)	2021-09-03 13:29:57 +02:00
Sven Mika	82465f9342	[RLlib] Better PolicyServer example (w/ or w/o tune) and add printing out actual listen port address in log-level=INFO. (#18254)	2021-08-31 22:03:23 +02:00
Sven Mika	599e589481	[RLlib] Move existing fake multi-GPU learning tests into separate buildkite job. (#18065)	2021-08-31 14:56:53 +02:00
Sven Mika	4888d7c9af	[RLlib] Replay buffers: Add config option to store contents in checkpoints. (#17999)	2021-08-31 12:21:49 +02:00
Joseph Suarez	8136d2912b	[RLlib] Add `policies` arg to callback: `on_episode_step` (already exists in all other episode-related callbacks) (#18119)	2021-08-27 16:12:19 +02:00
Sven Mika	b6aa8223bc	[RLlib] Fix `final_scale`'s default value to 0.02 (see OrnsteinUhlenbeck exploration). (#18070)	2021-08-25 14:22:09 +02:00
Sven Mika	9883505e84	[RLlib] Add [LSTM=True + multi-GPU]-tests to nightly RLlib testing suite (for all algos supporting RNNs, except R2D2, RNNSAC, and DDPPO). (#18017)	2021-08-24 21:55:27 +02:00
Sven Mika	494ddd98c1	[RLlib] Replace "seq_lens" w/ SampleBatch.SEQ_LENS. (#17928)	2021-08-21 17:05:48 +02:00
Sven Mika	8248ba531b	[RLlib] Redo #17410: Example script: Remote worker envs with inference done on main node. (#17960)	2021-08-20 08:02:18 +02:00
Alex Wu	318ba6fae0	Revert "[RLlib] Add example script for how to have n remote (parallel) envs with inference happening on "main" (possibly GPU) node. (#17410)" (#17951) This reverts commit `8fc16b9a18`.	2021-08-19 07:55:10 -07:00
Sven Mika	8fc16b9a18	[RLlib] Add example script for how to have n remote (parallel) envs with inference happening on "main" (possibly GPU) node. (#17410)	2021-08-19 12:14:50 +02:00
Kai Fricke	bf3eaa9264	[RLlib] Dreamer fixes and reinstate Dreamer test. (#17821) Co-authored-by: sven1977 <svenmika1977@gmail.com>	2021-08-18 18:47:08 +02:00
Sven Mika	a428f10ebe	[RLlib] Add multi-GPU learning tests to nightly. (#17778)	2021-08-18 17:21:01 +02:00
Sven Mika	f18213712f	[RLlib] Redo: "fix self play example scripts" PR (17566) (#17895) * wip. * wip. * wip. * wip. * wip. * wip. * wip. * wip. * wip.	2021-08-17 09:13:35 -07:00
Thomas Lecat	c02f91fa2d	[RLlib] Ape-X doesn't take the value of `prioritized_replay` into account (#17541)	2021-08-16 22:18:08 +02:00
Sven Mika	f3bbe4ea44	[RLlib] Test cases/BUILD cleanup; split "everything else" (longest running one rn) tests in 2. (#17640)	2021-08-16 22:01:01 +02:00
Sven Mika	c2ea2c01bb	[RLlib] Redo: Add support for multi-GPU to DDPG. (#17789) * wip. * wip. * wip. * wip. * wip. * wip.	2021-08-13 18:01:24 -07:00
Sven Mika	7f2b3c0824	[RLlib] Issue 17667: CQL-torch + GPU not working (due to simple_optimizer=False; must use simple optimizer!). (#17742)	2021-08-11 18:30:21 +02:00
Sven Mika	811d71b368	[RLlib] Issue 17653: Torch multi-GPU (>1) broken for LSTMs. (#17657)	2021-08-11 12:44:35 +02:00
Amog Kamsetty	0b8489dcc6	Revert "[RLlib] Add support for multi-GPU to DDPG. (#17586)" (#17707) This reverts commit `0eb0e0ff58`.	2021-08-10 10:50:21 -07:00
Amog Kamsetty	77f28f1c30	Revert "[RLlib] Fix `Trainer.add_policy` for num_workers>0 (self play example scripts). (#17566)" (#17709) This reverts commit `3b447265d8`.	2021-08-10 10:50:01 -07:00
Sven Mika	3b447265d8	[RLlib] Fix `Trainer.add_policy` for num_workers>0 (self play example scripts). (#17566)	2021-08-05 11:41:18 -04:00
Sven Mika	0eb0e0ff58	[RLlib] Add support for multi-GPU to DDPG. (#17586)	2021-08-05 11:39:51 -04:00
Sven Mika	5107d16ae5	[RLlib] Add @Deprecated decorator to simplify/unify deprecation of classes, methods, functions. (#17530)	2021-08-03 18:30:02 -04:00
Sven Mika	924f11cd45	[RLlib] Torch algos use now-framework-agnostic MultiGPUTrainOneStep execution op (~33% speedup for PPO-torch + GPU). (#17371)	2021-08-03 11:35:49 -04:00

452 commits