hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Avnish Narayan	5134e0dc12	[RLlib] Change type to tensortype for cql policies. (#23438 )	2022-03-24 12:32:29 +01:00
Fabian Witter	2547055f38	[RLlib] Add support for complex observations in CQL (#23332 )	2022-03-22 17:04:07 +01:00
Jun Gong	d12977c4fb	[RLlib] TF2 Bandit Agent (#22838 )	2022-03-21 16:55:55 +01:00
Sven Mika	b1cda46681	[RLlib] SlateQ (tf GPU + multi-GPU) + Bandit fixes (#23276 )	2022-03-18 13:45:16 +01:00
Siyuan (Ryans) Zhuang	0c74ecad12	[Lint] Cleanup incorrectly formatted strings (Part 1: RLLib). (#23128 )	2022-03-15 17:34:21 +01:00
Fabien Couthouis	e575ed3350	[RLlib] Fix AttributeError with None obs shape + tf in `_unpack_obs()` utility (#22428 )	2022-03-15 16:34:31 +01:00
Jeroen Bédorf	bc21a4593d	[RLlib] Fix crash when kl_coeff is set to 0 (#23063 ) Co-authored-by: Jeroen Bédorf <jeroen@minds.ai> Co-authored-by: Ishant Mrinal Haloi <mrinal.haloi11@gmail.com> Co-authored-by: Ishant Mrinal <33053278+n30111@users.noreply.github.com>	2022-03-11 12:24:52 -08:00
simonsays1980	8627f44d7f	[RLlib] Remove duplicate code block: Config deprecation check for `metrics_smoothing_episodes` (#22152 )	2022-03-09 16:51:42 +01:00
Artur Niederfahrenhorst	37d129a965	[RLlib] ReplayBuffer API: Test cases. (#22390 )	2022-03-08 16:54:12 +01:00
Artur Niederfahrenhorst	c0ade5f0b7	[RLlib] Issue 22625: `MultiAgentBatch.timeslices()` does not behave as expected. (#22657 )	2022-03-08 14:25:48 +01:00
Jiajun Yao	4801e57c77	[Test] Add missing tests to bazel BUILD (#22827 )	2022-03-07 19:54:49 -08:00
Sven Mika	3fe6f3b3eb	[RLlib] 2 bug fixes: Bandit registration not working if torch not installed. Env checker for MA envs. (#22821 )	2022-03-04 19:16:30 +01:00
Jun Gong	e765915ded	[RLlib] Make sure SlateQ works with GPU. (#22738 )	2022-03-04 17:49:51 +01:00
Kai Fricke	84a163a2c4	[RLlib] Remove atari rom install script (#22797 )	2022-03-03 16:55:56 +01:00
Sven Mika	0af100ffae	[RLlib] Fix tree.flatten dict ordering bug: `flatten_space([obs_space])` should produce same struct as `tree.flatten([obs])`. (#22731 )	2022-03-01 21:24:24 +01:00
Sven Mika	e50bd212a1	[RLlib] Disable flakey Pendulum-v1 tests (until further investigation). (#22686 )	2022-03-01 16:44:17 +01:00
Daniel	8d1f1b0a64	[RLlib] Update pettingzoo==1.15.0 supersuit==3.3.3 (#22519 )	2022-03-01 11:23:27 +01:00
simonsays1980	568cf28dd4	[RLlib] Example script `custom_metrics_and_callbacks.py` should work for `batch_mode=complete_episodes`. (#22684 )	2022-03-01 09:00:38 +01:00
Jun Gong	e8be45065e	[RLlib] Restore policies on `eval_workers` as well. (#22641 )	2022-03-01 08:38:14 +01:00
Jun Gong	22bc451102	[RLlib] Fix a memory leak in SimpleReplyBuffer that completely kills sampling throughput (#22678 )	2022-02-28 09:28:04 +01:00
Sven Mika	7b687e6cd8	[RLlib] SlateQ: Add a hard-task learning test to weekly regression suite. (#22544 )	2022-02-25 21:58:16 +01:00
Jun Gong	a385c9b127	[RLlib] Update bandit_envs_recommender_system (#22421 )	2022-02-24 22:43:41 +01:00
Sven Mika	526fd6b5fb	[RLlib] Issue 22444: KL-coeff not stored in persistent policy state. (#22590 )	2022-02-24 22:05:36 +01:00
Sven Mika	18c269c70e	[RLlib] Issue 22539: agent_key not deleted from 2 dicts in simple list collector. (#22587 )	2022-02-24 11:58:34 +01:00
Sven Mika	8e00537b65	[RLlib] SlateQ: framework=tf fixes and SlateQ documentation update (#22543 )	2022-02-23 13:03:45 +01:00
Xuehai Pan	018ebbf4cb	[RLlib] Issue #21671 : Handle callbacks and model metrics for `TorchPolicy` while using multi-GPU optimizers (#21697 )	2022-02-23 08:30:38 +01:00
Sven Mika	6522935291	[RLlib] Slate-Q tf implementation and tests/benchmarks. (#22389 )	2022-02-22 09:36:44 +01:00
Jun Gong	2b6a0c71d7	[RLlib] Add a callback for when trainer finishes initialization: `on_trainer_init`. (#22493 )	2022-02-22 08:18:32 +01:00
Steven Morad	d4571741aa	[RLlib] `seq_lens` should always be torch tensors. (#22398 )	2022-02-22 08:15:43 +01:00
JYX	49d7ba3738	[RLlib] Fix typo in vector_env docstring (#22534 )	2022-02-22 08:13:50 +01:00
Daniel	308ccfe25c	[RLlib] DD-PPO move `train_batch_size==-1` check to __init__ (#22521 )	2022-02-21 11:44:12 +01:00
Sven Mika	c58cd90619	[RLlib] Enable Bandits to work in batches mode(s) (vector envs + multiple workers + train_batch_sizes > 1). (#22465 )	2022-02-17 22:32:26 +01:00
Avnish Narayan	740def0a13	[RLlib] Put env-checker on critical path. (#22191 )	2022-02-17 14:06:14 +01:00
Sven Mika	5ca6a56e16	[RLlib] Bug fix: eval-workers in offline RL setup have no env, even though eval_config includes env key. (#22350 )	2022-02-15 09:32:43 +01:00
Jun Gong	6f5afcbce9	[RLlib] Docs enhancements: Setup-dev instructions; Ray datasets integration. (#22239 )	2022-02-15 09:09:24 +01:00
Steven Morad	5d52b599aa	[RLlib] Fix zero gradients for ppo-clipped vf (#22171 )	2022-02-15 08:57:18 +01:00
Sven Mika	04a5c72ea3	Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" (#18708 )	2022-02-10 13:44:22 +01:00
Balaji Veeramani	abad268549	Comment `fmt: off` annotations (#21984 ) Code formatting is disabled in several modules with the explanation > [The module] ignores yapf because yapf doesn't allow comments right after code blocks, but we put comments right after code blocks to prevent large white spaces in the documentation. Since we no longer use YAPF, it may be possible to re-enable code formatting on these modules. I've added "FIXME" comments requesting developers to check whether code formatter appeasements are still necessary.	2022-02-09 22:12:11 -08:00
Sven Mika	1c791b71d8	[RLlib] Fix Unity3D built-in examples action bounds from -inf/inf to -1.0/1.0. (#22247 )	2022-02-10 03:00:30 +01:00
Sven Mika	44d09c2aa5	[RLlib] Filter.clear_buffer() deprecated (use Filter.reset_buffer() instead). (#22246 )	2022-02-10 02:58:43 +01:00
Sven Mika	637cacedc9	[RLlib] Discussion 4986: OU Exploration (torch) crashes when restoring from checkpoint. (#22245 )	2022-02-10 02:58:09 +01:00
xwjiang2010	fc88b0895e	[tune] fix //rllib:tests/test_placement_groups (#22256 )	2022-02-09 14:42:31 -08:00
Alex Wu	b122f093c1	Revert "[RLlib] Speedup A3C up to 3x (new `training_iteration` function instead of `execution_plan`) and re-instate Pong learning test." (#22250 ) Reverts ray-project/ray#22126 Breaks rllib:tests/test_io	2022-02-09 09:26:36 -08:00
Artur Niederfahrenhorst	dea3574050	[RLlib] Replay Buffer API (#22114 )	2022-02-09 15:04:43 +01:00
Jun Gong	3207f537cc	[RLlib] RecSim Interest evolution environment should use custom video sampler: `IEvVideoSampler` due to only one cluster being used. (#22211 )	2022-02-09 10:29:35 +01:00
Ishant Mrinal	f0d8b6d701	[RLlib] Fix compute_actions() for Trainer due to missing if prev_actions/rewards is not None checks. (#22078 )	2022-02-09 09:05:26 +01:00
Balaji Veeramani	31ed9e5d02	[CI] Replace YAPF disables with Black disables (#21982 )	2022-02-08 16:29:25 -08:00
Sven Mika	ac3e6ab411	[RLlib] Speedup A3C up to 3x (new `training_iteration` function instead of `execution_plan`) and re-instate Pong learning test. (#22126 )	2022-02-08 19:04:13 +01:00
Sven Mika	c17a44cdfa	Revert "Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni…" (#22153 )	2022-02-08 16:43:00 +01:00
Sven Mika	8b678ddd68	[RLlib] Issue 22036: Client should handle concurrent episodes with one being `training_enabled=False`. (#22076 )	2022-02-06 12:35:03 +01:00

1069 commits