hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Sven Mika	670d083a56	[RLlib] Fix broken unity3d_env import in example server script. (#13040)	2020-12-23 11:29:58 -05:00
Sven Mika	d5604eaba3	[RLlib] Attention nets PyTorch support and cleanup (using traj. view API). (#12029)	2020-12-21 18:38:34 -08:00
Sven Mika	b2bcab711d	[RLlib] Attention Nets: tf (#12753)	2020-12-20 20:22:32 -05:00
Sven Mika	e40b14d255	[RLlib] Batch-size for truncate_episode batch_mode should be configurable in agent-steps (rather than env-steps), if needed. (#12420)	2020-12-08 16:41:45 -08:00
Sven Mika	99c81c6795	[RLlib] Attention Net prep PR #3. (#12450)	2020-12-07 13:08:17 +01:00
Sven Mika	3ad9365e1d	[RLlib] Attention Net prep PR #2: Smaller cleanups. (#12449)	2020-12-01 08:21:45 +01:00
Sven Mika	0df55a139c	[RLlib] Attention Net prep PR #1: Smaller cleanups. (#12447) * WIP. * Fix. * Fix. * Fix.	2020-11-27 16:25:47 -08:00
Sven Mika	592c161032	[RLlib] Issue 12118: LSTM prev-a/r should be separately configurable. Fix missing prev-a one-hot encoding. (#12397) * WIP. * Fix and LINT.	2020-11-25 11:27:46 -08:00
Sven Mika	841d93d366	[RLlib] Issue 12233 shared tf layers example not really shared (only works for tf1.x, not tf2.x). (#12399)	2020-11-25 11:27:19 -08:00
Raoul Khouri	d07ffc152b	[rllib] Rrk/12079 custom filters (#12095) * travis reformatted	2020-11-19 13:20:20 -08:00
Sven Mika	dab241dcc6	[RLlib] Fix inconsistency wrt batch size in SampleCollector (traj. view API). Makes DD-PPO work with traj. view API. (#12063)	2020-11-19 19:01:14 +01:00
Michael Luo	6e6c680f14	MBMPO Cartpole (#11832) * MBMPO Cartpole Done * Added doc	2020-11-12 10:30:41 -08:00
Sven Mika	62c7ab5182	[RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C (tf and torch). (#11747)	2020-11-12 16:27:34 +01:00
Benjamin Black	1999266bba	Updated pettingzoo env to accommodate api changes and fixes (#11873) * Updated pettingzoo env to accommodate api changes and fixes * fixed test failure * fixed linting issue * fixed test failure	2020-11-09 16:09:49 -08:00
Pierre TASSEL	66605cfcbd	[RLLib] Random Parametric Trainer (#11366)	2020-11-04 11:12:51 +01:00
desktable	5af745c90d	[RLlib] Implement the SlateQ algorithm (#11450)	2020-11-03 09:52:04 +01:00
Lara Codeca	e735add268	[RLlib] Integration with SUMO Simulator (#11710)	2020-11-03 09:45:03 +01:00
Sven Mika	54d85a6c2a	[RLlib] Fix RNN learning for tf-eager/tf2.x. (#11720)	2020-11-02 11:18:41 +01:00
Sven Mika	8ea1bc5ff9	[RLlib] Allow for more than 2^31 policy timesteps. (#11301)	2020-10-12 13:49:11 -07:00
Sven Mika	ce96b03b07	[RLlib] MB-MPO cleanup (comments, docstrings, type annotations). (#11033)	2020-10-06 20:28:16 +02:00
Sven Mika	c17169dc11	[RLlib] Fix all example scripts to run on GPUs. (#11105)	2020-10-02 23:07:44 +02:00
Sven Mika	36bda8432b	[RLlib] Trajectory view API: Simple List Collector (on by default for PPO); LSTM-agnostic (#11056)	2020-10-01 16:57:10 +02:00
Eric Liang	daa03ba6e6	[rllib] Add execution module to package ref (#10941) * add init * add * update	2020-09-21 23:03:06 -07:00
Sven Mika	d7c42d6d92	[RLlib] Unity blogpost final fixes. (#10894)	2020-09-20 14:13:20 +02:00
Sven Mika	805dad3bc4	[RLlib] SAC algo cleanup. (#10825)	2020-09-20 11:27:02 +02:00
Benjamin Black	f2408b719c	Fixed PettingZooEnv (#10847)	2020-09-17 11:28:42 -07:00
Sven Mika	4b278c36fc	[RLlib] Behavioral Cloning (from MARWIL). (#10619)	2020-09-09 17:33:21 +02:00
Michael Luo	8e613652af	[RLLib] MBMPO Fixes (#10296)	2020-09-09 09:34:34 +02:00
Sven Mika	28ab797cf5	[RLlib] Deprecate old classes, methods, functions, config keys (in prep for RLlib 1.0). (#10544)	2020-09-06 10:58:00 +02:00
Richard Liaw	551c597312	[tune] API revamp fix (#10518)	2020-09-05 15:34:53 -07:00
Sven Mika	244aafdcf8	[RLlib] Curiosity enhancements. (#10373)	2020-09-05 13:14:24 +02:00
Justin Terry	352718610d	Multi-agent Algorithm Documentation Updates (#9722)	2020-09-03 22:37:46 -07:00
Sven Mika	715ee8dfc9	[RLlib] Issue 10469: Callbacks should receive env idx ... (#10477)	2020-09-03 17:27:05 +02:00
Sven Mika	ef18893fb5	[RLlib] PPO, APPO, and DD-PPO code cleanup. (#10420)	2020-09-02 14:03:01 +02:00
Michael Luo	4e9888ce2f	[RLlib] Dreamer (#10172)	2020-08-26 13:24:05 +02:00
Eric Liang	deea1861ab	[rllib] Try fixing torch GPU and masking errors (#10168)	2020-08-25 18:34:19 -07:00
Benjamin Black	2689fb439c	Fixed pettingzoo env example (#9973)	2020-08-25 13:22:25 +02:00
Michael Luo	48a39d7cb9	[RLlib] Deepmind Control Suite Examples (#9751)	2020-08-23 12:53:08 +02:00
Sven Mika	e968b52cb7	[RLlib] Trajectory view API - 03 Fast LSTM + prev actions/rewards (#9950)	2020-08-21 12:35:16 +02:00
Eric Liang	ca133e2699	[rllib] Remove extra model config kwargs passed incorrectly for Torch models (#10055)	2020-08-17 11:12:20 -07:00
Olli Huotari	9ff599cbb8	torch policy now includes model.metrics (#10121) * torch policy now includes model.metrics * Fixed tests to work with custom metrics * Forgot to run format.sh	2020-08-15 10:43:11 -07:00
Barak Michener	8e76796fd0	ci: Redo `format.sh --all` script & backfill lint fixes (#9956)	2020-08-07 16:49:49 -07:00
Michael Luo	4d7bd8c892	[RLlib] Implementation of "Model-based Meta Policy Optimization" (MB MPO) (#9409)	2020-08-02 18:12:09 +02:00
Sven Mika	b0b0463161	[RLlib] Trajectory View API (preparatory cleanup and enhancements). (#9678)	2020-07-29 21:15:09 +02:00
Sven Mika	e6ea33a03c	[RLlib] Enhance reward clipping test; add action_clipping tests. (#9684)	2020-07-28 10:44:54 +02:00
Sven Mika	5dc4b6686e	[RLlib] Implement DQN PyTorch distributional head. (#9589)	2020-07-25 09:29:24 +02:00
Sven Mika	78dfed2683	[RLlib] Issue 8384: QMIX doesn't learn anything. (#9527)	2020-07-17 12:14:34 +02:00
Sven Mika	935d8308fb	[RLlib] Issue #9437 (PyTorch converts to CPU tensor, even if on GPU). (#9497)	2020-07-16 14:55:50 +02:00
Sven Mika	617eb8f279	[RLlib] Issue 9402 MARWIL producing nan rewards. (#9429)	2020-07-14 05:07:16 +02:00
Sven Mika	14160ca58c	[RLlib] Issue #9366 (DQN w/o dueling produces invalid actions). (#9386)	2020-07-10 12:43:03 +02:00

375 commits