Commit graph

394 commits

Author SHA1 Message Date
Julius Frost
0b1b6222bc
[rllib] Add merge_trainer_config arguments to trainer template (#17160) 2021-07-21 15:43:06 -07:00
Sven Mika
5a313ba3d6
[RLlib] Refactor: All tf static graph code should reside inside Policy class. (#17169) 2021-07-20 14:58:13 -04:00
Amog Kamsetty
cb74053ee5
Retry remove gpustat dependency (#17115)
* remove gpustat

* move psutil imports
2021-07-19 11:14:10 -07:00
Sven Mika
18d173b172
[RLlib] Implement policy_maps (multi-agent case) in RolloutWorkers as LRU caches. (#17031) 2021-07-19 13:16:03 -04:00
Sven Mika
e0640ad0dc
[RLlib] Fix seeding for ES and ARS. (#16744) 2021-07-19 13:13:05 -04:00
Sven Mika
649580d735
[RLlib] Redo simplify multi agent config dict: Reverted b/c seemed to break test_typing (non RLlib test). (#17046) 2021-07-15 05:51:24 -04:00
Grzegorz Bartyzel
d553d4da6c
[RLlib] DQN (Rainbow): Fix torch noisy layer support and loss (#16716) 2021-07-13 16:48:06 -04:00
Sven Mika
1fd0eb805e
[RLlib] Redo fix bug normalize vs unsquash actions (original PR made log-likelihood test flakey). (#17014) 2021-07-13 14:01:30 -04:00
Antoine Galataud
16f1011c07
[RLlib] Issue 15910: APEX current learning rate not updated on local worker (#15911) 2021-07-13 14:01:00 -04:00
Amog Kamsetty
38b5b6d24c
Revert "[RLlib] Simplify multiagent config (automatically infer class/spaces/config). (#16565)" (#17036)
This reverts commit e4123fff27.
2021-07-13 09:57:15 -07:00
Kai Fricke
27d80c4c88
[RLlib] ONNX export for tensorflow (1.x) and torch (#16805) 2021-07-13 12:38:11 -04:00
Sven Mika
e4123fff27
[RLlib] Simplify multiagent config (automatically infer class/spaces/config). (#16565) 2021-07-13 06:38:14 -04:00
Amog Kamsetty
bc33dc7e96
Revert "[RLlib] Fix bug in policy.py: normalize_actions=True has to call unsquash_action, not normalize_action." (#17002)
This reverts commit 7862dd64ea.
2021-07-12 11:09:14 -07:00
Sven Mika
55a90e670a
[RLlib] Trainer.add_policy() not working for tf, if added policy is trained afterwards. (#16927) 2021-07-11 23:41:38 +02:00
Julius Frost
a88b217d3f
[rllib] Enhancements to Input API for customizing offline datasets (#16957)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-07-10 15:05:25 -07:00
Sven Mika
7862dd64ea
[RLlib] Fix bug in policy.py: normalize_actions=True has to call unsquash_action, not normalize_action. (#16774) 2021-07-08 17:31:34 +02:00
Sven Mika
53206dd440
[RLlib] CQL BC loss fixes; PPO/PG/A2|3C action normalization fixes (#16531) 2021-06-30 12:32:11 +02:00
AnnaKosiorek
1e709771b2
[rllib][minor] clarification of the softmax axis in dqn_torch_policy (#16311)
pytorch nn.functional.softmax (unlike tf.nn.softmax) calculates softmax along zeroth dimension by default
2021-06-26 11:19:54 -07:00
Sven Mika
c95dea51e9
[RLlib] External env enhancements + more examples. (#16583) 2021-06-23 09:09:01 +02:00
Sven Mika
be6db06485
[RLlib] Re-do: Trainer: Support add and delete Policies. (#16569) 2021-06-21 13:46:01 +02:00
Sven Mika
169ddabae7
[RLlib] Issue 15973: Trainer.with_updates(validate_config=...) behaves confusingly. (#16429) 2021-06-19 22:42:00 +02:00
Amog Kamsetty
bd3cbfc56a
Revert "[RLlib] Allow policies to be added/deleted on the fly. (#16359)" (#16543)
This reverts commit e78ec370a9.
2021-06-18 12:21:49 -07:00
Sven Mika
2900a06dd7
[RLlib] Issue 14503: SAC not allowing custom action distributions. (#16427) 2021-06-18 17:27:29 +02:00
Sven Mika
e78ec370a9
[RLlib] Allow policies to be added/deleted on the fly. (#16359) 2021-06-18 10:31:30 +02:00
Sven Mika
d0014cd351
[RLlib] Policies get/set_state fixes and enhancements. (#16354) 2021-06-15 13:08:43 +02:00
Chris Bamford
fd1a97e39f
[RLlib] Memory leak docs (#15908) 2021-06-10 18:10:21 +02:00
Sven Mika
3d4dc60e2e
[RLlib] CQL iteration count fixes: Remove dummy buffer and unnecessary store op from exec_plan. (#16332) 2021-06-10 07:49:17 +02:00
Sven Mika
e2be41b407
[RLlib] MARWIL + BC: Various fixes and enhancements. (#16218) 2021-06-03 22:29:00 +02:00
Sven Mika
5fe34862ce
[RLlib] DDPG torch GPU bug. (#16133) 2021-05-28 22:09:25 +02:00
Sven Mika
33a69135cb
[RLlib] Issue 16117: DQN/APEX torch not working on GPU. (#16118) 2021-05-28 09:12:53 +02:00
Sven Mika
f6302d81be
[RLlib] Discussion 2210: BC algo broken, if "advantages" missing in offline data. (#16019) 2021-05-25 08:47:17 +02:00
Sven Mika
e80095591c
[RLlib] Entropy coeff schedule bug fix and git bisect script. (#15937) 2021-05-20 18:15:10 +02:00
Sven Mika
2d34216660
[RLlib] APEX-DQN: Bug fix for torch and add learning test. (#15762) 2021-05-20 09:27:03 +02:00
Sven Mika
eaa7f6696d
[RLlib] Issue 15887: MARWIL adv norm update mismatch for tf (static-graph) vs torch versions. (#15898) 2021-05-19 15:44:11 -07:00
Michael Luo
474f04e322
[RLlib] DDPG/TD3 + A3C/A2C + MARWIL/BC Annotation/Comments/Code Cleanup (#14707) 2021-05-19 16:32:29 +02:00
Chris Bamford
0be83d9a95
[RLlib] Fixing Memory Leak In Multi-Agent environments. Adding tooling for finding memory leaks in workers. (#15815) 2021-05-18 13:23:00 +02:00
Sven Mika
2303851c3c
[RLlib] Torch multi-GPU + LSTM/RNN bug fix. (#15492) 2021-05-18 11:51:05 +02:00
Sven Mika
839fc59224
[RLlib] CQL TensorFlow support (#15841) 2021-05-18 11:10:46 +02:00
Sven Mika
d89fb82bfb
[RLlib] Add simple curriculum learning API and example script. (#15740) 2021-05-16 17:35:10 +02:00
Sven Mika
469f5227da
[RLlib] CQL bug fix: Normalize actions for atanh in BC part of the CQL loss. (#15814) 2021-05-16 15:21:06 +02:00
Sven Mika
bc09e75b78
[RLlib] Fix 3 flakey test cases. (#15785) 2021-05-16 12:20:33 +02:00
Sven Mika
c4a3e1589b
[RLlib] CQL: Bug fixes and OPE example added to test and offline_rl.py example. (#15761) 2021-05-13 09:17:23 +02:00
Sven Mika
16ddab49f5
[RLlib] Trainer._evaluate -> Trainer.evaluate; Also make evaluation possible w/o evaluation worker set. (#15591) 2021-05-12 12:16:00 +02:00
Sven Mika
a495759f06
[RLlib] Discussion 2022: PPO should auto-adjust rollout_fragment_length if other settings do not align with train_batch_size. (#15611) 2021-05-10 16:16:02 +02:00
Sven Mika
461d73ddf1
[RLlib] simple_optimizer should not be used by default for tf+MA. (#15365) 2021-05-10 16:10:44 +02:00
Sven Mika
c7563a32ed
[RLlib] DD-PPO not supported on Win (add meaningful error message). (#15631) 2021-05-04 19:26:17 +02:00
Michael Luo
4cbe13cdfd
[RLlib] CQL loss fn fixes, MuJoCo + Pendulum benchmarks, offline-RL example script w/ json file. (#15603)
Co-authored-by: Sven Mika <sven@anyscale.io>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-05-04 19:06:19 +02:00
Sven Mika
4b3add0066
[RLlib] Discussion 2021: PPO does not learn vf, iff use_gae=False (ignores use_critic setting). (#15610) 2021-05-04 14:17:00 +02:00
Antoine Galataud
ce1c001b1d
[RLlib] DQN: Place LearningRateSchedule mixin at the right moment (#15558) 2021-05-04 13:21:40 +02:00
Amog Kamsetty
ebc44c3d76
[CI] Upgrade flake8 to 3.9.1 (#15527)
* formatting

* format util

* format release

* format rllib/agents

* format rllib/env

* format rllib/execution

* format rllib/evaluation

* format rllib/examples

* format rllib/policy

* format rllib utils and tests

* format streaming

* more formatting

* update requirements files

* fix rllib type checking

* updates

* update

* fix circular import

* Update python/ray/tests/test_runtime_env.py

* noqa
2021-05-03 14:23:28 -07:00