Jun Gong
a61095a480
[RLlib] fix bandit pre-merge tests ( #27554 )
2022-08-07 17:48:29 -07:00
Jun Gong
5f07987ab1
[RLlib] Fix connector examples ( #27583 )
2022-08-07 17:48:09 -07:00
Rohan Potdar
5b6a58ed28
[RLlib] Add OPE Learning Tests ( #27154 )
2022-08-02 17:51:38 -07:00
Kai Fricke
1d3c167bfe
[rllib/release] Fix rllib connect test with Tuner() API ( #27155 )
Currently failing because the Tune framework example does not return fitting results.
Signed-off-by: Kai Fricke <kai@anyscale.com>
2022-07-28 11:08:02 +01:00
Malinda
1d789aee63
[RLlib/Serve/Release tests] Few code refactoring for better use of efficient NumPy functions. ( #26284 )
2022-07-27 22:38:35 +02:00
Jun Gong
acf2bf9b2f
[RLlib] Get rid of all these deprecation warnings. ( #27085 )
2022-07-27 10:48:54 -07:00
xwjiang2010
fcf897ee72
[air] update rllib example to use Tuner API. ( #26987 )
update rllib example to use Tuner API.
Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
2022-07-27 12:12:59 +01:00
Jun Gong
c7ae787cc8
[RLlib] Beef up worker failure test. ( #26953 )
2022-07-27 00:10:45 -07:00
Artur Niederfahrenhorst
b1594260ba
[RLlib] Small SlateQ example fix. ( #26948 )
2022-07-25 15:12:42 +02:00
Jun Gong
6b6d3017ba
[RLlib] more connector polishes and fixes. ( #26645 )
2022-07-19 08:50:28 -07:00
Sven Mika
4aea24c8a8
[RLlib] restart_failed_sub_environments now works for MA cases and crashes during reset(); +more tests and logging; add eval worker sub-env fault tolerance test. ( #26276 )
2022-07-15 08:55:14 +02:00
Jun Gong
104407a6e5
[RLlib] Fix all the erroneous on_trainer_init warning. ( #26433 )
2022-07-13 18:56:01 +02:00
Jun Gong
b383d987d1
[RLlib] Fix a bunch of issues related to connectors. ( #26510 )
2022-07-13 18:55:20 +02:00
Jun Gong
0c469e490e
[RLlib] Checkpoint and restore connectors. ( #26253 )
2022-07-09 01:06:24 -07:00
Kai Fricke
e1a7efe148
[tune] Use Checkpoint.to_bytes() for store_to_object ( #25805 )
We currently use our own serialization to ship checkpoints as objects. Instead we should use the Checkpoint class. This PR also adds support to create results from checkpoints pointing to object references.
Depends on #26351
Signed-off-by: Kai Fricke <kai@anyscale.com>
2022-07-08 18:01:20 +01:00
Kai Fricke
0959f44b6f
[tune/structure] Introduce execution package ( #26015 )
Execution-specific packages are moved to tune.execution.
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
2022-06-23 11:13:19 +01:00
Sven Mika
96693055bd
[RLlib] More Trainer -> Algorithm renaming cleanups. ( #25869 )
2022-06-20 15:54:00 +02:00
Artur Niederfahrenhorst
f34cd2fd8f
[RLlib] Take replay buffer api example out of GPU examples. ( #25841 )
2022-06-16 19:12:38 +02:00
Yi Cheng
7b8b0f8e03
Revert "[RLlib] Remove execution plan code no longer used by RLlib. ( #25624 )" ( #25776 )
This reverts commit 804719876b.
2022-06-14 13:59:15 -07:00
Avnish Narayan
804719876b
[RLlib] Remove execution plan code no longer used by RLlib. ( #25624 )
2022-06-14 10:57:27 +02:00
Sven Mika
130b7eeaba
[RLlib] Trainer to Algorithm renaming. ( #25539 )
2022-06-11 15:10:39 +02:00
Sven Mika
7c39aa5fac
[RLlib] Trainer.training_iteration -> Trainer.training_step; Iterations vs reportings: Clarification of terms. ( #25076 )
2022-06-10 17:09:18 +02:00
Artur Niederfahrenhorst
94d6c212df
[RLlib] Replay Buffer API documentation. ( #24683 )
2022-06-10 16:47:51 +02:00
Kai Fricke
8affbc7be6
[tune/train] Consolidate checkpoint manager 3: Ray Tune ( #24430 )
**Update**: This PR is now part 3 of a three PR group to consolidate the checkpoints.
1. Part 1 adds the common checkpoint management class #24771
2. Part 2 adds the integration for Ray Train #24772
3. This PR builds on #24772 and includes all changes. It moves the Ray Tune integration to use the new common checkpoint manager class.
Old PR description:
This PR consolidates the Ray Train and Tune checkpoint managers. These concepts previously did something very similar but in different modules. To simplify maintenance in the future, we've consolidated the common core.
- This PR keeps full compatibility with the previous interfaces and implementations. This means that for now, Train and Tune will have separate CheckpointManagers that both extend the common core
- This PR prepares Tune to move to a CheckpointStrategy object
- In follow-up PRs, we can further unify interfacing with the common core, possibly removing any train- or tune-specific adjustments (e.g. moving to setup on init rather than at runtime for Ray Train)
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2022-06-08 12:05:34 +01:00
Rohan Potdar
a9d8da0100
[RLlib]: Doubly Robust Off-Policy Evaluation. ( #25056 )
2022-06-07 12:52:19 +02:00
Artur Niederfahrenhorst
5133978adc
[RLlib] PG policy subclassing conversion. ( #25288 )
2022-06-06 13:07:47 +02:00
Sven Mika
b5bc2b93c3
[RLlib] Move all remaining algos into algorithms directory. ( #25366 )
2022-06-04 07:35:24 +02:00
Yi Cheng
fd0f967d2e
Revert "[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms dir and rename policy and trainer classes. ( #25346 )" ( #25420 )
This reverts commit e4ceae19ef.
Reverts #25346
linux://python/ray/tests:test_client_library_integration never failed before this PR.
It also fails in the CI of the reverted PR (https://buildkite.com/ray-project/ray-builders-pr/builds/34079#01812442-c541-4145-af22-2a012655c128 ), so the failure is highly likely caused by this PR.
The test failure output seems related as well (https://buildkite.com/ray-project/ray-builders-branch/builds/7923#018125c2-4812-4ead-a42f-7fddb344105b )
2022-06-02 20:38:44 -07:00
Sven Mika
e4ceae19ef
[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms dir and rename policy and trainer classes. ( #25346 )
2022-06-02 16:47:05 +02:00
Eric Liang
905258dbc1
Clean up docstyle in python modules and add LINT rule ( #25272 )
2022-06-01 11:27:54 -07:00
Sven Mika
d95009a3ac
[RLlib] Vectorized envs: Gracefully handle sub-environments failing by restarting them (if configured so). ( #24967 )
2022-05-28 10:50:03 +02:00
Sven Mika
163fa81976
[RLlib] Discussion 6060 and 5120: auto-infer different agents' spaces in multi-agent env. ( #24649 )
2022-05-27 14:56:24 +02:00
Rohan Potdar
ab81c8e9ca
[RLlib]: Rename input_evaluation to off_policy_estimation_methods. ( #25107 )
2022-05-27 13:14:54 +02:00
Jun Gong
eaf9c941ae
[RLlib] Migrate PPO Impala and APPO policies to use sub-classing implementation. ( #25117 )
2022-05-25 14:38:03 +02:00
Artur Niederfahrenhorst
d76ef9add5
[RLLib] Fix RNNSAC example failing on CI + fixes for recurrent models for other Q Learning Algos. ( #24923 )
2022-05-24 14:39:43 +02:00
Sven Mika
09886d7ab8
[RLlib] Upgrade gym 0.23 ( #24171 )
2022-05-23 08:18:44 +02:00
Steven Morad
501d932449
[RLlib] SAC, RNNSAC, and CQL TrainerConfig objects ( #25059 )
2022-05-22 19:58:47 +02:00
kourosh hakhamaneshi
3815e52a61
[RLlib] Agents to algos: DQN w/o Apex and R2D2, DDPG/TD3, SAC, SlateQ, QMIX, PG, Bandits ( #24896 )
2022-05-19 18:30:42 +02:00
Sven Mika
8f50087908
[RLlib] AlphaZero uses training_iteration API. ( #24507 )
2022-05-18 09:58:25 +02:00
Jun Gong
dea134a472
[RLlib] Clean up Policy mixins. ( #24746 )
2022-05-17 17:16:08 +02:00
Artur Niederfahrenhorst
fb2915d26a
[RLlib] Replay Buffer API and Ape-X. ( #24506 )
2022-05-17 13:43:49 +02:00
Jun Gong
68a9a33386
[RLlib] Retry agents -> algorithms. with proper doc changes this time. ( #24797 )
2022-05-16 09:45:32 +02:00
Simon Mo
9f23affdc0
[Hotfix] Unbreak lint in master ( #24794 )
2022-05-13 15:05:05 -07:00
Sven Mika
8fe3fd8f7b
[RLlib] QMix TrainerConfig objects. ( #24775 )
2022-05-13 18:50:28 +02:00
kourosh hakhamaneshi
ffcbb30552
[RLlib] Move from agents to algorithms - CQL, MARWIL, AlphaStar, MAML, Dreamer, MBMPO. ( #24739 )
2022-05-13 18:43:36 +02:00
Max Pumperla
6a6c58b5b4
[RLlib] Config objects for DDPG and SimpleQ. ( #24339 )
2022-05-12 16:12:42 +02:00
Artur Niederfahrenhorst
8d906f9bf8
[RLlib] SAC with new Replay Buffer API. ( #24156 )
2022-05-09 14:33:02 +02:00
Sven Mika
7ab19ddc32
[RLlib] MADDPG: Move into agents folder (from contrib) and use training_iteration method. ( #24502 )
2022-05-06 12:35:21 +02:00
Sven Mika
1bc6419e0e
[RLlib] R2D2 training iteration fn AND switch off execution_plan API by default. ( #24165 )
2022-05-03 07:59:26 +02:00
Sven Mika
f066180ed5
[RLlib] Deprecate timesteps_per_iteration config key (in favor of min_[sample|train]_timesteps_per_reporting. ( #24372 )
2022-05-02 12:51:14 +02:00