Kai Fricke
e1a7efe148
[tune] Use Checkpoint.to_bytes()
for store_to_object ( #25805 )
...
We currently use our own serialization to ship checkpoints as objects. Instead we should use the Checkpoint class. This PR also adds support to create results from checkpoints pointing to object references.
Depends on #26351
Signed-off-by: Kai Fricke <kai@anyscale.com>
2022-07-08 18:01:20 +01:00
Kai Fricke
0959f44b6f
[tune/structure] Introduce execution package ( #26015 )
...
Execution-specific packages are moved to tune.execution.
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
2022-06-23 11:13:19 +01:00
Sven Mika
96693055bd
[RLlib] More Trainer -> Algorithm renaming cleanups. ( #25869 )
2022-06-20 15:54:00 +02:00
Artur Niederfahrenhorst
f34cd2fd8f
[RLlib] Take replay buffer api example out of GPU examples. ( #25841 )
2022-06-16 19:12:38 +02:00
Yi Cheng
7b8b0f8e03
Revert "[RLlib] Remove execution plan code no longer used by RLlib. ( #25624 )" ( #25776 )
...
This reverts commit 804719876b
.
2022-06-14 13:59:15 -07:00
Avnish Narayan
804719876b
[RLlib] Remove execution plan code no longer used by RLlib. ( #25624 )
2022-06-14 10:57:27 +02:00
Sven Mika
130b7eeaba
[RLlib] Trainer
to Algorithm
renaming. ( #25539 )
2022-06-11 15:10:39 +02:00
Sven Mika
7c39aa5fac
[RLlib] Trainer.training_iteration -> Trainer.training_step; Iterations vs reportings: Clarification of terms. ( #25076 )
2022-06-10 17:09:18 +02:00
Artur Niederfahrenhorst
94d6c212df
[RLlib] Replay Buffer API documentation. ( #24683 )
2022-06-10 16:47:51 +02:00
Kai Fricke
8affbc7be6
[tune/train] Consolidate checkpoint manager 3: Ray Tune ( #24430 )
...
**Update**: This PR is now part 3 of a three PR group to consolidate the checkpoints.
1. Part 1 adds the common checkpoint management class #24771
2. Part 2 adds the integration for Ray Train #24772
3. This PR builds on #24772 and includes all changes. It moves the Ray Tune integration to use the new common checkpoint manager class.
Old PR description:
This PR consolidates the Ray Train and Tune checkpoint managers. These concepts previously did something very similar but in different modules. To simplify maintenance in the future, we've consolidated the common core.
- This PR keeps full compatibility with the previous interfaces and implementations. This means that for now, Train and Tune will have separate CheckpointManagers that both extend the common core
- This PR prepares Tune to move to a CheckpointStrategy object
- In follow-up PRs, we can further unify interfacing with the common core, possibly removing any train- or tune-specific adjustments (e.g. moving to setup on init rather on runtime for Ray Train)
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2022-06-08 12:05:34 +01:00
Rohan Potdar
a9d8da0100
[RLlib]: Doubly Robust Off-Policy Evaluation. ( #25056 )
2022-06-07 12:52:19 +02:00
Artur Niederfahrenhorst
5133978adc
[RLlib] PG policy subclassing conversion. ( #25288 )
2022-06-06 13:07:47 +02:00
Sven Mika
b5bc2b93c3
[RLlib] Move all remaining algos into algorithms
directory. ( #25366 )
2022-06-04 07:35:24 +02:00
Yi Cheng
fd0f967d2e
Revert "[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms
dir and rename policy and trainer classes. ( #25346 )" ( #25420 )
...
This reverts commit e4ceae19ef
.
Reverts #25346
linux://python/ray/tests:test_client_library_integration never fail before this PR.
In the CI of the reverted PR, it also fails (https://buildkite.com/ray-project/ray-builders-pr/builds/34079#01812442-c541-4145-af22-2a012655c128 ). So high likely it's because of this PR.
And test output failure seems related as well (https://buildkite.com/ray-project/ray-builders-branch/builds/7923#018125c2-4812-4ead-a42f-7fddb344105b )
2022-06-02 20:38:44 -07:00
Sven Mika
e4ceae19ef
[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms
dir and rename policy and trainer classes. ( #25346 )
2022-06-02 16:47:05 +02:00
Eric Liang
905258dbc1
Clean up docstyle in python modules and add LINT rule ( #25272 )
2022-06-01 11:27:54 -07:00
Sven Mika
d95009a3ac
[RLlib] Vectorized envs: Gracefully handle sub-environments failing by restarting them (if configured so). ( #24967 )
2022-05-28 10:50:03 +02:00
Sven Mika
163fa81976
[RLlib] Discussion 6060 and 5120: auto-infer different agents' spaces in multi-agent env. ( #24649 )
2022-05-27 14:56:24 +02:00
Rohan Potdar
ab81c8e9ca
[RLlib]: Rename input_evaluation
to off_policy_estimation_methods
. ( #25107 )
2022-05-27 13:14:54 +02:00
Jun Gong
eaf9c941ae
[RLlib] Migrate PPO Impala and APPO policies to use sub-classing implementation. ( #25117 )
2022-05-25 14:38:03 +02:00
Artur Niederfahrenhorst
d76ef9add5
[RLLib] Fix RNNSAC example failing on CI + fixes for recurrent models for other Q Learning Algos. ( #24923 )
2022-05-24 14:39:43 +02:00
Sven Mika
09886d7ab8
[RLlib] Upgrade gym 0.23 ( #24171 )
2022-05-23 08:18:44 +02:00
Steven Morad
501d932449
[RLlib] SAC, RNNSAC, and CQL TrainerConfig objects ( #25059 )
2022-05-22 19:58:47 +02:00
kourosh hakhamaneshi
3815e52a61
[RLlib] Agents to algos: DQN w/o Apex and R2D2, DDPG/TD3, SAC, SlateQ, QMIX, PG, Bandits ( #24896 )
2022-05-19 18:30:42 +02:00
Sven Mika
8f50087908
[RLlib] AlphaZero uses training_iteration API. ( #24507 )
2022-05-18 09:58:25 +02:00
Jun Gong
dea134a472
[RLlib] Clean up Policy mixins. ( #24746 )
2022-05-17 17:16:08 +02:00
Artur Niederfahrenhorst
fb2915d26a
[RLlib] Replay Buffer API and Ape-X. ( #24506 )
2022-05-17 13:43:49 +02:00
Jun Gong
68a9a33386
[RLlib] Retry agents -> algorithms. with proper doc changes this time. ( #24797 )
2022-05-16 09:45:32 +02:00
Simon Mo
9f23affdc0
[Hotfix] Unbreak lint in master ( #24794 )
2022-05-13 15:05:05 -07:00
Sven Mika
8fe3fd8f7b
[RLlib] QMix TrainerConfig objects. ( #24775 )
2022-05-13 18:50:28 +02:00
kourosh hakhamaneshi
ffcbb30552
[RLlib] Move from agents
to algorithms
- CQL, MARWIL, AlphaStar, MAML, Dreamer, MBMPO. ( #24739 )
2022-05-13 18:43:36 +02:00
Max Pumperla
6a6c58b5b4
[RLlib] Config objects for DDPG and SimpleQ. ( #24339 )
2022-05-12 16:12:42 +02:00
Artur Niederfahrenhorst
8d906f9bf8
[RLlib] SAC with new Replay Buffer API. ( #24156 )
2022-05-09 14:33:02 +02:00
Sven Mika
7ab19ddc32
[RLlib] MADDPG: Move into agents folder (from contrib) and use training_iteration
method. ( #24502 )
2022-05-06 12:35:21 +02:00
Sven Mika
1bc6419e0e
[RLlib] R2D2 training iteration fn AND switch off execution_plan
API by default. ( #24165 )
2022-05-03 07:59:26 +02:00
Sven Mika
f066180ed5
[RLlib] Deprecate timesteps_per_iteration
config key (in favor of min_[sample|train]_timesteps_per_reporting
. ( #24372 )
2022-05-02 12:51:14 +02:00
Xuehai Pan
3c3dd5051f
[RLlib] Fix type hints for original_batches
in callbacks. ( #24214 )
2022-04-29 10:33:53 +02:00
Sven Mika
627b9f2e88
[RLlib] QMIX training iteration function and new replay buffer API. ( #24164 )
2022-04-27 14:24:20 +02:00
Kai Fricke
c0ec20dc3a
[tune] Next deprecation cycle ( #24076 )
...
Rolling out next deprecation cycle:
- DeprecationWarnings that were `warnings.warn` or `logger.warn` before are now raised errors
- Raised Deprecation warnings are now removed
- Notably, this involves deprecating the TrialCheckpoint functionality and associated cloud tests
- Added annotations to deprecation warning for when to fully remove
2022-04-26 09:30:15 +01:00
Grzegorz Rypeść
dfb9689701
[RLlib] Issue 21489: Unity3D env lacks group rewards ( #24016 ).
2022-04-21 18:49:52 +02:00
Sven Mika
92781c603e
[RLlib] A2C training_iteration
method implementation (_disable_execution_plan_api=True
) ( #23735 )
2022-04-15 18:36:13 +02:00
kourosh hakhamaneshi
c38a29573f
[RLlib] Removed deprecated code with error=True ( #23916 )
2022-04-15 13:51:12 +02:00
Sven Mika
a8494742a3
[RLlib] Memory leak finding toolset using tracemalloc + CI memory leak tests. ( #15412 )
2022-04-12 07:50:09 +02:00
Jun Gong
500cf7dcef
[RLlib] Run test_policy_client_server_setup.sh tests on different ports. ( #23787 )
2022-04-11 22:07:07 +02:00
Steven Morad
00922817b6
[RLlib] Rewrite PPO to use training_iteration + enable DD-PPO for Win32. ( #23673 )
2022-04-11 08:39:10 +02:00
Eric Liang
1ff874e8e8
[spelling] Add linter rule for mis-capitalizations of RLLib -> RLlib ( #23817 )
2022-04-10 16:12:53 -07:00
Sven Mika
0b3a79ca41
[RLlib] Issue 23639: Error in client/server setup when using LSTMs ( #23740 )
2022-04-07 10:16:22 +02:00
Sven Mika
434265edd0
[RLlib] Examples folder: All training_iteration
translations. ( #23712 )
2022-04-05 16:33:50 +02:00
mesjou
e725472b5b
[RLlib] Fix bug in prisoners dillemma example. ( #23690 )
2022-04-05 08:36:20 +02:00
simonsays1980
e4c6e9c3d3
[RLlib] Changed the if-block in the example callback to become more readable. ( #22900 )
2022-03-31 09:13:04 +02:00