Commit graph

1268 commits

Author SHA1 Message Date
Artur Niederfahrenhorst
f34cd2fd8f
[RLlib] Take replay buffer api example out of GPU examples. (#25841) 2022-06-16 19:12:38 +02:00
Yi Cheng
7b8b0f8e03
Revert "[RLlib] Remove execution plan code no longer used by RLlib. (#25624)" (#25776)
This reverts commit 804719876b.
2022-06-14 13:59:15 -07:00
Jun Gong
c026374acb
[RLlib] Fix the 2 failing RLlib release tests. (#25603) 2022-06-14 14:51:08 +02:00
Kai Fricke
6313ddc47c
[tune] Refactor Syncer / deprecate Sync client (#25655)
This PR includes / depends on #25709

The two concepts of Syncer and SyncClient are confusing, as is the current API for passing custom sync functions.

This PR refactors Tune's syncing behavior. The Sync client concept is hard deprecated. Instead, we offer a well-defined Syncer API that can be extended to provide custom syncing functionality. However, the default will be to use Ray AIR's file transfer utilities.

New API:
- Users can pass `syncer=CustomSyncer` which implements the `Syncer` API
- Otherwise our off-the-shelf syncing is used
- As before, syncing to cloud disables syncing to driver

Changes:
- Sync client is removed
- Syncer interface introduced
- _DefaultSyncer is a wrapper around the URI upload/download API from Ray AIR
- SyncerCallback only uses remote tasks to synchronize data
- Rsync syncing is fully deprecated and removed
- Docker and kubernetes-specific syncing is fully deprecated and removed
- Testing is improved to use `file://` URIs instead of mock sync clients
2022-06-14 14:46:30 +02:00
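The `syncer=CustomSyncer` usage described in this PR body can be sketched roughly as follows. This is a minimal, hypothetical illustration: the `Syncer` base class shape, the `sync_up`/`sync_down` method names and signatures, and `CustomSyncer` are assumptions for illustration, not Ray Tune's actual interface.

```python
from abc import ABC, abstractmethod


class Syncer(ABC):
    """Hypothetical sketch of a Syncer-style interface (names assumed)."""

    @abstractmethod
    def sync_up(self, local_dir: str, remote_dir: str) -> bool:
        """Upload the contents of local_dir to remote_dir."""

    @abstractmethod
    def sync_down(self, remote_dir: str, local_dir: str) -> bool:
        """Download the contents of remote_dir into local_dir."""


class CustomSyncer(Syncer):
    """Example user-defined syncer that just records sync calls."""

    def __init__(self) -> None:
        self.history = []

    def sync_up(self, local_dir: str, remote_dir: str) -> bool:
        self.history.append(("up", local_dir, remote_dir))
        return True

    def sync_down(self, remote_dir: str, local_dir: str) -> bool:
        self.history.append(("down", remote_dir, local_dir))
        return True


# A user would then pass syncer=CustomSyncer() wherever Tune accepts a syncer;
# otherwise the off-the-shelf syncing described above is used.
syncer = CustomSyncer()
syncer.sync_up("/tmp/trial_results", "s3://bucket/experiment")
```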
kourosh hakhamaneshi
f597e21ac8
[RLlib] Fix sample batch concat samples. (#25572) 2022-06-14 12:47:29 +02:00
kourosh hakhamaneshi
25940cb95b
[RLlib] CRR documentation. (#25667) 2022-06-14 12:45:36 +02:00
Avnish Narayan
804719876b
[RLlib] Remove execution plan code no longer used by RLlib. (#25624) 2022-06-14 10:57:27 +02:00
Kai Fricke
736c7b13c4
[CI] Fix team to rllib (from ml) for some replay buffer API tests. (#25702) 2022-06-11 18:05:16 +02:00
Sven Mika
130b7eeaba
[RLlib] Trainer to Algorithm renaming. (#25539) 2022-06-11 15:10:39 +02:00
Sven Mika
7c39aa5fac
[RLlib] Trainer.training_iteration -> Trainer.training_step; Iterations vs reportings: Clarification of terms. (#25076) 2022-06-10 17:09:18 +02:00
Artur Niederfahrenhorst
94d6c212df
[RLlib] Replay Buffer API documentation. (#24683) 2022-06-10 16:47:51 +02:00
Artur Niederfahrenhorst
c3645928ca
[RLlib] Fix no gradient clipping happening in QMix. (#25656) 2022-06-10 13:51:26 +02:00
Avnish Narayan
730df43656
[RLlib] Issue 25503: Replace torch.range with torch.arange. (#25640) 2022-06-10 13:21:54 +02:00
kourosh hakhamaneshi
b3a351925d
[RLlib] Added meaningful error for multi-agent failure of SampleCollector in case no agent steps in episode. (#25596) 2022-06-10 12:30:43 +02:00
Artur Niederfahrenhorst
8af9ef8fee
[RLlib] Discussion 6432: Automatic train_batch_size calculation fix. (#25621) 2022-06-10 12:15:57 +02:00
Artur Niederfahrenhorst
7495e9c89c
[RLlib] Dreamer Policy sub-classing schema. (#25585) 2022-06-09 17:14:15 +02:00
Kai Fricke
aa142eb377
[RLlib; CI] Add team:rllib tag for Bazel. (#25589)
Currently, team:ml spans all ML (Tune, Train, AIR) tests as well as RLlib tests. RLlib tests are much flakier, and it would be good to separate them in the flaky test tracker. This PR changes RLlib tests from team:ml to team:rllib to enable this separation.

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2022-06-08 22:25:59 +01:00
Artur Niederfahrenhorst
9226643433
[RLlib] Issue 4965: Fixes PyTorch grad clipping logic and adds grad clipping to QMIX. (#25584) 2022-06-08 19:40:57 +02:00
Sven Mika
388fb98c79
[RLlib] CRR Tests fixes. (#25586) 2022-06-08 19:18:55 +02:00
Kai Fricke
8affbc7be6
[tune/train] Consolidate checkpoint manager 3: Ray Tune (#24430)
**Update**: This PR is now part 3 of a three PR group to consolidate the checkpoints.

1. Part 1 adds the common checkpoint management class #24771 
2. Part 2 adds the integration for Ray Train #24772
3. This PR builds on #24772 and includes all changes. It moves the Ray Tune integration to use the new common checkpoint manager class.

Old PR description:

This PR consolidates the Ray Train and Tune checkpoint managers. These concepts previously did something very similar but in different modules. To simplify maintenance in the future, we've consolidated the common core.

- This PR keeps full compatibility with the previous interfaces and implementations. This means that for now, Train and Tune will have separate CheckpointManagers that both extend the common core
- This PR prepares Tune to move to a CheckpointStrategy object
- In follow-up PRs, we can further unify interfacing with the common core, possibly removing any train- or tune-specific adjustments (e.g. moving to setup on init rather than at runtime for Ray Train)

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2022-06-08 12:05:34 +01:00
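The common checkpoint manager core this PR series consolidates can be sketched as a small keep-N-best structure that framework-specific subclasses extend. This is a hypothetical sketch only: the class name `CommonCheckpointManager`, the `keep_checkpoints_num` parameter, and the method names are assumptions, not Ray's actual implementation.

```python
import heapq


class CommonCheckpointManager:
    """Hypothetical common core: keep the N best checkpoints by score."""

    def __init__(self, keep_checkpoints_num: int) -> None:
        self.keep_checkpoints_num = keep_checkpoints_num
        # Min-heap of (score, path): the lowest-scoring entry is evicted first.
        self._heap = []

    def register_checkpoint(self, path: str, score: float) -> None:
        heapq.heappush(self._heap, (score, path))
        if len(self._heap) > self.keep_checkpoints_num:
            _, evicted = heapq.heappop(self._heap)
            self._delete(evicted)

    def best_checkpoints(self):
        """Return kept (score, path) pairs, best first."""
        return sorted(self._heap, reverse=True)

    def _delete(self, path: str) -> None:
        # Train/Tune subclasses would remove the checkpoint files their own way.
        pass


mgr = CommonCheckpointManager(keep_checkpoints_num=2)
for i, score in enumerate([0.1, 0.5, 0.3]):
    mgr.register_checkpoint(f"/ckpt/{i}", score)
```

In this design, the eviction policy lives in the shared core while only file deletion differs per framework, which matches the PR's goal of extending a common core from both Train and Tune.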
kourosh hakhamaneshi
4cdd508f70
[RLlib] Added CRR implementation. (#25499) 2022-06-08 11:42:02 +02:00
Jun Gong
9b65d5535d
[RLlib] Introduce basic connectors library. (#25311) 2022-06-07 19:18:14 +02:00
Rohan Potdar
a9d8da0100
[RLlib]: Doubly Robust Off-Policy Evaluation. (#25056) 2022-06-07 12:52:19 +02:00
Artur Niederfahrenhorst
429d0f0eee
[RLlib] Fix multi agent environment checks for observations that contain only some agents' obs each step. (#25506) 2022-06-07 10:33:35 +02:00
Artur Niederfahrenhorst
35bd397181
[RLlib] Better default values for training_intensity and target_network_update_freq for R2D2. (#25510) 2022-06-07 10:29:56 +02:00
Vince Jankovics
68444cd390
[tune] Custom resources per worker added to default_resource_request (#24463)
This resolves the `TODO(ekl): add custom resources here once tune supports them` item. 
Also, related to the discussion [here](https://discuss.ray.io/t/reserve-workers-on-gpu-node-for-trainer-workers-only/5972/5).

Co-authored-by: Kai Fricke <kai@anyscale.com>
2022-06-06 22:41:02 +01:00
Artur Niederfahrenhorst
5133978adc
[RLlib] PG policy subclassing conversion. (#25288) 2022-06-06 13:07:47 +02:00
Artur Niederfahrenhorst
243038d00a
[RLlib] Issue 25401: Faulty usage of get_filter_config in ComplexInputNetworks (#25493) 2022-06-06 13:04:17 +02:00
kourosh hakhamaneshi
d49d0efbaf
[RLlib] Bug fix: when on GPU, sample_batch.to_device() only converts the device and does not convert float64 to float32. (#25460) 2022-06-06 12:43:11 +02:00
Artur Niederfahrenhorst
c4a0e9d0f2
[RLlib] Disambiguate timestep fragment storage unit in replay buffers. (#25242) 2022-06-06 11:35:49 +02:00
Jun Gong
644b80c0ef
[RLlib] mark learning and examples tests exclusive. (#25445) 2022-06-04 09:35:24 -07:00
Sven Mika
b5bc2b93c3
[RLlib] Move all remaining algos into algorithms directory. (#25366) 2022-06-04 07:35:24 +02:00
Sven Mika
6c7f781d8e
[RLlib] Unflake some CI-tests. (#25313) 2022-06-03 14:51:50 +02:00
Jun Gong
1d24d6af98
[RLlib] Fix MARWIL tf policy. (#25384) 2022-06-03 10:50:36 +02:00
Yi Cheng
fd0f967d2e
Revert "[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms dir and rename policy and trainer classes. (#25346)" (#25420)
This reverts commit e4ceae19ef.

Reverts #25346

linux://python/ray/tests:test_client_library_integration never failed before this PR.

In the CI of the reverted PR, it also fails (https://buildkite.com/ray-project/ray-builders-pr/builds/34079#01812442-c541-4145-af22-2a012655c128), so it is highly likely that this PR is the cause.

The test output failure seems related as well (https://buildkite.com/ray-project/ray-builders-branch/builds/7923#018125c2-4812-4ead-a42f-7fddb344105b)
2022-06-02 20:38:44 -07:00
Sven Mika
e4ceae19ef
[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms dir and rename policy and trainer classes. (#25346) 2022-06-02 16:47:05 +02:00
Steven Morad
f781622f86
[RLlib] Bandits (torch) Policy sub-class. (#25254)
Co-authored-by: Steven Morad <smorad@anyscale.com>
2022-06-02 15:16:51 +02:00
Antoni Baum
045c47f172
[CI] Check test files for if __name__... snippet (#25322)
Bazel operates by simply running the Python scripts given to it in `py_test`. If a script doesn't invoke pytest on itself in the `if __name__ == "__main__"` snippet, no tests will be run, and the script will pass. This has led to several tests (indeed, some are fixed in this PR) that, despite having been written, have never run in CI. This PR adds a lint check that scans all `py_test` sources for the presence of the `if __name__ == "__main__"` snippet, and fails CI if any are detected without it. This system is only enabled for libraries right now (tune, train, air, rllib), but it could be trivially extended to other modules if approved.
2022-06-02 10:30:00 +01:00
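The core of such a lint check can be sketched in a few lines. This is a simplified illustration, not the actual CI script: the function name and the exact pattern matched are assumptions, and real file discovery over Bazel `py_test` sources is omitted.

```python
import re

# Match the entry-point guard a py_test source must contain so that
# running the script actually invokes pytest on itself.
MAIN_GUARD = re.compile(r'if __name__ == ["\']__main__["\']')


def has_pytest_entrypoint(source: str) -> bool:
    """Return True if a py_test source file has the __main__ guard."""
    return bool(MAIN_GUARD.search(source))


# A file with the guard: running it under Bazel executes its tests.
good = (
    "def test_x():\n"
    "    assert True\n"
    "\n"
    'if __name__ == "__main__":\n'
    "    import sys\n"
    "    import pytest\n"
    '    sys.exit(pytest.main(["-v", __file__]))\n'
)

# A file without the guard: Bazel would run it, execute nothing, and pass.
bad = "def test_x():\n    assert True\n"
```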
Artur Niederfahrenhorst
71a8a443ce
[RLlib] Fix Policy global timesteps being off by init sample batch size. (#25349) 2022-06-02 10:19:21 +02:00
kourosh hakhamaneshi
87c9fdd0f8
RLlib: Fix bug: WorkerSet.stop() will raise error if self._local_worker is None (e.g. in evaluation worker sets). (#25332) 2022-06-02 09:41:43 +02:00
Eric Liang
905258dbc1
Clean up docstyle in python modules and add LINT rule (#25272) 2022-06-01 11:27:54 -07:00
Sven Mika
18c03f8d93
[RLlib] A2C + A3C move to algorithms folder and re-name into A2C/A3C (from ...Trainer). (#25314) 2022-06-01 09:29:16 +02:00
Sven Mika
94557e3095
[RLlib] Apex-DDPG TrainerConfig objects. (#25279) 2022-05-30 19:45:38 +02:00
Sven Mika
c5edd82c63
[RLlib] MB-MPO TrainerConfig objects. (#25278) 2022-05-30 17:33:01 +02:00
Sven Mika
f75ede1b81
[RLlib] MA-DDPG TrainerConfig objects. (#25255) 2022-05-30 15:38:24 +02:00
Sven Mika
30f6fc340b
[RLlib] AlphaZero TrainerConfig objects. (#25256) 2022-05-30 15:37:58 +02:00
Sven Mika
d95009a3ac
[RLlib] Vectorized envs: Gracefully handle sub-environments failing by restarting them (if configured so). (#24967) 2022-05-28 10:50:03 +02:00
Sven Mika
ab6c3027e5
[RLlib] A2/3C policy sub-classing schema. (#25078) 2022-05-28 09:54:47 +02:00
Sven Mika
163fa81976
[RLlib] Discussion 6060 and 5120: auto-infer different agents' spaces in multi-agent env. (#24649) 2022-05-27 14:56:24 +02:00
Rohan Potdar
ab81c8e9ca
[RLlib]: Rename input_evaluation to off_policy_estimation_methods. (#25107) 2022-05-27 13:14:54 +02:00