Commit graph

1117 commits

Author SHA1 Message Date
Avnish Narayan
6e68b6bef9
[RLlib] DD-PPO training iteration fn. (#24118)
We had unreported merge conflicts with DDPPO. This PR closes and combines #24092, #24035, #24030 and #23096

Co-authored-by: sven1977 <svenmika1977@gmail.com>
2022-04-22 15:22:14 -07:00
xwjiang2010
d7da0d706e
[rllib] Only conditionally import JaxCategorical in catalog.py (#24086)
* Experiment with less imports in catalog.py

* lint
2022-04-22 14:51:35 -07:00
Avnish Narayan
3bf907bcf8
[RLlib] Don't modify environments via the env checker utilities. (#24083) 2022-04-22 18:39:47 +02:00
Kai Fricke
9f7170e444
Revert "Revert revert #23906 [RLlib] DD-PPO training iteration function implementation. (#24035)" (#24103)
This reverts commit a337fd994e.
2022-04-22 09:58:58 +01:00
jon-chuang
e6a458a31e
[CI] Create zip of ray session_latest/logs dir on test failure and upload to buildkite via /artifact-mount (#23783)
Creates a zip of session_latest dir with test name and timestamp upon python test failure. Writes to dir specified by env var `RAY_TEST_FAILURE_LOGS_DIR`. Noop if env var does not exist.

Downstream consumer (e.g. CI) can upload all created artifacts in this dir. Thereby, PR submitters can more easily debug their CI failures, especially if they can't repro locally.

Limitations:
- a conftest.py file importing the main ray conftest.py needs to be present in same dir as test. This presents a challenge for e.g. dashboard tests which are highly scattered
2022-04-22 09:48:53 +01:00
Grzegorz Rypeść
dfb9689701
[RLlib] Issue 21489: Unity3D env lacks group rewards (#24016). 2022-04-21 18:49:52 +02:00
Avnish Narayan
a337fd994e
Revert revert #23906 [RLlib] DD-PPO training iteration function implementation. (#24035) 2022-04-21 17:37:49 +02:00
Sven Mika
14dd7aac13
[RLlib] Issue 22943: PettingZoo parallel should not use env checking (for now). (#24025) 2022-04-21 11:20:54 +02:00
Avnish Narayan
477b9d22d2
[RLlib][Training iteration fn] APEX conversion (#22937) 2022-04-20 17:56:18 +02:00
Avnish Narayan
0ddbce6518
Revert "[RLlib] DD-PPO training iteration fn (#23906)" (#24030)
The DDPPO LR scheduler test is broken because the learner_info_dictionary that is returned by the training iteration function does not consistently return a learner info for every training iteration, but the test expects that it does.

We'll need to fix the test then re-merge

Reverts #23906
2022-04-19 16:43:57 -07:00
Avnish Narayan
55f6896142
[RLlib] Issue 24014: Change occurrences of randint to integers in RLlib (#24019) 2022-04-19 22:15:14 +02:00
Sven Mika
9de391b70e
[RLlib] Issue 23897: add_time_dimension() causes returned shape to be completely unknown. (#24006) 2022-04-19 17:56:56 +02:00
Sven Mika
de9e143938
[RLlib] Issue 23907: SampleBatch.shuffle does not flush intercepted_values dict (which it should). (#24005) 2022-04-19 17:55:59 +02:00
Sven Mika
eb54236d13
[RLlib] DD-PPO training iteration fn (#23906) 2022-04-19 17:55:26 +02:00
Jun Gong
d3c69ebdb6
[RLlib] Make sure unsquash_action moves user action to proper range (#23941) 2022-04-18 18:55:57 +02:00
Artur Niederfahrenhorst
e57ce7efd6
[RLlib] Replay Buffer API and Training Iteration Fn for DQN. (#23420) 2022-04-18 12:20:12 +02:00
Sven Mika
92781c603e
[RLlib] A2C training_iteration method implementation (_disable_execution_plan_api=True) (#23735) 2022-04-15 18:36:13 +02:00
kourosh hakhamaneshi
c38a29573f
[RLlib] Removed deprecated code with error=True (#23916) 2022-04-15 13:51:12 +02:00
Kai Fricke
65d9a410f7
[ci] Clean up ci/ directory (refactor ci/travis) (#23866)
Clean up the ci/ directory. This means getting rid of the travis/ path completely and moving the files into sensible subdirectories.

Details:

- Moves everything under ci/travis into subdirectories, e.g. ci/build, ci/lint, etc.
- Minor adjustments to some scripts (variable renames)
- Removes the outdated (unused) asan tests
2022-04-13 18:11:30 +01:00
Kinal Mehta
758e758c32
[rllib] Fix incorrect sequence length for rnn (#23830)
Update the torch policy to find the seq_lens using state_batches instead of input_dict. This helps handle the complex inputs to the model when the inbuilt preprocessing API is disabled.
2022-04-12 21:07:18 +01:00
Sven Mika
a8494742a3
[RLlib] Memory leak finding toolset using tracemalloc + CI memory leak tests. (#15412) 2022-04-12 07:50:09 +02:00
Jun Gong
500cf7dcef
[RLlib] Run test_policy_client_server_setup.sh tests on different ports. (#23787) 2022-04-11 22:07:07 +02:00
Jun Gong
c61910487f
[RLlib] Fix typo in docstring of PGTorchPolicy (#23818) 2022-04-11 19:31:45 +02:00
Sven Mika
a3d4fc74a6
[RLlib] MARWIL: Move to training_iteration API. (#23798) 2022-04-11 19:28:32 +02:00
Steven Morad
00922817b6
[RLlib] Rewrite PPO to use training_iteration + enable DD-PPO for Win32. (#23673) 2022-04-11 08:39:10 +02:00
Eric Liang
1ff874e8e8
[spelling] Add linter rule for mis-capitalizations of RLLib -> RLlib (#23817) 2022-04-10 16:12:53 -07:00
Kai Fricke
8c2e471265
[AIR] Add RLTrainer interface, implementation, and examples (#23465)
This PR adds a RLTrainer to Ray AIR. It works for both offline and online use cases. In offline training, it will leverage the datasets key of the Trainer API to specify a dataset reader input, used e.g. in Behavioral Cloning (BC). In online training, it is a wrapper around the rllib trainables making use of the parameter layering enabled by the Trainer API.
2022-04-08 17:16:42 -07:00
Sven Mika
c82f6c62c8
[RLlib] Make RolloutWorkers (optionally) recoverable after failure. (#23739) 2022-04-08 15:33:28 +02:00
Sven Mika
4d285a00a4
[RLlib] Issue 23689: tf Initializer has hard-coded float32 dtype. (#23741) 2022-04-07 21:35:02 +02:00
Artur Niederfahrenhorst
02a50f02b7
[RLlib] RepayBuffer: _hit_counts working again. (#23586) 2022-04-07 10:56:25 +02:00
Sven Mika
0b3a79ca41
[RLlib] Issue 23639: Error in client/server setup when using LSTMs (#23740) 2022-04-07 10:16:22 +02:00
Sven Mika
e391b624f0
[RLlib] Re-enable (for CI-testing) our two self_play example scripts. (#23742) 2022-04-07 08:20:48 +02:00
Sven Mika
434265edd0
[RLlib] Examples folder: All training_iteration translations. (#23712) 2022-04-05 16:33:50 +02:00
Steven Morad
39841b65b3
[RLlib] PPOTorchPolicy: Remove extra call to model.value_function (#23671) 2022-04-05 08:40:29 +02:00
mesjou
e725472b5b
[RLlib] Fix bug in prisoners dillemma example. (#23690) 2022-04-05 08:36:20 +02:00
Jiajun Yao
5f37231842
Remove yapf dependency (#23656)
Yapf has been replaced by black.
2022-04-04 21:50:04 -07:00
Sven Mika
0bb82f29b6
[RLlib] AlphaStar polishing (fix logger.info bug). (#22281) 2022-04-01 09:49:41 +02:00
Sven Mika
2eaa54bd76
[RLlib] POC: Config objects instead of dicts (PPO only). (#23491) 2022-03-31 18:26:12 +02:00
simonsays1980
9ca9c67bc9
[RLlib] Added dtype safeguards to the 'required_model_output_shape()' methods… (#23490) 2022-03-31 13:52:00 +02:00
simonsays1980
e4c6e9c3d3
[RLlib] Changed the if-block in the example callback to become more readable. (#22900) 2022-03-31 09:13:04 +02:00
simonsays1980
d2a3948845
[RLlib] Removed the sampler() function in the ParallelRollouts() as it is no needed. (#22320) 2022-03-31 09:06:30 +02:00
Artur Niederfahrenhorst
9a64bd4e9b
[RLlib] Simple-Q uses training iteration fn (instead of execution_plan); ReplayBuffer API for Simple-Q (#22842) 2022-03-29 14:44:40 +02:00
Jun Gong
a7e5aa8c6a
[RLlib] Delete some unused confusing logics. (#23513) 2022-03-29 13:45:13 +02:00
Artur Niederfahrenhorst
32ad6c6ef1
[RLlib] Replay Buffer capacity check (#23523) 2022-03-29 12:06:27 +02:00
Kai Fricke
262d6121bb
[rllib] Fix error messages and example for dataset writer (#23419)
Currently the error message and example refer to a field type that is actually format.
2022-03-28 19:53:12 +01:00
Sven Mika
7cb86acce2
[RLlib] trainer_template.py: hard deprecation (error when used). (#23488) 2022-03-25 18:25:51 +01:00
Max Pumperla
60054995e6
[docs] fix doctests and activate CI (#23418) 2022-03-24 17:04:02 -07:00
Sven Mika
22c9c4aa39
[RLlib] Slate-Q +GPU torch bug fix. (#23464) 2022-03-24 17:39:33 +01:00
Avnish Narayan
5134e0dc12
[RLlib] Change type to tensortype for cql policies. (#23438) 2022-03-24 12:32:29 +01:00
Fabian Witter
2547055f38
[RLlib] Add support for complex observations in CQL (#23332) 2022-03-22 17:04:07 +01:00