1142 commits

Author SHA1 Message Date
Kai Fricke
242706922b
[rllib] Fix linting (#24335)
#24262 broke linting; this commit fixes it.
2022-04-29 15:21:11 +01:00
Jun Gong
ec636dcb29
[RLlib] Do not print warning message during env pre-checking, if there is nothing wrong with user envs. (#24289) 2022-04-29 10:41:19 +02:00
Xuehai Pan
377a522ce2
[RLlib] Fix time dimension shaping for PyTorch RNN models. (#21735) 2022-04-29 10:39:03 +02:00
Pavel C
de0c6f6132
[RLlib] Fix policy_map always loading all policies from disk due to (not always needed) global_vars update. (#22010) 2022-04-29 10:38:05 +02:00
Ishant Mrinal
0248c60387
[RLlib] Add additional return values to action_sampler_fn. (#22721) 2022-04-29 10:34:48 +02:00
Xuehai Pan
3c3dd5051f
[RLlib] Fix type hints for original_batches in callbacks. (#24214) 2022-04-29 10:33:53 +02:00
Xuehai Pan
9c76e21a5e
[RLlib] Ensure MultiCallbacks always implements all callback methods (#24254) 2022-04-29 10:30:24 +02:00
simonsays1980
ff575eeafc
[RLlib] Make actions sent by RLlib to the env immutable. (#24262) 2022-04-29 10:27:06 +02:00
HJasperson
5f12c62226
[RLlib] Fix "tf variable is unhashable" Error. (#24273) 2022-04-29 10:07:02 +02:00
Sven Mika
ba14f0a41b
[RLlib] PGTrainer config object class (PGConfig). (#24295) 2022-04-28 22:25:16 +02:00
Sven Mika
6551922c21
[RLlib] Fix AlphaStar for tf2+tracing; smaller cleanups around avoiding to wrap a TFPolicy as_eager() or with_tracing more than once. (#24271) 2022-04-28 13:43:21 +02:00
Sven Mika
c95dd79953
[RLlib] APPO eager fix (APPOTFPolicy gets wrapped as_eager() twice by mistake). (#24268) 2022-04-27 21:27:34 +02:00
Sven Mika
627b9f2e88
[RLlib] QMIX training iteration function and new replay buffer API. (#24164) 2022-04-27 14:24:20 +02:00
Sven Mika
29388fb25b
[RLlib] Reinstate flaky AlphaStar learning CI test (flaky due to 2 changed, bad config default values). (#24256) 2022-04-27 14:01:52 +02:00
Noon van der Silk
38a028de2d
[RLlib] Don't add elements to _agent_ids during env pre-checking. (#24136) 2022-04-26 15:55:15 +02:00
Sven Mika
bb4e5cb70a
[RLlib] CQL: training iteration function. (#24166) 2022-04-26 14:28:39 +02:00
Artur Niederfahrenhorst
f7be409462
[RLlib] Training Iteration Function for SAC (#24157) 2022-04-26 12:37:54 +02:00
Kai Fricke
c0ec20dc3a
[tune] Next deprecation cycle (#24076)
Rolling out next deprecation cycle:

- DeprecationWarnings that were `warnings.warn` or `logger.warn` before are now raised errors
- Raised Deprecation warnings are now removed
- Notably, this involves deprecating the TrialCheckpoint functionality and associated cloud tests
- Added annotations to deprecation warning for when to fully remove
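
The two-stage cycle described in the bullets above (warn first, raise later, then remove) can be sketched roughly as follows. This is a minimal illustration only: `deprecation_notice` and its `error` flag are hypothetical names chosen for this example, and raising `DeprecationWarning` as the error type is an assumption, not necessarily what the commit does.

```python
import warnings


def deprecation_notice(old: str, new: str, error: bool = False) -> None:
    """Hypothetical helper: warn in the first cycle, raise in the next one."""
    msg = f"`{old}` is deprecated; use `{new}` instead."
    if error:
        # Second stage of the cycle: the former warning becomes a hard error.
        raise DeprecationWarning(msg)
    # First stage: emit a warning that points at the caller's line.
    warnings.warn(msg, DeprecationWarning, stacklevel=2)
```

Under this scheme, moving a deprecation to the next cycle means flipping `error` to `True` at the call site; a cycle later, the call and the deprecated code path are deleted altogether.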
2022-04-26 09:30:15 +01:00
Xuehai Pan
6087eda91b
[RLlib] Issue 21991: Fix SampleBatch slicing for SampleBatch.INFOS in RNN cases (#22050) 2022-04-25 11:40:24 +02:00
Noon van der Silk
3589c21924
[RLlib] Fix some missing f-strings and an f-string related bug in tf eager policy. (#24148) 2022-04-25 11:25:28 +02:00
Fabian Witter
56bc90ca72
[RLlib] Remove Unnecessary List Conversion of Complex Observations in SAC Models (torch and tf). (#24106) 2022-04-25 11:21:34 +02:00
Jeroen Bédorf
1263015931
[RLlib] Add support for writing env 'info' dicts to output datasets for TFPolicies (for TorchPolicies, these are part of the view-requirements by default and thus written either way). (#24041) 2022-04-25 11:17:50 +02:00
Artur Niederfahrenhorst
306853b5b8
[RLlib] Issue 22693: RNN-SAC fixes. (#23814) 2022-04-25 09:19:24 +02:00
Ben Kasper
531fdd50d4
[RLlib] Add 2 missing callbacks to MultiCallbacks class (on_trainer_init and on_sub_environment_created) (#24153) 2022-04-25 09:18:03 +02:00
Kai Fricke
d161831f0e
[RLlib; testing] Deactivate flaky alpha star learning test (#24138) 2022-04-23 17:45:58 +02:00
Avnish Narayan
6e68b6bef9
[RLlib] DD-PPO training iteration fn. (#24118)
We had unreported merge conflicts with DDPPO. This PR closes and combines #24092, #24035, #24030 and #23096

Co-authored-by: sven1977 <svenmika1977@gmail.com>
2022-04-22 15:22:14 -07:00
xwjiang2010
d7da0d706e
[rllib] Only conditionally import JaxCategorical in catalog.py (#24086)
* Experiment with less imports in catalog.py

* lint
2022-04-22 14:51:35 -07:00
Avnish Narayan
3bf907bcf8
[RLlib] Don't modify environments via the env checker utilities. (#24083) 2022-04-22 18:39:47 +02:00
Kai Fricke
9f7170e444
Revert "Revert revert #23906 [RLlib] DD-PPO training iteration function implementation. (#24035)" (#24103)
This reverts commit a337fd994e.
2022-04-22 09:58:58 +01:00
jon-chuang
e6a458a31e
[CI] Create zip of ray session_latest/logs dir on test failure and upload to buildkite via /artifact-mount (#23783)
On Python test failure, creates a zip of the session_latest dir, named with the test name and a timestamp, and writes it to the dir specified by the env var `RAY_TEST_FAILURE_LOGS_DIR`. No-op if the env var is not set.

A downstream consumer (e.g. CI) can upload all artifacts created in this dir. PR submitters can thereby debug their CI failures more easily, especially if they can't repro locally.

Limitations:
- A conftest.py file importing the main Ray conftest.py needs to be present in the same dir as the test. This is a challenge for, e.g., the dashboard tests, which are highly scattered.
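
The behavior described in this commit message can be sketched roughly as below. This is a minimal illustration, not the code from the PR: the function name `archive_logs_on_failure` and its parameters are hypothetical, and only the env var name `RAY_TEST_FAILURE_LOGS_DIR` comes from the message itself.

```python
import os
import shutil
import time
from pathlib import Path
from typing import Optional


def archive_logs_on_failure(logs_dir: str, test_name: str) -> Optional[str]:
    """Zip `logs_dir` into $RAY_TEST_FAILURE_LOGS_DIR (hypothetical helper).

    Returns the path of the created archive, or None if nothing was written.
    """
    out_dir = os.environ.get("RAY_TEST_FAILURE_LOGS_DIR")
    if not out_dir or not os.path.isdir(logs_dir):
        # No-op when the env var is unset or there are no logs to collect.
        return None
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # Name the archive after the test and the current time.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base = os.path.join(out_dir, f"{test_name}_{stamp}")
    # make_archive appends ".zip" and returns the full archive path.
    return shutil.make_archive(base, "zip", root_dir=logs_dir)
```

In a pytest setup, a helper like this would presumably be called from a failure hook in conftest.py, which is consistent with the limitation noted above that every test directory needs a conftest.py importing the main one.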
2022-04-22 09:48:53 +01:00
Grzegorz Rypeść
dfb9689701
[RLlib] Issue 21489: Unity3D env lacks group rewards (#24016). 2022-04-21 18:49:52 +02:00
Avnish Narayan
a337fd994e
Revert revert #23906 [RLlib] DD-PPO training iteration function implementation. (#24035) 2022-04-21 17:37:49 +02:00
Sven Mika
14dd7aac13
[RLlib] Issue 22943: PettingZoo parallel should not use env checking (for now). (#24025) 2022-04-21 11:20:54 +02:00
Avnish Narayan
477b9d22d2
[RLlib][Training iteration fn] APEX conversion (#22937) 2022-04-20 17:56:18 +02:00
Avnish Narayan
0ddbce6518
Revert "[RLlib] DD-PPO training iteration fn (#23906)" (#24030)
The DDPPO LR scheduler test is broken: the learner_info_dictionary returned by the training iteration function does not contain a learner info for every training iteration, but the test expects one.

We'll need to fix the test, then re-merge.

Reverts #23906
2022-04-19 16:43:57 -07:00
Avnish Narayan
55f6896142
[RLlib] Issue 24014: Change occurrences of randint to integers in RLlib (#24019) 2022-04-19 22:15:14 +02:00
Sven Mika
9de391b70e
[RLlib] Issue 23897: add_time_dimension() causes returned shape to be completely unknown. (#24006) 2022-04-19 17:56:56 +02:00
Sven Mika
de9e143938
[RLlib] Issue 23907: SampleBatch.shuffle does not flush intercepted_values dict (which it should). (#24005) 2022-04-19 17:55:59 +02:00
Sven Mika
eb54236d13
[RLlib] DD-PPO training iteration fn (#23906) 2022-04-19 17:55:26 +02:00
Jun Gong
d3c69ebdb6
[RLlib] Make sure unsquash_action moves user action to proper range (#23941) 2022-04-18 18:55:57 +02:00
Artur Niederfahrenhorst
e57ce7efd6
[RLlib] Replay Buffer API and Training Iteration Fn for DQN. (#23420) 2022-04-18 12:20:12 +02:00
Sven Mika
92781c603e
[RLlib] A2C training_iteration method implementation (_disable_execution_plan_api=True) (#23735) 2022-04-15 18:36:13 +02:00
kourosh hakhamaneshi
c38a29573f
[RLlib] Removed deprecated code with error=True (#23916) 2022-04-15 13:51:12 +02:00
Kai Fricke
65d9a410f7
[ci] Clean up ci/ directory (refactor ci/travis) (#23866)
Clean up the ci/ directory. This means getting rid of the travis/ path completely and moving the files into sensible subdirectories.

Details:

- Moves everything under ci/travis into subdirectories, e.g. ci/build, ci/lint, etc.
- Minor adjustments to some scripts (variable renames)
- Removes the outdated (unused) asan tests
2022-04-13 18:11:30 +01:00
Kinal Mehta
758e758c32
[rllib] Fix incorrect sequence length for rnn (#23830)
Update the torch policy to find the seq_lens using state_batches instead of input_dict. This helps handle the complex inputs to the model when the inbuilt preprocessing API is disabled.
2022-04-12 21:07:18 +01:00
Sven Mika
a8494742a3
[RLlib] Memory leak finding toolset using tracemalloc + CI memory leak tests. (#15412) 2022-04-12 07:50:09 +02:00
Jun Gong
500cf7dcef
[RLlib] Run test_policy_client_server_setup.sh tests on different ports. (#23787) 2022-04-11 22:07:07 +02:00
Jun Gong
c61910487f
[RLlib] Fix typo in docstring of PGTorchPolicy (#23818) 2022-04-11 19:31:45 +02:00
Sven Mika
a3d4fc74a6
[RLlib] MARWIL: Move to training_iteration API. (#23798) 2022-04-11 19:28:32 +02:00
Steven Morad
00922817b6
[RLlib] Rewrite PPO to use training_iteration + enable DD-PPO for Win32. (#23673) 2022-04-11 08:39:10 +02:00