Commit graph

1164 commits

Author SHA1 Message Date
Sven Mika
f54557073e
[RLlib] Remove execution_plan API code no longer needed. (#24501) 2022-05-06 12:29:53 +02:00
Sven Mika
f891a2b6f1
[RLlib] SlateQ + tf; release test fixes, related to TD-error not properly being formatted. (#24521) 2022-05-06 08:50:30 +02:00
Avnish Narayan
f2bb6f6806
[RLlib] Impala training iteration fn (#23454) 2022-05-05 16:11:08 +02:00
Christy Bergman
76eb47e226
[RLlib; docs] Rename UCB -> LinUCB. (#24348) 2022-05-05 10:20:16 +02:00
Artur Niederfahrenhorst
86bc9ecce2
[RLlib] DDPG Training iteration fn & Replay Buffer API (#24212) 2022-05-05 09:41:38 +02:00
Sven Mika
5b61a00792
[RLlib] Feed all values in COMMON_CONFIG directly from TrainerConfig() (removes duplicate values and comments). (#24433) 2022-05-04 16:28:12 +02:00
Sven Mika
b48f63113b
[RLlib] SlateQ fixes: Wrong YAML structure in release learning tests + TD-error torch issue (#24429) 2022-05-04 13:37:14 +02:00
Sven Mika
1bc6419e0e
[RLlib] R2D2 training iteration fn AND switch off execution_plan API by default. (#24165) 2022-05-03 07:59:26 +02:00
Sven Mika
7cca7782f1
[RLlib] OPE (off policy estimator) API. (#24384) 2022-05-02 21:15:50 +02:00
Sven Mika
0c5ac3b9e8
[RLlib] Issue 24075: Better error message for Bandit MultiDiscrete (suggest using our wrapper). (#24385) 2022-05-02 21:14:08 +02:00
Sven Mika
296e2ebc46
[RLlib] Issue 24082: WorkerSet.policies_to_train (deprecated) - if still used - returns wrong values. (#24386) 2022-05-02 18:33:52 +02:00
Sven Mika
924adcf402
[RLlib] Issue 24074: multi-GPU learner thread key error in MA-scenarios. (#24382) 2022-05-02 18:30:46 +02:00
Sven Mika
f53ca1cacb
[RLlib] ES + ARS TrainerConfig objects. (#24374) 2022-05-02 16:55:28 +02:00
Edward Oakes
11954e6798
Issue 24143: Fix a few f-strings missing the f. (#24232) 2022-05-02 16:11:33 +02:00
Sven Mika
026849cd27
[RLlib] APPO TrainerConfig objects. (#24376) 2022-05-02 15:06:23 +02:00
Sven Mika
f066180ed5
[RLlib] Deprecate timesteps_per_iteration config key (in favor of min_[sample|train]_timesteps_per_reporting). (#24372) 2022-05-02 12:51:14 +02:00
Sven Mika
950bd3fc3f
[RLlib] IMPALA TrainerConfig objects. (#24375) 2022-05-02 12:05:30 +02:00
Jiajun Yao
cfc192ebc4
Collect library usage (#24312)
Collect which libraries are used for usage stats purpose.
2022-04-30 07:51:01 -07:00
Sven Mika
b2b1c95aa5
[RLlib] A2/3C Config objects (A2CConfig and A3CConfig). (#24332) 2022-04-30 09:51:09 +02:00
Sven Mika
3052193c9e
[RLlib] Fix CQL getting stuck when deprecated timesteps_per_iteration is used (use min_train_timesteps_per_reporting instead). (#24345)

CQL does not sample any timesteps. The deprecated timesteps_per_iteration is automatically translated into the new min_sample_timesteps_per_reporting, but for CQL and other purely offline RL algorithms it should be translated into min_train_timesteps_per_reporting instead.

If timesteps_per_iteration is used, CQL never leaves the first iteration, as it thinks the iteration is not done yet (sampled timesteps always remain at 0).
2022-04-29 21:02:34 +01:00
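The failure mode described in the CQL commit above can be illustrated with a minimal, self-contained sketch. The helpers `translate_deprecated_timesteps` and `iteration_done` and the `is_offline` flag are hypothetical and are not RLlib's actual implementation; only the three config key names come from the commit message.

```python
from typing import Dict


def translate_deprecated_timesteps(config: Dict, is_offline: bool) -> Dict:
    """Hypothetical sketch: map the deprecated key onto its replacement.

    Purely offline algorithms (such as CQL) never sample from an env, so the
    deprecated value must go to the *train* threshold, not the *sample* one.
    """
    config = dict(config)  # do not mutate the caller's dict
    value = config.pop("timesteps_per_iteration", None)
    if value is not None:
        if is_offline:
            config["min_train_timesteps_per_reporting"] = value
        else:
            config["min_sample_timesteps_per_reporting"] = value
    return config


def iteration_done(config: Dict, sampled: int, trained: int) -> bool:
    """Hypothetical check: may the current reporting iteration end?"""
    return (
        sampled >= config.get("min_sample_timesteps_per_reporting", 0)
        and trained >= config.get("min_train_timesteps_per_reporting", 0)
    )
```

For an offline algorithm `sampled` stays at 0, so with the sample-based translation `iteration_done(...)` can never return `True`, which matches the "stuck in the first iteration" symptom; with the train-based translation the iteration ends once enough timesteps have been trained on.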
Kai Fricke
7a4d58d80f
[rllib] Fix doctest failure (#24343)
Lint was still failing (but only caught with doctest):

```
File "../../python/ray/rllib/utils/numpy.py", line ?, in default
Failed example:
    tree.traverse(make_action_immutable, d, top_down=False)
Exception raised:
    Traceback (most recent call last):
      File "/opt/miniconda/lib/python3.6/doctest.py", line 1330, in __run
        compileflags, 1), test.globs)
      File "<doctest default[4]>", line 1, in <module>
        tree.traverse(make_action_immutable, d, top_down=False)
    NameError: name 'make_action_immutable' is not defined
```
2022-04-29 19:13:24 +01:00
Sven Mika
539832f2c5
[RLlib] SlateQ training iteration function. (#24151) 2022-04-29 18:38:17 +02:00
Kai Fricke
242706922b
[rllib] Fix linting (#24335)
#24262 broke linting; this PR fixes it.
2022-04-29 15:21:11 +01:00
Jun Gong
ec636dcb29
[RLlib] Do not print warning message during env pre-checking, if there is nothing wrong with user envs. (#24289) 2022-04-29 10:41:19 +02:00
Xuehai Pan
377a522ce2
[RLlib] Fix time dimension shaping for PyTorch RNN models. (#21735) 2022-04-29 10:39:03 +02:00
Pavel C
de0c6f6132
[RLlib] Fix policy_map always loading all policies from disk due to (not always needed) global_vars update. (#22010) 2022-04-29 10:38:05 +02:00
Ishant Mrinal
0248c60387
[RLlib] Add additional return values to action_sampler_fn. (#22721) 2022-04-29 10:34:48 +02:00
Xuehai Pan
3c3dd5051f
[RLlib] Fix type hints for original_batches in callbacks. (#24214) 2022-04-29 10:33:53 +02:00
Xuehai Pan
9c76e21a5e
[RLlib] Ensure MultiCallbacks always implements all callback methods (#24254) 2022-04-29 10:30:24 +02:00
simonsays1980
ff575eeafc
[RLlib] Make actions sent by RLlib to the env immutable. (#24262) 2022-04-29 10:27:06 +02:00
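The idea behind making actions immutable can be sketched with plain NumPy: mark every array in a (possibly nested) action structure as read-only so that an env cannot modify it in place. This is a minimal sketch, not RLlib's code; `freeze_actions` is a hypothetical name, and a hand-written recursion over dicts, lists, and tuples stands in for the `tree.traverse` call seen in the doctest further up.

```python
from typing import Any

import numpy as np


def freeze_actions(actions: Any) -> Any:
    """Hypothetical sketch: recursively mark all numpy arrays inside a nested
    action structure (dicts, lists, tuples) as read-only, in place."""
    if isinstance(actions, np.ndarray):
        # Any later in-place write to this array raises ValueError.
        actions.setflags(write=False)
    elif isinstance(actions, dict):
        for value in actions.values():
            freeze_actions(value)
    elif isinstance(actions, (list, tuple)):
        for value in actions:
            freeze_actions(value)
    # Non-array leaves (e.g. Python ints) are already immutable; skip them.
    return actions
```

After `freeze_actions({"agent_0": np.array([1.0, 2.0])})`, an assignment such as `actions["agent_0"][0] = 5.0` raises `ValueError` instead of silently altering the action that was also recorded in the sample batch.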
HJasperson
5f12c62226
[RLlib] Fix "tf variable is unhashable" Error. (#24273) 2022-04-29 10:07:02 +02:00
Sven Mika
ba14f0a41b
[RLlib] PGTrainer config object class (PGConfig). (#24295) 2022-04-28 22:25:16 +02:00
Sven Mika
6551922c21
[RLlib] Fix AlphaStar for tf2+tracing; smaller cleanups to avoid wrapping a TFPolicy with as_eager() or with_tracing() more than once. (#24271) 2022-04-28 13:43:21 +02:00
Sven Mika
c95dd79953
[RLlib] APPO eager fix (APPOTFPolicy gets wrapped as_eager() twice by mistake). (#24268) 2022-04-27 21:27:34 +02:00
Sven Mika
627b9f2e88
[RLlib] QMIX training iteration function and new replay buffer API. (#24164) 2022-04-27 14:24:20 +02:00
Sven Mika
29388fb25b
[RLlib] Reinstate flaky AlphaStar learning CI test (flaky due to two changed, bad config default values). (#24256) 2022-04-27 14:01:52 +02:00
Noon van der Silk
38a028de2d
[RLlib] Don't add elements to _agent_ids during env pre-checking. (#24136) 2022-04-26 15:55:15 +02:00
Sven Mika
bb4e5cb70a
[RLlib] CQL: training iteration function. (#24166) 2022-04-26 14:28:39 +02:00
Artur Niederfahrenhorst
f7be409462
[RLlib] Training Iteration Function for SAC (#24157) 2022-04-26 12:37:54 +02:00
Kai Fricke
c0ec20dc3a
[tune] Next deprecation cycle (#24076)
Rolling out next deprecation cycle:

- DeprecationWarnings that were `warnings.warn` or `logger.warn` before are now raised errors
- Deprecations that already raised errors are now removed entirely
- Notably, this involves deprecating the TrialCheckpoint functionality and associated cloud tests
- Added annotations to deprecation warnings indicating when to fully remove the deprecated code
2022-04-26 09:30:15 +01:00
Xuehai Pan
6087eda91b
[RLlib] Issue 21991: Fix SampleBatch slicing for SampleBatch.INFOS in RNN cases (#22050) 2022-04-25 11:40:24 +02:00
Noon van der Silk
3589c21924
[RLlib] Fix some missing f-strings and a f-string related bug in tf eager policy. (#24148) 2022-04-25 11:25:28 +02:00
Fabian Witter
56bc90ca72
[RLlib] Remove Unnecessary List Conversion of Complex Observations in SAC Models (torch and tf). (#24106) 2022-04-25 11:21:34 +02:00
Jeroen Bédorf
1263015931
[RLlib] Add support for writing env 'info' dicts to output datasets for TFPolicies (for TorchPolicies, these are part of the view-requirements by default and thus written either way). (#24041) 2022-04-25 11:17:50 +02:00
Artur Niederfahrenhorst
306853b5b8
[RLlib] Issue 22693: RNN-SAC fixes. (#23814) 2022-04-25 09:19:24 +02:00
Ben Kasper
531fdd50d4
[RLlib] Add 2 missing callbacks to MultiCallbacks class (on_trainer_init and on_sub_environment_created) (#24153) 2022-04-25 09:18:03 +02:00
Kai Fricke
d161831f0e
[RLlib; testing] Deactivate flaky alpha star learning test (#24138) 2022-04-23 17:45:58 +02:00
Avnish Narayan
6e68b6bef9
[RLlib] DD-PPO training iteration fn. (#24118)
We had unreported merge conflicts with DDPPO. This PR closes and combines #24092, #24035, #24030 and #23096

Co-authored-by: sven1977 <svenmika1977@gmail.com>
2022-04-22 15:22:14 -07:00
xwjiang2010
d7da0d706e
[rllib] Only conditionally import JaxCategorical in catalog.py (#24086)
* Experiment with fewer imports in catalog.py

* lint
2022-04-22 14:51:35 -07:00
Avnish Narayan
3bf907bcf8
[RLlib] Don't modify environments via the env checker utilities. (#24083) 2022-04-22 18:39:47 +02:00