Fix CQL getting stuck when deprecated timesteps_per_iteration is used (use min_train_timesteps_per_reporting instead).
CQL does not perform sampling timesteps and the deprecated timesteps_per_iteration is automatically translated into the new min_sample_timesteps_per_reporting, but should be translated (only for CQL and other purely offline RL algos) into min_train_timesteps_per_reporting.
If timesteps_per_iteration, CQL lever leaves the first iteration as it thinks it's not done yet (sample timesteps always remain at 0).
Lint was still failing (but only caught with doctest):
```
File "../../python/ray/rllib/utils/numpy.py", line ?, in default
Failed example:
tree.traverse(make_action_immutable, d, top_down=False)
Exception raised:
Traceback (most recent call last):
File "/opt/miniconda/lib/python3.6/doctest.py", line 1330, in __run
compileflags, 1), test.globs)
File "<doctest default[4]>", line 1, in <module>
tree.traverse(make_action_immutable, d, top_down=False)
NameError: name 'make_action_immutable' is not defined
```
Rolling out next deprecation cycle:
- DeprecationWarnings that were `warnings.warn` or `logger.warn` before are now raised errors
- Raised Deprecation warnings are now removed
- Notably, this involves deprecating the TrialCheckpoint functionality and associated cloud tests
- Added annotations to deprecation warning for when to fully remove
We had unreported merge conflicts with DDPPO. This PR closes and combines #24092, #24035, #24030 and #23096
Co-authored-by: sven1977 <svenmika1977@gmail.com>
Creates a zip of session_latest dir with test name and timestamp upon python test failure. Writes to dir specified by env var `RAY_TEST_FAILURE_LOGS_DIR`. Noop if env var does not exist.
Downstream consumer (e.g. CI) can upload all created artifacts in this dir. Thereby, PR submitters can more easily debug their CI failures, especially if they can't repro locally.
Limitations:
- a conftest.py file importing the main ray conftest.py needs to be present in same dir as test. This presents a challenge for e.g. dashboard tests which are highly scattered
The DDPPO LR scheduler test is broken because the learner_info_dictionary that is returned by the training iteration function does not consistently return a learner info for every training iteration, but the test expects that it does.
We'll need to fix the test then re-merge
Reverts #23906
Clean up the ci/ directory. This means getting rid of the travis/ path completely and moving the files into sensible subdirectories.
Details:
- Moves everything under ci/travis into subdirectories, e.g. ci/build, ci/lint, etc.
- Minor adjustments to some scripts (variable renames)
- Removes the outdated (unused) asan tests
Update the torch policy to find the seq_lens using state_batches instead of input_dict. This helps handle the complex inputs to the model when the inbuilt preprocessing API is disabled.