Creates a zip of the `session_latest` dir, named with the test name and a timestamp, when a Python test fails. Writes to the dir specified by the env var `RAY_TEST_FAILURE_LOGS_DIR`. No-op if the env var is not set.
A downstream consumer (e.g. CI) can then upload all artifacts created in this dir, so PR submitters can more easily debug their CI failures, especially when they can't repro locally.
Limitations:
- A conftest.py file importing the main Ray conftest.py needs to be present in the same dir as the test. This is a challenge for e.g. dashboard tests, which are highly scattered.
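
For illustration, the archiving step could look roughly like the sketch below. It is not the code from this PR: the function name `archive_session_on_failure` and the `<test name>_<timestamp>.zip` naming scheme are assumptions, and only the env var `RAY_TEST_FAILURE_LOGS_DIR` comes from the description above. In a real conftest.py it would presumably be called from a pytest reporting hook when a test fails.

```python
import os
import shutil
import time
from typing import Optional


def archive_session_on_failure(test_name: str, session_dir: str) -> Optional[str]:
    """Zip session_dir into $RAY_TEST_FAILURE_LOGS_DIR and return the archive path.

    Returns None (no-op) when the env var is unset or empty.
    """
    out_dir = os.environ.get("RAY_TEST_FAILURE_LOGS_DIR")
    if not out_dir:
        return None
    os.makedirs(out_dir, exist_ok=True)
    # Hypothetical naming scheme: make the pytest node id filesystem-friendly.
    safe_name = test_name.replace("/", "_").replace("::", "-")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base = os.path.join(out_dir, f"{safe_name}_{stamp}")
    # make_archive appends ".zip" to base and returns the archive path.
    return shutil.make_archive(base, "zip", root_dir=session_dir)
```

Keeping the env var check at the top means the helper is safe to call unconditionally; local runs without the variable set are unaffected.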
Clean up the ci/ directory. This means getting rid of the travis/ path completely and moving the files into sensible subdirectories.
Details:
- Moves everything under ci/travis into subdirectories, e.g. ci/build, ci/lint, etc.
- Minor adjustments to some scripts (variable renames)
- Removes the outdated (unused) asan tests
In a [recent review](https://discuss.python.org/t/experience-with-python-3-11-in-fedora/12911) of the Fedora team's experience porting packages to the upcoming Python 3.11, they remarked that most of the work was in removing deprecated aliases in unittest. I came across a few of these while looking at unrelated test failures, when the DeprecationWarnings caught my eye. So I made a quick sweep of the code, using `git grep` to find occurrences of the deprecated aliases:
old | new
---|---
assertEquals | assertEqual
assertNotEquals | assertNotEqual
assertRaisesRegexp | assertRaisesRegex
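
The renames in the table are mechanical, so they can also be applied with a small script. The sketch below is only an illustration (the sweep described above was done with `git grep`); the `modernize` helper and the `RENAMES` mapping are names made up for this example.

```python
import re

# Deprecated unittest alias -> current name (same pairs as the table above).
RENAMES = {
    "assertEquals": "assertEqual",
    "assertNotEquals": "assertNotEqual",
    "assertRaisesRegexp": "assertRaisesRegex",
}

# Word boundaries keep already-correct names such as assertEqual untouched.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def modernize(source: str) -> str:
    """Return source with the deprecated aliases replaced."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], source)
```

Because it is a plain text substitution, it would also rewrite matches inside strings or comments, so the resulting diff still needs a manual look.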
Resubmit the PR https://github.com/ray-project/ray/pull/19936
I've figured out that the test case `//rllib:tests/test_gpus::test_gpus_in_local_mode` failed due to a deadlock in local mode.
In local mode, if the user code submits another task during the execution of the current task, the `CoreWorker::actor_task_mutex_` may cause a deadlock.
The solution is quite simple: release the lock before executing the task in local mode.
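
The pattern can be shown with a toy Python model. This is not the C++ code from the fix: `LocalModeWorker` and its flag are hypothetical, a `threading.Lock` stands in for `CoreWorker::actor_task_mutex_`, and a timeout is used so the sketch reports the deadlock instead of hanging.

```python
import threading


class LocalModeWorker:
    """Toy stand-in for a worker that runs submitted tasks inline (local mode)."""

    def __init__(self, release_before_execute: bool):
        self._mutex = threading.Lock()  # non-reentrant, like a plain mutex
        self._release_before_execute = release_before_execute
        self.submitted = 0

    def submit(self, task):
        # A blocking acquire would hang forever on re-entry from the same
        # thread; the timeout turns that hang into an error for the demo.
        if not self._mutex.acquire(timeout=0.1):
            raise RuntimeError("deadlock: mutex is already held")
        try:
            self.submitted += 1  # bookkeeping that needs the lock
            if not self._release_before_execute:
                # Buggy variant: user code runs while the lock is held.
                return task(self)
        finally:
            self._mutex.release()
        # Fixed variant: the lock is released before user code runs,
        # so the task is free to call submit() again.
        return task(self)


def inner(worker):
    return "inner done"


def outer(worker):
    # User code that submits another task while it is executing.
    return worker.submit(inner)
```

With `release_before_execute=True`, `LocalModeWorker(True).submit(outer)` returns the inner task's result; with `False`, the nested `submit` cannot take the lock and raises.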
In the commit 7c2f61c76c:
1. Release the lock in local mode to fix the bug. @scv119
2. `test_local_mode_deadlock` added to cover the case. @rkooo567
3. Left a trivial change in `rllib/tests/test_gpus.py` so that `RAY_CI_RLLIB_DIRECTLY_AFFECTED` takes effect.
* Revert "Revert "[RLlib] POC: `PGTrainer` class that works by sub-classing, not `trainer_template.py`. (#20055)" (#20284)"
This reverts commit 246787cdd9.
Co-authored-by: sven1977 <svenmika1977@gmail.com>