Why are these changes needed?
Currently, clang-tidy does not run inside scripts/format.sh, and it can produce false-positive warnings. Maybe we can disable clang-tidy until these ergonomic issues are resolved.
Why are these changes needed?
Right now, the failure signal handler registered in the Python worker is skipped on crashes such as segfaults, because the C++ core worker overrides the failure signal handler here and does not call the previously registered handler. This prevents the Python stack trace from being printed on crashes. The fix is to make the C++ failure signal handler call the previous signal handler registered in Python. For example, with the script below, which segfaults:
import ray
ray.init()

@ray.remote
def f():
    import ctypes
    ctypes.string_at(0)

ray.get(f.remote())
Ray currently only prints the following stack trace:
(pid=26693) *** SIGSEGV received at time=1634418743 ***
(pid=26693) PC: @ 0x7fff203d9552 (unknown) _platform_strlen
(pid=26693) [2021-10-16 14:12:23,331 E 26693 12194577] logging.cc:313: *** SIGSEGV received at time=1634418743 ***
(pid=26693) [2021-10-16 14:12:23,331 E 26693 12194577] logging.cc:313: PC: @ 0x7fff203d9552 (unknown) _platform_strlen
With this change, the Python stack trace is printed in addition to the stack trace above:
(pid=26693) Fatal Python error: Segmentation fault
(pid=26693)
(pid=26693) Stack (most recent call first):
(pid=26693) File "/Users/mwtian/opt/anaconda3/envs/ray/lib/python3.7/ctypes/__init__.py", line 505 in string_at
(pid=26693) File "stack.py", line 7 in f
(pid=26693) File "/Users/mwtian/work/ray-project/ray/python/ray/worker.py", line 425 in main_loop
(pid=26693) File "/Users/mwtian/work/ray-project/ray/python/ray/workers/default_worker.py", line 212 in <module>
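The format of the trace above matches what Python's standard `faulthandler` module prints on a fatal signal. Assuming that is the handler registered on the Python side (an assumption, not confirmed by this description), the behavior can be reproduced without Ray. The sketch below runs the crashing code in a subprocess, so the segfault does not take down the calling interpreter; `CRASHING_PROGRAM` is a name introduced only for this illustration.

```python
import subprocess
import sys

# Child program: enable faulthandler, then read from a null pointer.
CRASHING_PROGRAM = """
import ctypes
import faulthandler

faulthandler.enable()  # dump the Python traceback on fatal signals such as SIGSEGV
ctypes.string_at(0)    # read from address 0, which segfaults
"""

# Run in a subprocess so the crash does not kill this interpreter.
result = subprocess.run(
    [sys.executable, "-c", CRASHING_PROGRAM],
    capture_output=True,
    text=True,
)

# faulthandler writes to stderr by default.
print(result.stderr)
```

If `faulthandler.enable()` is removed from the child program, the process still dies from the segfault but no Python-level stack is written, which is roughly the situation this PR addresses.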
This should make debugging crashes in Python workers easier for both users and Ray developers.
This PR also tries to initialize the symbolizer in GCS, Raylet, and the core worker. This is a no-op on macOS and some Linux environments (e.g. Ray on Ubuntu 20.04 already produces symbolized stack traces), but it should make Ray more likely to produce symbolized stack traces on other platforms.
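The handler-chaining idea behind the fix (a later handler does its own work, then calls the handler that was registered before it) can be sketched in pure Python. This is only an analogy, not the C++ change itself: it uses `SIGUSR1` instead of `SIGSEGV` so that it is safe to run, it assumes a Unix platform and Python 3.8+ (for `signal.raise_signal`), and `first_handler` and `install_chained_handler` are hypothetical names.

```python
import signal

calls = []


def first_handler(signum, frame):
    # Stands in for the handler registered first (the Python worker's handler).
    calls.append("first")


def install_chained_handler(signum):
    """Install a handler that does its own work, then calls the previous one."""
    previous = signal.getsignal(signum)

    def chained(signum, frame):
        # Stands in for the handler installed later (the C++ core worker's).
        calls.append("second")
        # Forward to the earlier handler instead of silently replacing it.
        # SIG_DFL and SIG_IGN are not callable, so they are skipped here.
        if callable(previous):
            previous(signum, frame)

    signal.signal(signum, chained)
    return previous


signal.signal(signal.SIGUSR1, first_handler)
install_chained_handler(signal.SIGUSR1)
signal.raise_signal(signal.SIGUSR1)
print(calls)
```

Without the `previous(signum, frame)` call, only the later handler would run, which mirrors the bug described above where the Python handler was skipped.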
## Why are these changes needed?
Some issues are left over from previous PRs:
- Put the gcs_actor_scheduler_mock_test back
- Add a comment for named actor creation behavior
- Fix the comments for some flags
## Related issue number
Why are these changes needed?
Related issue number
#19177
Quoting #19177 (comment) here,
The following tests fail when not skipped:
=================================== short test summary info ====================================
FAILED python\ray\tests\test_basic.py::test_user_setup_function - subprocess.CalledProcessErro...
FAILED python\ray\tests\test_basic.py::test_disable_cuda_devices - subprocess.CalledProcessErr...
FAILED python\ray\tests\test_basic.py::test_wait_timing - assert (1634209333.6099107 - 1634209...
Results (395.22s):
36 passed
3 failed
- ray\tests/test_basic.py:197 test_user_setup_function
- ray\tests/test_basic.py:220 test_disable_cuda_devices
- ray\tests/test_basic.py:265 test_wait_timing
=================================== short test summary info ====================================
FAILED python\ray\tests\test_basic_3.py::test_fair_queueing - AssertionError: 23
Results (198.33s):
1 failed
- ray\tests/test_basic_3.py:169 test_fair_queueing
The following test passed when not skipped. Opening a PR to verify that.
def test_oversized_function(ray_start_shared_local_modes):