ray/python
mwtian 50d49a2d7a
[Core] use higher niceness for workers (#24928)
Looking at past failures of dataset_shuffle_push_based_random_shuffle_1tb, and when running it on my own, I noticed that raylets are killed because GCS is not able to respond to them in time. It seems that at the beginning of the run there is a huge CPU spike which starves GCS of CPU. In the same spirit as adjusting workers to higher OOM scores, we can give workers a higher niceness so they yield CPU to GCS, the Raylet, and other user processes.

I ran dataset_shuffle_push_based_random_shuffle_1tb a few times and no longer saw raylet deaths caused by GCS CPU starvation. There are other issues making the test fail, which I will continue to investigate.
2022-05-23 08:12:51 -07:00
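For illustration only, here is a minimal sketch of the general technique the commit describes: raising a worker process's niceness so it yields CPU to higher-priority system processes. The function name, the increment value, and the logging below are assumptions for this sketch and are not taken from the PR; os.nice is Unix-only.

```python
import os


def lower_worker_priority(niceness_increment: int = 15) -> None:
    """Best-effort: raise this process's niceness so it yields CPU to
    GCS, the Raylet, and user processes under load.

    The increment value is illustrative; the actual change may use a
    different setting or mechanism.
    """
    try:
        # os.nice() adds the increment to the current niceness and
        # returns the new value; a higher niceness means lower CPU priority.
        new_niceness = os.nice(niceness_increment)
        print(f"Worker niceness set to {new_niceness}")
    except OSError as exc:
        # Adjusting priority is best-effort; ignore failures, e.g. on
        # platforms without os.nice support.
        print(f"Could not adjust niceness: {exc}")


if __name__ == "__main__":
    # A worker would typically call this early in its startup path.
    lower_worker_priority()
```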
ray [Core] use higher niceness for workers (#24928) 2022-05-23 08:12:51 -07:00
requirements [RLlib] Upgrade gym 0.23 (#24171) 2022-05-23 08:18:44 +02:00
asv.conf.json [docs] Move all /latest links to /master (#11897) 2020-11-10 10:53:28 -08:00
build-wheel-macos-arm64.sh [python3.10] build python310 wheels (#24829) 2022-05-16 12:36:33 -07:00
build-wheel-macos.sh [python3.10] build python310 wheels (#24829) 2022-05-16 12:36:33 -07:00
build-wheel-manylinux2014.sh [python3.10] build python310 wheels (#24829) 2022-05-16 12:36:33 -07:00
build-wheel-windows.sh [python3.10] build python310 wheels (#24829) 2022-05-16 12:36:33 -07:00
MANIFEST.in Includes .pyi files in package data. (#21247) 2021-12-27 11:50:02 -08:00
README-building-wheels.md [build] Build wheels with manylinux2014 (#11621) 2020-11-03 19:36:32 -08:00
requirements.txt [RLlib] Upgrade gym 0.23 (#24171) 2022-05-23 08:18:44 +02:00
requirements_linters.txt Remove yapf dependency (#23656) 2022-04-04 21:50:04 -07:00
requirements_ml_docker.txt [AIR] Add distributed torch_geometric example (#23580) 2022-04-21 09:48:43 -07:00
setup.py Revert "[Core] allow using grpcio > 1.44.0 (#23722)" (#24935) 2022-05-18 18:16:39 -07:00