The reason for not using `queue.Queue` for multiprocessing purposes on Windows is explained at https://stackoverflow.com/a/37244276 and in the second reply to https://stackoverflow.com/a/37245300
The reason for using `multiprocessing.JoinableQueue` over `multiprocessing.Queue` is explained at https://stackoverflow.com/a/30725121
AFAIK, this is because on Windows each process gets its own copy of a `queue.Queue`, so nothing is shared among those processes. When `multiprocessing.Queue` is used, changes to it are shared internally via pipes, with proper locking.
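As a rough illustration of the point above, here is a minimal sketch of a producer handing work to a child process through a `multiprocessing.JoinableQueue`. The names `worker` and `square_all` are hypothetical and not taken from this PR; the `None` sentinel is just one common way to stop a worker.

```python
import multiprocessing


def worker(tasks, results):
    """Consume items until a None sentinel arrives, acknowledging each one."""
    while True:
        item = tasks.get()
        try:
            if item is None:
                break
            results.put(item * item)
        finally:
            # Every get() must be matched by task_done(), or tasks.join() hangs.
            tasks.task_done()


def square_all(numbers):
    """Square each number in a child process (hypothetical helper)."""
    tasks = multiprocessing.JoinableQueue()
    results = multiprocessing.Queue()
    proc = multiprocessing.Process(target=worker, args=(tasks, results))
    proc.start()
    for n in numbers:
        tasks.put(n)
    tasks.put(None)  # sentinel: tells the worker to stop
    tasks.join()  # blocks until every put() has a matching task_done()
    # Drain the results before joining the process to avoid a deadlock.
    squares = sorted(results.get() for _ in numbers)
    proc.join()
    return squares


if __name__ == "__main__":
    # The guard matters on Windows, where child processes re-import this module.
    print(square_all([1, 2, 3]))
```

Unlike `multiprocessing.Queue`, the joinable variant lets the producer wait on `tasks.join()` for the work to be acknowledged, not merely enqueued.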
Resubmitting #21705, which was merged and then reverted. It seems the Sphinx build somehow broke in the meantime; it is not clear how that is connected to this PR.
Here is the original description:
>Part of the effort to enable tests on windows, this enables test_metrics and test_metric_agents, which pass locally.
External Redis should still be supported with GCS bootstrapping, to avoid breaking users.
In GCS mode, some logic is removed for external Redis:
- Printing external Redis addresses to terminal: hard to implement across `ray start`, `ray.init()` and Ray cluster util.
- Starting local Redis if external Redis is unavailable: failing loudly here seems more appropriate.
Also, re-enable a few tests that restart GCS in GCS bootstrapping mode, by using external Redis for KV storage.
After enabling the test_runtime_env_plugin and test_runtime_env_env_vars tests (PR #21252) and the python/ray/serve:* tests (PR #21107), the analysis at flaky-tests.ray.io started showing failing tests in windows://python/ray/test/serv:test_standalone. PR #21352 reverted #21252 (the runtime_env tests), but the problem was more likely in the serve tests. Specifically, `test_standalone` has a test that uses Cluster, which should be skipped on Windows because it is flaky. So this PR:
- re-enables the runtime_env tests for windows
- skips the Cluster test in serve/tests/test_standalone.py
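A platform-conditional skip of this kind could look roughly like the sketch below. It uses the standard library's `unittest.skipIf` only to stay self-contained; Ray's own tests are written with pytest, where the equivalent would be a `pytest.mark.skipif` marker. The class name, test names, and test bodies are placeholders, not the real contents of `test_standalone.py`.

```python
import sys
import unittest

# Assumption: the flakiness is specific to Windows, as the description states.
FLAKY_ON_WINDOWS = sys.platform == "win32"


class StandaloneTests(unittest.TestCase):
    """Placeholder test case; the real tests exercise Ray Serve."""

    @unittest.skipIf(FLAKY_ON_WINDOWS, "Cluster-based test is flaky on Windows")
    def test_uses_cluster(self):
        # Stand-in for the test that relies on the Cluster utility.
        self.assertTrue(True)

    def test_runs_everywhere(self):
        # Tests that do not need Cluster keep running on every platform.
        self.assertEqual(1 + 1, 2)


def run_suite():
    """Run the placeholder suite and return the unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(StandaloneTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Skipping only the one test, instead of disabling the whole file, keeps the remaining serve tests running on Windows.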
Uses a direct `pip install` instead of creating a conda env to make pip installs incremental to the cluster environment.
Separates the handling of `pip` and `conda` dependencies.
The new `pip` approach still works if only the base Ray is installed on the cluster and the user specifies libraries like "ray[serve]" in the `pip` field. The mechanism is as follows:
- We don't actually want to reinstall ray via pip, since this could lead to version mismatch issues. Instead, we want to use the Ray that's already installed in the cluster.
- So if "ray" was included by the user in the pip list, remove it.
- If a library "ray[serve]" or "ray[tune, rllib]" was included in the pip list, remove it and replace it by its dependencies (e.g. "uvicorn", "requests", ...).
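The rewriting steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `rewrite_pip_list` and `HYPOTHETICAL_EXTRAS` are made-up names, and the dependency lists are placeholders for the real ones defined by Ray's packaging.

```python
import re
from typing import Dict, List

# Hypothetical mapping from Ray extras to their pip dependencies.
# These entries are illustrative only, not Ray's real dependency lists.
HYPOTHETICAL_EXTRAS: Dict[str, List[str]] = {
    "serve": ["uvicorn", "requests"],
    "tune": ["pandas", "tabulate"],
    "rllib": ["gym", "scipy"],
}

# Matches "ray" or "ray[extra1, extra2]" with no version specifier.
_RAY_REQUIREMENT = re.compile(
    r"^\s*ray\s*(?:\[(?P<extras>[^\]]*)\])?\s*$", re.IGNORECASE
)


def rewrite_pip_list(pip_list: List[str]) -> List[str]:
    """Drop bare "ray" and expand "ray[...]" into the extras' dependencies."""
    result: List[str] = []
    for requirement in pip_list:
        match = _RAY_REQUIREMENT.match(requirement)
        if match is None:
            # Not a Ray requirement: keep it unchanged.
            candidates = [requirement]
        else:
            # Ray itself is never reinstalled; only its extras are expanded.
            extras = match.group("extras") or ""
            candidates = []
            for extra in extras.split(","):
                candidates.extend(
                    HYPOTHETICAL_EXTRAS.get(extra.strip().lower(), [])
                )
        for candidate in candidates:
            if candidate not in result:  # avoid duplicate entries
                result.append(candidate)
    return result
```

In this sketch, version-pinned entries such as "ray==1.9.0" do not match the pattern and pass through untouched; a real implementation would need to decide how to handle them.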
Co-authored-by: architkulkarni <arkulkar@gmail.com>
Co-authored-by: architkulkarni <architkulkarni@users.noreply.github.com>
Why are these changes needed?
Currently clang-tidy does not run inside scripts/format.sh. Also clang-tidy can produce false positive warnings. Maybe we can disable clang-tidy until ergonomic issues are resolved.
* Revert "Revert "[Build] include minimal debug info in C++ build; upgrade clang-format to 12 (#18840)" (#18886)"
This reverts commit f851a072f3.
* use gcc 8
* Add Bazel config for building with llvm. Upgrade C++ std to 17.
* Fix redis. Try fixing asan and tsan
* Fix asan and format
* Update comments.
Co-authored-by: Chen Shen <scv119@gmail.com>
* clang-tidy
* fix
* fix script
* test clang compiler
* fix clang-tidy rules
* Fix windows and other issues.
* Fix
* Improve information when running check-git-clang-tidy-output.sh on different OS