A legacy K8s test fails due to incorrect usage of @ray.method, which only started raising errors after the Ray 1.12.0 branch cut.
This PR removes the use of @ray.method in the test.
Some context in #23271 and #23471
In addition, I noticed some of the tests were flaky due to out-of-memory issues. For that reason, I've doubled the memory requests and limits in the legacy operator's example files.
I've also added CPU limits in an example file that was missing them -- it makes the most sense for consistency with Ray's resource model to use CPU limits in K8s configs.
Finally, I added an extra note to the instructions for running the tests.
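
For illustration, the resource changes described above can be sketched in Python as a dict that mirrors a container `resources` stanza, with requests matching limits and a CPU limit present. The container name and the quantities are placeholders rather than the values from the example files, and `missing_cpu_limits` is a hypothetical helper, not part of Ray or Kubernetes.

```python
# Hypothetical pod spec fragment, written as a dict that mirrors the YAML layout.
# The name and quantities are illustrative placeholders, not the values in the example files.
pod_spec = {
    "containers": [
        {
            "name": "ray-node",
            "resources": {
                "requests": {"cpu": "1", "memory": "1Gi"},
                "limits": {"cpu": "1", "memory": "1Gi"},
            },
        }
    ]
}


def missing_cpu_limits(spec):
    """Return the names of containers that do not declare a CPU limit."""
    return [
        container["name"]
        for container in spec.get("containers", [])
        if "cpu" not in container.get("resources", {}).get("limits", {})
    ]


print(missing_cpu_limits(pod_spec))
```

A check like this could be run over example configs to catch a file that omits CPU limits.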
For consistency and safety, we set an explicit port, 6379, in all default and example configs for Ray on K8s.
Documentation is updated to recommend matching Ray versions in the operator and the Ray cluster.
## Why are these changes needed?
- Since broadcasting is moving to gRPC, introduce an option to increase the number of client-side threads
- For hybrid scheduling, ignore the threshold if the GCS-based actor scheduler is enabled
With these fixes, the actor creation rate is > 600 actors/s, versus ~140 actors/s before.
## Related issue number
* make 0 default min/max workers for head node
* fix helm charts, test, defaults for head
* fix test, docs
* comments, logging
* better wording (logs)
Co-authored-by: Dmitri Gekhtman <62982571+DmitriGekhtman@users.noreply.github.com>
* fix logging message
* fix max workers in raycluster.yaml
* use default values of 0 for min/max workers in a helm chart
* add missing line back
* Add an ephemeral-storage request of 1Gi but no limit. This is useful when scheduling in a storage-constrained environment, since Ray assumes it has ephemeral storage to use.
* Add ephemeral-storage: 1Gi to deploy/charts/ray/templates/operator_cluster_scoped.yaml and deploy/charts/ray/templates/operator_namespaced.yaml
* Added support to use tolerations for head and worker nodes
* removed the imagePullSecret configuration
* Update comments
* minor comment change
* add back rayproject/ray:nightly comment
Co-authored-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
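
As background on the tolerations support above, here is a rough Python sketch of how Kubernetes decides whether a toleration matches a node taint. It models the documented matching rules only; it is not the chart's implementation, and the taint key and value are made up for illustration.

```python
def tolerates(toleration, taint):
    """Return True if a toleration matches a taint under the Kubernetes matching rules."""
    operator = toleration.get("operator", "Equal")  # Kubernetes defaults to Equal
    key = toleration.get("key", "")
    effect = toleration.get("effect", "")

    if key:
        if key != taint["key"]:
            return False
    elif operator != "Exists":
        # An empty key is only valid with Exists, where it matches every taint key.
        return False
    # An empty effect matches every effect.
    if effect and effect != taint["effect"]:
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


# Hypothetical taint and tolerations for worker and head pods.
worker_taint = {"key": "dedicated", "value": "ray-worker", "effect": "NoSchedule"}
worker_tolerations = [
    {"key": "dedicated", "operator": "Equal", "value": "ray-worker", "effect": "NoSchedule"}
]
head_tolerations = []

print(any(tolerates(t, worker_taint) for t in worker_tolerations))  # True
print(any(tolerates(t, worker_taint) for t in head_tolerations))    # False
```

In this sketch the worker pods could be scheduled onto the tainted node, while the head pod, which has no tolerations, could not.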
* Fix common.fbs rename (due to apache/arrow/commit/bef9a1c251397311a6415d3dc362ef419d154caa)
* Add missing COPTS
* Use socketpair(AF_INET) if boost::asio::local is unavailable (e.g. on Windows)
* Fix compile bug in service_based_gcs_client_test.cc (fix build breakage in #6686)
* Work around googletest/gmock inability to specify override to avoid -Werror,-Winconsistent-missing-override
* Fix missing override on IsPlasmaBuffer()
* Fix missing libraries for streaming
* Factor out install-toolchains.sh
* Put some Bazel flags into .bazelrc
* Fix jni_md.h missing inclusion
* Add ~/bin to PATH for Bazel
* Change echo $$(date) > $@ to date > $@
* Fix lots of unquoted paths
* Add system() call checks for Windows
Co-authored-by: GitHub Web Flow <noreply@github.com>
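
The socketpair(AF_INET) fallback mentioned above can be sketched in Python as follows. This is an analogy under stated assumptions, not the C++ code from the commit: it emulates a connected socket pair with two TCP sockets on the loopback interface, and `inet_socketpair` is a hypothetical name.

```python
import socket


def inet_socketpair():
    """Emulate socketpair() with two connected TCP sockets on the loopback interface."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
        listener.listen(1)
        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            client.connect(listener.getsockname())
            server, _ = listener.accept()
        except OSError:
            client.close()
            raise
    finally:
        # The listener is only needed to establish the pair.
        listener.close()
    return server, client


left, right = inet_socketpair()
left.sendall(b"ping")
received = right.recv(4)
left.close()
right.close()
print(received)
```

The trade-off of this approach is that the pair goes through the TCP stack instead of a local (Unix domain) socket, which is why it is a fallback rather than the default.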
* Ray-Operator first PR
1. RayCluster CRD and CR, structure code in golang
2. config file in Kubernetes
* Delete go.sum
* Ray-Operator first PR
1. add directory structure
2. add guide for submitting RayCluster
* Delete ray_v1_raycluster.bk.yaml
* Ray-Operator first PR
1. delete file bk
2. add more description about kubernetes and ray-operator features
* Ray-Operator first PR: adjust grammar
* Ray-Operator first PR: add More Information about proposal
* Ray-Operator first PR:
1. add heterogeneous version of CR
2. add references to keywords, and reference links to the props in yaml
3. file structure to yaml level and function description
* Ray-Operator first PR: add ray operator proposal doc
* Ray-Operator first PR: add More Information about proposal
* Ray-Operator first PR: add command to start
* Ray-Operator first PR: add More Information about proposal
* Update deploy/ray-operator/README.md
Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>
* Update deploy/ray-operator/api/v1/raycluster_types.go
Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>
* Update deploy/ray-operator/api/v1/raycluster_types.go
Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>
* Ray-Operator first PR: add More Information about proposal
* Ray-Operator first PR: remove License
* Ray-Operator first PR: rename version from v1 to v1alpha1
* Ray-Operator first PR: use replicas instead of numNodes
* Ray-Operator first PR: update replicas in CR yaml file
* Ray-Operator first PR: add More Information about proposal