A legacy K8s test fails due to incorrect usage of @ray.method which only started raising errors after the Ray 1.12.0 branch cut.
This PR removes the use of @ray.method in the test.
Some context in #23271 and #23471
In addition, I noticed some of the test were flakey due to out-of-memory issues. For that reason, I've doubled the memory request and limits in the legacy operator's example files.
I've also added CPU limits in an example file that was missing them -- it makes the most sense for consistency with Ray's resource model to use CPU limits in K8s configs.
Finally, I added an extra note to the instructions for running the tests.
For consistency and safety, we fix an explicit 6379 port for all default and example configs for Ray on K8s.
Documentation is updated to recommend matching Ray versions in operator and Ray cluster.
* make 0 default min/max workers for head node
* fix helm charts, test, defaults for head
* fix test, docs
* make 0 default min/max workers for head node
* fix helm charts, test, defaults for head
* fix test, docs
* comments. logging
* better wording (logs)
Co-authored-by: Dmitri Gekhtman <62982571+DmitriGekhtman@users.noreply.github.com>
* fix logging message
* fix max workers in raycluster.yaml
* use default values of 0 for min/max workders in a helm chart
* add missing line back
Co-authored-by: Dmitri Gekhtman <62982571+DmitriGekhtman@users.noreply.github.com>
* Add ephemeral-storage: 1Gi requests but no limits. This is useful when scheduling in a storage constrained env since ray assumes it has ephemeral storage to use.
* Add ephemeral-storage: 1Gi to b/deploy/charts/ray/templates/operator_cluster_scoped.yaml b/deploy/charts/ray/templates/operator_namespaced.yaml