16 commits

Author SHA1 Message Date
Andrew Li
989bb48339
[K8s/Autoscaler] Added a field for the service account name (#27004)
Exposed serviceAccountName on values.yaml (#26657)

Signed-off-by: Andrew Li <orcahmlee@gmail.com>
2022-07-26 19:47:18 -07:00
Andrew Li
3853186472
Exposed upscaling_speed and idle_timeout_minutes to values.yaml, #25312 (#25495)
Exposed upscaling_speed and idle_timeout_minutes to values.yaml.
2022-06-06 13:26:06 -04:00
Dmitri Gekhtman
fc4ac71deb
[minor] Fix legacy OSS operator test (#23540)
A legacy K8s test fails due to incorrect usage of @ray.method, which only started raising errors after the Ray 1.12.0 branch cut.
This PR removes the use of @ray.method in the test.

Some context in #23271 and #23471

In addition, I noticed some of the tests were flaky due to out-of-memory issues. For that reason, I've doubled the memory requests and limits in the legacy operator's example files.

I've also added CPU limits in an example file that was missing them -- it makes the most sense for consistency with Ray's resource model to use CPU limits in K8s configs.

Finally, I added an extra note to the instructions for running the tests.
2022-04-18 17:47:42 -07:00
Dmitri Gekhtman
f51566e622
Prep K8s operator for the Ray 1.11.0 release. (#22264)
For consistency and safety, we fix an explicit 6379 port for all default and example configs for Ray on K8s.
Documentation is updated to recommend matching Ray versions in operator and Ray cluster.
2022-02-09 18:59:50 -08:00
Yi Cheng
68ec652be7
[gcs] New option to increase gcs grpc client threads and fix issues in hybrid scheduling (#19663)
## Why are these changes needed?

- Since broadcasting is moving to gRPC, introduce an option to increase the number of client-side threads
- For hybrid scheduling, ignore the threshold if the GCS-based actor scheduler is enabled

With these fixes, the actor creation rate is > 600 actors/s vs. ~140 actors/s

2021-10-28 22:40:18 -07:00
Dmitri Gekhtman
5608a4e441
fix (#18123)
2021-08-26 14:14:09 -04:00
Sasha Sobol
fcb044d47c
[autoscaler] make 0 default min/max workers for head node (#17757)
* make 0 default min/max workers for head node

* fix helm charts, test, defaults for head

* fix test, docs

* comments. logging

* better wording (logs)

Co-authored-by: Dmitri Gekhtman <62982571+DmitriGekhtman@users.noreply.github.com>

* fix logging message

* fix max workers in raycluster.yaml

* use default values of 0 for min/max workers in a helm chart

* add missing line back

Co-authored-by: Dmitri Gekhtman <62982571+DmitriGekhtman@users.noreply.github.com>
2021-08-25 14:56:20 -04:00
Holden Karau
b9dae93bfa
Add ephemeral-storage: 1Gi requests but no limits. (#17854)
* Add ephemeral-storage: 1Gi requests but no limits. This is useful when scheduling in a storage-constrained environment, since Ray assumes it has ephemeral storage to use.

* Add ephemeral-storage: 1Gi to deploy/charts/ray/templates/operator_cluster_scoped.yaml and deploy/charts/ray/templates/operator_namespaced.yaml
2021-08-17 21:10:39 -04:00
Navneet Nandan
35d86ebfee
Added support to use tolerations for head and worker nodes (#17608)
* Added support to use tolerations for head and worker nodes

* removed the imagePullSecret configuration

* Update comments

* minor comment change

* add back rayproject/ray:nightly comment

Co-authored-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2021-08-16 17:06:15 -04:00
Dmitri Gekhtman
b6443f9ec8
Revert "Added support for the imagePullSecrets in helm chart (#17520)" (#17678)
This reverts commit 208d997414.
2021-08-09 12:11:58 -04:00
Navneet Nandan
208d997414
Added support for the imagePullSecrets in helm chart (#17520)
2021-08-04 09:45:39 -04:00
crdnb
113ed2a07c
[kubernetes] Add a CPU limit so the Ray Helm chart works in environments that require resource limits to be set (#16701)
2021-06-30 13:31:55 -07:00
Dmitri Gekhtman
a60ee3a8b2
[autoscaler][kubernetes][minor] latest images everywhere (#16205)
* latest images everywhere

* add back some documentation on the images

* Doc update
2021-06-04 16:01:39 -07:00
Travis Addair
050a076de9
[k8s] Refactored k8s operator to use kopf for controller logic (#15787)
Co-authored-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2021-06-01 12:00:55 -07:00
Dmitri Gekhtman
27c2f570f1
[kubernetes] pin the K8s config yamls to ray:latest instead of ray1.3 (#15988)
2021-06-01 19:12:35 +03:00
Dmitri Gekhtman
95c3d88cac
[autoscaler][kubernetes] Helm chart (#15614)
2021-05-17 16:55:10 -07:00