Commit graph

36 commits

Author SHA1 Message Date
Dmitri Gekhtman
fc4ac71deb
[minor] Fix legacy OSS operator test (#23540)
A legacy K8s test fails due to incorrect usage of @ray.method, which only started raising errors after the Ray 1.12.0 branch cut.
This PR removes the use of @ray.method in the test.

Some context in #23271 and #23471

In addition, I noticed some of the tests were flaky due to out-of-memory issues. For that reason, I've doubled the memory requests and limits in the legacy operator's example files.

I've also added CPU limits in an example file that was missing them -- it makes the most sense for consistency with Ray's resource model to use CPU limits in K8s configs.

Finally, I added an extra note to the instructions for running the tests.
2022-04-18 17:47:42 -07:00
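The memory change described in this commit can be illustrated with a minimal Python sketch. It assumes a container `resources` stanza represented as a plain dict and simple integer quantities such as `512Mi`; the helper names and the starting values are hypothetical and are not taken from the example files.

```python
import re


def double_quantity(quantity: str) -> str:
    """Double the integer part of a quantity such as "512Mi", keeping its suffix."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", quantity)
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, suffix = match.groups()
    return f"{int(number) * 2}{suffix}"


def double_memory(resources: dict) -> dict:
    """Return a copy of a `resources` stanza with memory requests and limits doubled."""
    doubled = {section: dict(values) for section, values in resources.items()}
    for section in ("requests", "limits"):
        if "memory" in doubled.get(section, {}):
            doubled[section]["memory"] = double_quantity(doubled[section]["memory"])
    return doubled


# Hypothetical starting values, chosen only for illustration.
before = {
    "requests": {"cpu": "1", "memory": "512Mi"},
    "limits": {"cpu": "1", "memory": "512Mi"},
}
after = double_memory(before)
```

CPU values are left untouched here; only the memory entries change, and the input dict is not modified.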
Dmitri Gekhtman
f51566e622
Prep K8s operator for the Ray 1.11.0 release. (#22264)
For consistency and safety, we set an explicit port, 6379, in all default and example configs for Ray on K8s.
Documentation is updated to recommend matching Ray versions in operator and Ray cluster.
2022-02-09 18:59:50 -08:00
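As a rough illustration of what an explicit port means in practice, the sketch below builds `ray start` command lines that always name port 6379 rather than relying on a default. The helper functions and the `example-head` host name are hypothetical; the real configs may pass additional flags.

```python
RAY_HEAD_PORT = 6379  # the explicit port named in this commit


def head_start_command(port: int = RAY_HEAD_PORT) -> list:
    """Command line for starting a head node on an explicit port."""
    return ["ray", "start", "--head", f"--port={port}"]


def worker_start_command(head_host: str, port: int = RAY_HEAD_PORT) -> list:
    """Command line for a worker that joins the head node at host:port."""
    return ["ray", "start", f"--address={head_host}:{port}"]


# "example-head" is a placeholder service name, not one from the configs.
head_cmd = head_start_command()
worker_cmd = worker_start_command("example-head")
```

Keeping the port in one constant means the head and worker commands cannot drift apart.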
Yi Cheng
68ec652be7
[gcs] New option to increase gcs grpc client threads and fix issues in hybrid scheduling (#19663)
## Why are these changes needed?

- Since broadcasting is moving to gRPC, this introduces an option to increase the number of client-side threads.
- For hybrid scheduling, ignore the threshold if the GCS-based actor scheduler is enabled.

With these fixes, the actor creation rate is > 600 actors/s vs. ~140 actors/s.

2021-10-28 22:40:18 -07:00
Dmitri Gekhtman
5608a4e441
fix (#18123) 2021-08-26 14:14:09 -04:00
Sasha Sobol
fcb044d47c
[autoscaler] make 0 default min/max workers for head node (#17757)
* make 0 default min/max workers for head node

* fix helm charts, test, defaults for head

* fix test, docs

* comments. logging

* better wording (logs)

Co-authored-by: Dmitri Gekhtman <62982571+DmitriGekhtman@users.noreply.github.com>

* fix logging message

* fix max workers in raycluster.yaml

* use default values of 0 for min/max workers in a helm chart

* add missing line back

Co-authored-by: Dmitri Gekhtman <62982571+DmitriGekhtman@users.noreply.github.com>
2021-08-25 14:56:20 -04:00
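The defaulting behavior named in this commit title can be sketched as follows. The config keys (`available_node_types`, `head_node_type`, `min_workers`, `max_workers`) follow the Ray cluster config layout; the function itself and the node type names are hypothetical and are not the autoscaler's actual code.

```python
import copy


def fill_head_worker_defaults(config: dict) -> dict:
    """Return a copy of the config where the head node type defaults to 0 min/max workers."""
    filled = copy.deepcopy(config)
    head = filled["available_node_types"][filled["head_node_type"]]
    # Only fill in values the user did not set explicitly.
    head.setdefault("min_workers", 0)
    head.setdefault("max_workers", 0)
    return filled


# Hypothetical cluster config with a head type that omits min/max workers.
config = {
    "head_node_type": "head",
    "available_node_types": {
        "head": {"resources": {"CPU": 1}},
        "worker": {"resources": {"CPU": 1}, "min_workers": 1, "max_workers": 3},
    },
}
filled = fill_head_worker_defaults(config)
```

Worker node types are left as written, and the original config is not mutated.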
Holden Karau
b9dae93bfa
Add ephemeral-storage: 1Gi requests but no limits. (#17854)
* Add ephemeral-storage: 1Gi requests but no limits. This is useful when scheduling in a storage constrained env since ray assumes it has ephemeral storage to use.

* Add ephemeral-storage: 1Gi to deploy/charts/ray/templates/operator_cluster_scoped.yaml and deploy/charts/ray/templates/operator_namespaced.yaml
2021-08-17 21:10:39 -04:00
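A request without a matching limit, as this commit describes, might look like the following when the container `resources` stanza is modeled as a Python dict. The helper name and the CPU and memory values are hypothetical; only the `ephemeral-storage: 1Gi` request comes from the commit message.

```python
def add_ephemeral_storage_request(resources: dict, size: str = "1Gi") -> dict:
    """Return a copy of `resources` with an ephemeral-storage request and no limit."""
    updated = {section: dict(values) for section, values in resources.items()}
    updated.setdefault("requests", {})["ephemeral-storage"] = size
    # Deliberately no entry under "limits": the pod is scheduled onto a node
    # with enough scratch space but is not capped at that amount.
    return updated


# Hypothetical CPU and memory values, for illustration only.
resources = add_ephemeral_storage_request(
    {
        "requests": {"cpu": "1", "memory": "1Gi"},
        "limits": {"cpu": "1", "memory": "1Gi"},
    }
)
```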
Navneet Nandan
35d86ebfee
Added support to use tolerations for head and worker nodes (#17608)
* Added support to use tolerations for head and worker nodes

* removed the imagePullSecret configuration

* Update comments

* minor comment change

* add back rayproject/ray:nightly comment

Co-authored-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2021-08-16 17:06:15 -04:00
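To make the tolerations change concrete, here is a small sketch that attaches the same toleration to a head and a worker pod spec, both modeled as dicts. The `key`, `operator`, `value`, and `effect` fields are the standard Kubernetes toleration fields; the taint key `example.com/dedicated`, its value, and the helper function are hypothetical and do not reflect the chart's actual template logic.

```python
def with_tolerations(pod_spec: dict, tolerations: list) -> dict:
    """Return a copy of a pod spec with the given tolerations appended."""
    updated = dict(pod_spec)
    updated["tolerations"] = list(pod_spec.get("tolerations", [])) + list(tolerations)
    return updated


# Hypothetical toleration for nodes tainted example.com/dedicated=ray:NoSchedule.
dedicated = {
    "key": "example.com/dedicated",
    "operator": "Equal",
    "value": "ray",
    "effect": "NoSchedule",
}

head_spec = with_tolerations({"containers": []}, [dedicated])
worker_spec = with_tolerations({"containers": [], "tolerations": []}, [dedicated])
```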
Holden Karau
e0f8e18173
Make the ray logs visible (#17810) 2021-08-15 17:16:55 -04:00
Dmitri Gekhtman
b6443f9ec8
Revert "Added support for the imagePullSecrets in helm chart (#17520)" (#17678)
This reverts commit 208d997414.
2021-08-09 12:11:58 -04:00
Navneet Nandan
208d997414
Added support for the imagePullSecrets in helm chart (#17520) 2021-08-04 09:45:39 -04:00
crdnb
113ed2a07c
[kubernetes] Add CPU limit so the Ray Helm chart works in environments that require resource limits to be set (#16701) 2021-06-30 13:31:55 -07:00
Dmitri Gekhtman
a60ee3a8b2
[autoscaler][kubernetes][minor] latest images everywhere (#16205)
* latest images everywhere

* add back some documentation on the images

* Doc update
2021-06-04 16:01:39 -07:00
Travis Addair
050a076de9
[k8s] Refactored k8s operator to use kopf for controller logic (#15787)
Co-authored-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2021-06-01 12:00:55 -07:00
Dmitri Gekhtman
27c2f570f1
[kubernetes] pin the K8s config yamls to ray:latest instead of ray1.3 (#15988) 2021-06-01 19:12:35 +03:00
Dmitri Gekhtman
95c3d88cac
[autoscaler][kubernetes] Helm chart (#15614) 2021-05-17 16:55:10 -07:00
Eric Liang
67544992b5
Remove the old operator directory (#12143) 2020-11-19 15:37:28 -08:00
Ian Rodney
4c3f09094a
[docs] redis-port -> port (#10937) 2020-09-23 17:04:13 -07:00
Eric Liang
6fcb816fdd
Ray operator deprecation message (#10334) 2020-08-25 18:26:02 -07:00
acmore
fa0a677aac
Customize service account name. (#8901) 2020-06-16 12:49:41 -05:00
Robert Nishihara
d985d7537e
Replace all instances of ray.readthedocs.io with ray.io (#7994) 2020-04-13 16:17:05 -07:00
Edward Oakes
90b553ed05
[operator] Use headless service for head node (#7622) 2020-03-19 10:31:56 -05:00
Edward Oakes
c78b52b5b2
Set RayCluster as service owner (#7621) 2020-03-19 10:30:44 -05:00
Edward Oakes
883ee4912d
Return reconcile.Result{}, not nil (#7521) 2020-03-09 16:27:15 -07:00
Edward Oakes
08d4cb3822
[operator] Minor cleanup (#7498) 2020-03-09 11:23:46 -07:00
Edward Oakes
27b4ffa98e
Improve k8s operator documentation (#7496) 2020-03-09 11:09:06 -07:00
Edward Oakes
e29f2ef788
[operator] Small bugfixes (#7459) 2020-03-05 10:57:56 -08:00
mehrdadn
e09f63ad65
Fix build errors and add more targets to Windows builds (#6811)
* Fix common.fbs rename (due to apache/arrow/commit/bef9a1c251397311a6415d3dc362ef419d154caa)

* Add missing COPTS

* Use socketpair(AF_INET) if boost::asio::local is unavailable (e.g. on Windows)

* Fix compile bug in service_based_gcs_client_test.cc (fix build breakage in #6686)

* Work around googletest/gmock inability to specify override to avoid -Werror,-Winconsistent-missing-override

* Fix missing override on IsPlasmaBuffer()

* Fix missing libraries for streaming

* Factor out install-toolchains.sh

* Put some Bazel flags into .bazelrc

* Fix jni_md.h missing inclusion

* Add ~/bin to PATH for Bazel

* Change echo $$(date) > $@ to date > $@

* Fix lots of unquoted paths

* Add system() call checks for Windows

Co-authored-by: GitHub Web Flow <noreply@github.com>
2020-02-11 16:49:33 -08:00
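One item in this commit falls back to an AF_INET socket pair where local (Unix domain) sockets are unavailable. The commit itself changes C++ code; the sketch below only illustrates the general technique in Python, using a temporary loopback listener to produce two connected TCP sockets. The function name is hypothetical.

```python
import socket


def inet_socketpair():
    """Emulate socketpair() with two connected TCP sockets on the loopback interface."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the OS for any free port.
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            client.connect(listener.getsockname())
            server, _ = listener.accept()
        except OSError:
            client.close()
            raise
    finally:
        # The listener is only needed to establish the pair.
        listener.close()
    return server, client
```

A caller can then write to one end and read from the other, for example `b.sendall(b"ping")` followed by `a.recv(4)` after `a, b = inet_socketpair()`.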
Qstar
52ed42635f
add role rbac and add guide (#7091) 2020-02-10 11:03:15 -08:00
Ce Gao
574abe844a
[ray-operator] Remove useless RBAC rules (#6853)
Signed-off-by: Ce Gao <gaoce@caicloud.io>
2020-01-21 00:31:07 -06:00
Ce Gao
125e26dde5
[ray-operator] Watch the pod resource and remove useless code (#6852)
Signed-off-by: Ce Gao <gaoce@caicloud.io>
2020-01-20 12:13:30 -06:00
Ce Gao
23f32c5ec8
[ray-operator]: Add ignore file (#6851)
Signed-off-by: Ce Gao <gaoce@caicloud.io>
2020-01-20 12:13:01 -06:00
chenk008
f69081242e
Ray operator travis (#6731) 2020-01-09 16:16:08 -06:00
chenk008
3a2a4335b6
Ray operator go.mod file (#6660)
* change .gitignore for go.mod

* change gitignore and add go.mod for ray-operator
2020-01-02 11:55:16 -06:00
chenk008
4150d444a1
ray-operator support bazel build (#6639)
* support bazel build

* add bazel gazelle script in README
2019-12-31 22:28:51 -08:00
Qstar
10338fde0c
Ray operator: controller code and guide to use (#6501) 2019-12-29 10:14:47 -06:00
Qstar
ed294f4c23
Ray Kubernetes Operator Part 1: readme, structure, config and CRD related files (#6332)
* Ray-Operator first PR
1. RayCluster CRD and CR, structure code in golang
2. config file in Kubernetes

* Delete go.sum

* Ray-Operator first PR
1. add directory structure
2. add guide for submitting RayCluster

* Delete ray_v1_raycluster.bk.yaml

* Ray-Operator first PR
1. delete file bk
2. add more description about kubernetes and ray-operator features

* Ray-Operator first PR: adjust grammar

* Ray-Operator first PR: add More Information about proposal

* Ray-Operator first PR:
1. add heterogeneous version of CR
2. add reference to key words, and reference links to the props in yaml
3. file structure to yaml level and function description

* Ray-Operator first PR: add ray operator proposal doc

* Ray-Operator first PR: add More Information about proposal

* Ray-Operator first PR: add command to start

* Ray-Operator first PR: add More Information about proposal

* Update deploy/ray-operator/README.md

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update deploy/ray-operator/api/v1/raycluster_types.go

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update deploy/ray-operator/api/v1/raycluster_types.go

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Ray-Operator first PR: add More Information about proposal

* Ray-Operator first PR: remove License

* Ray-Operator first PR: rename version from v1 to v1alpha1

* Ray-Operator first PR: use replicas instead of numNodes

* Ray-Operator first PR: update replicas in CR yaml file

* Ray-Operator first PR: add More Information about proposal
2019-12-05 22:45:03 -08:00