Dmitri Gekhtman
8f83053e35
[autoscaler][Kubernetes] Operator subprocess error reporting, configuration fixes ( #15526 )
2021-05-04 16:45:37 -05:00
Dmitri Gekhtman
de897673c5
[kubernetes][autoscaler] Kubernetes operator basic fixes ( #15469 )
2021-04-29 10:45:52 -05:00
Dmitri Gekhtman
6b0673f207
[doc][Kubernetes][minor] Restructure section labels for operator launch ( #14962 )
2021-04-23 09:50:58 -07:00
Dmitri Gekhtman
fd43e9e6f8
[kubernetes][doc][minor] Add namespace to job creation command ( #15442 )
2021-04-23 09:44:51 -07:00
Dmitri Gekhtman
e6864523cf
[autoscaler] Do not divide by zero in resource demand scheduler ( #15323 )
...
* Do not divide by zero
* Don't take min or mean of an empty list
* max workers 0 for head node in distributed benchmark
* test
* Correct the type annotation
* comment grammar tweak
* message
* docs
* test
* Move test cli to large tests.
2021-04-16 10:20:05 -07:00
Richard Liaw
59bf3a7b22
ray[cluster] -> ray[default] ( #15251 )
2021-04-14 09:37:04 -07:00
Richard Liaw
e72f6b0377
Fix ray[full] -> ray[cluster] #15112
...
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-04-05 09:55:00 -07:00
Dmitri Gekhtman
474fb6bf0c
[kubernetes][client][docs] Note requirement for matching Ray versions ( #15068 )
2021-04-01 15:08:25 -07:00
Ian Rodney
73fb5d6022
[Autoscaler][Docker] Make disable_shm_size_detection more usable ( #14913 )
2021-03-30 18:10:09 -07:00
Richard Liaw
c1c9649671
Set up things to remove dependencies in later release ( #14793 )
...
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-03-19 13:54:52 -07:00
Ian Rodney
eb12033612
[Code Cleanup] Switch to use ray.util.get_node_ip_address() ( #14741 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-03-18 13:10:57 -07:00
Michael Schock
42dcacd888
[k8s] Minor doc fix ( #14732 )
2021-03-17 16:15:38 -07:00
Ian Rodney
8a936ad64d
[Autoscaler Docs] Use worker_run_options ( #14721 )
...
Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
2021-03-16 18:04:27 -07:00
Brian Yu
a65002514c
[Doc] Update Slurm documentation examples ( #14673 )
2021-03-15 00:27:13 -07:00
Dmitri Gekhtman
3f6c23e3cc
[doc][autoscaler][minor] Fix quickstart guide: ray.init(address='auto') ( #14459 )
2021-03-03 17:58:52 -08:00
Dmitri Gekhtman
1675156a8b
[autoscaler][interface] Use multi node types in defaults.yaml and example-full.yaml ( #14239 )
...
* random doc typo
* example-full-multi
* left off max workers
* wip
* address comments, modify defaults, wip
* fix
* wip
* reformat more things
* undo useless diff
* space
* max workers
* space
* copy-paste mishaps
* space
* More copy-paste mishaps
* copy-paste issues, space, max_workers
* head_node_type
* legacy yamls
* line undeleted
* correct-gpu
* Remove redundant GPU example.
* Extraneous comment
* whitespace
* example-java.yaml
* Revert "example-java.yaml"
This reverts commit 1e9c0124b9d97e651aaeeb6ec5bf7a4ef2a2df17.
* tests and other things
* doc
* doc
* revert max worker default
* Kubernetes comment
* wip
* wip
* tweak
* Address comments
* test_resource_demand_scheduler fixes
* Head type min/max workers, aws resources
* fix example_cluster2.yaml
* Fix external node type test (compatibility with legacy-style external node types)
* fix test_autoscaler_aws
* gcp-images
* gcp node type names
* fix gcp defaults
* doc format
* typo
* Skip failed Windows tests
* doc string and comment
* assert
* remove contents of default external head and worker
* legacy external failed validation test
* Readability -- define the minimal external config at the top of the file.
* Remove default worker type min worker
* Remove extraneous global min_workers comment.
* per-node-type docker in aws/example-gpu-docker
* ray.worker.small -> ray.worker.default
* fix-docker
* fix gpu docker again
* undo kubernetes experiment
* fix doc
* remove worker max_worker from kubernetes
* remove max_worker from local worker node type
* fix doc again
* py38
* eric-comment
* fix cluster name
* fix-test-autoscaler
* legacy config logic
* pop resources
* Remove min_workers AFTER merge
* comment, warning message
* warning, comment
2021-03-03 06:16:19 +02:00
Dmitri Gekhtman
58c0959ea7
[kubernetes][docs][minor] Move Kubernetes example scripts to docs ( #14412 )
2021-03-01 20:17:16 -08:00
javi-redondo
0408fe6a69
Small improvements to the Ray Cluster docs ( #14241 )
...
* Small improvements to the Ray Cluster docs
* Update quickstart.rst
Changed title for quick start
Co-authored-by: Javier Redondo <javier@Anyscale-MacBook-Pro.local>
2021-02-23 13:44:28 +02:00
Dmitri Gekhtman
090970bdf5
[autoscaler] Max worker default infinity ( #14201 )
...
* random doc typo
* max-worker-default-inf
* fix
* -1 means infinity
* doc
* comment tweak
* fix random typo
* Cluster max-worker default
* fix
* typo
* test
* Git add the test
* doc-tweak
* rest of the test logistics
* periods in doc
* Address comments
* docstring
2021-02-22 05:14:00 +02:00
Alex Wu
753083c617
[docs][autoscaler] Update AWS node config link ( #14125 )
2021-02-17 10:44:10 -08:00
javi-redondo
b8b2d6410d
[docs] new Ray Cluster documentation ( #13839 )
...
Co-authored-by: Javier Redondo <javier@anyscale.com>
Co-authored-by: AmeerHajAli <ameerh@berkeley.edu>
2021-02-15 00:47:14 -08:00
Dmitri Gekhtman
6644a0fe50
[autoscaler][kubernetes][docs] Updated Kubernetes Documentation ( #14016 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-11 23:00:25 -08:00
Dmitri Gekhtman
1187d1dd3e
[autoscaler][kubernetes][operator] Rudimentary error handling, make "MODIFIED" -> update event work. ( #13756 )
2021-02-03 20:07:11 -06:00
Ameer Haj Ali
1fbb752f42
[autoscaler] remove worker_default_node_type that is useless. ( #13588 )
2021-01-21 17:04:38 -08:00
PENG Zhenghao
e63da54931
[docs] Add more guideline on using ray in slurm cluster ( #12819 )
...
Co-authored-by: Sumanth Ratna <sumanthratna@gmail.com>
Co-authored-by: PENG Zhenghao <pengzh@ie.cuhk.edu.hk>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-01-14 12:17:53 -08:00
Simon Mo
8e0a2f669b
[Doc] Remove trailing whitespaces ( #13390 )
2021-01-12 20:35:38 -08:00
Dmitri Gekhtman
7166949194
[Kubernetes][Docs] GPU usage ( #13325 )
...
* gpu-note
* gpu-note
* More info
* lint?
* Update doc/source/cluster/kubernetes.rst
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Update doc/source/cluster/kubernetes.rst
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Update doc/source/cluster/kubernetes.rst
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Update doc/source/cluster/kubernetes.rst
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* GKE->Kubernetes
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-01-11 21:36:31 -08:00
Dmitri Gekhtman
31453621ef
[kubernetes][docs][minor] Kubernetes version warning ( #13161 )
2021-01-04 10:29:17 -06:00
Gekho457
8cebe5cbe9
[docs][autoscaler][k8s][minor] quotes #12866
2020-12-14 18:24:13 -08:00
Gekho457
44f5be04ca
[autoscaler][k8s][doc][minor] Fix typo in k8s doc. ( #12865 )
2020-12-14 17:30:43 -08:00
Gekho457
11ce1dc743
Ray cluster CRD and example CR + multi-ray-cluster operator ( #12098 )
2020-12-14 10:26:01 -06:00
Eric Liang
4ad4463be6
Add comments to clarify purpose of new scheduler queues ( #12730 )
...
* update
* clarify
* update
2020-12-11 11:53:09 -08:00
Kai Yang
e3b5deb741
[Multi-tenancy] Delete flag enable_multi_tenancy and remove old code path ( #10573 )
2020-12-10 19:01:40 +08:00
Ian Rodney
e2a147d5fb
[docs] Remove DL AMi reference ( #12120 )
2020-11-18 12:40:19 -08:00
Ameer Haj Ali
85197deece
[autoscaler] Remove legacy autoscaler ( #11802 )
2020-11-11 13:36:48 -08:00
Eric Liang
9b8218aabd
[docs] Move all /latest links to /master ( #11897 )
...
* use master link
* remae
* revert non-ray
* more
* mre
2020-11-10 10:53:28 -08:00
Eric Liang
a9cf0141a0
[autoscaler] Fix semantics of request_resources ( #11820 )
2020-11-09 14:57:40 -08:00
dHannasch
6147b6a1a3
[docs] Note that the printed IP address can be incorrect. ( #11804 )
...
* If the head node is on a subnet with NAT, then you will need a different IP address.
* Specify what you are checking firewall settings and network configuration *for*.
* reword following @amogkam
* Give the full error message.
2020-11-04 13:48:03 -08:00
dHannasch
e7f7cb29c4
[docs] Show expected terminal output for manual cluster setup ( #11752 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-02 20:59:14 -08:00
Scott Graham
c4ae94d60b
[autoscaler] Azure deployment fixes ( #11613 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-27 15:27:18 -07:00
Richard Liaw
a4b418d30c
[docs] update cloud docs ( #11262 )
...
* update-cloud-docs
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
* Update doc/source/cluster/config.rst
Co-authored-by: Ian Rodney <ian.rodney@gmail.com>
* fix
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
* fix
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Ian Rodney <ian.rodney@gmail.com>
2020-10-21 16:37:26 -07:00
Ameer Haj Ali
6b86d4d280
Automatically detect CPU, GPU, accelerator_type for AWS ( #11147 )
2020-10-02 21:16:43 -07:00
Ian Rodney
0d5b09f426
[Docker] Automagically add "runtime=nvidia" ( #11125 )
2020-10-01 17:04:19 -07:00
Ameer Haj Ali
0d36e4c025
[autoscaler] Support min_workers for multi node type ( #11041 )
...
* prepare for head node
* move command runner interface outside _private
* remove space
* Eric
* flake
* min_workers in multi node type
* fixing edge cases
* eric not idle
* fix target_workers to consider min_workers of node types
* idle timeout
* minor
* minor fix
* test
* lint
* eric v2
* eric 3
* min_workers constraint before bin packing
* Update resource_demand_scheduler.py
* Revert "Update resource_demand_scheduler.py"
This reverts commit 818a63a2c86d8437b3ef21c5035d701c1d1127b5.
* reducing diff
Co-authored-by: Ameer Haj Ali <ameerhajali@ameers-mbp.lan>
Co-authored-by: Alex Wu <alex@anyscale.io>
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
2020-09-28 22:02:01 -07:00
Richard Liaw
a563344bc2
[docs] remove ref to google groups -> github discussions ( #11019 )
2020-09-24 18:09:51 -07:00
Ian Rodney
4c3f09094a
[docs] redis-port -> port ( #10937 )
2020-09-23 17:04:13 -07:00
Lee moon soo
df4c3abe30
[autoscaler] Staroid node provider ( #10956 )
2020-09-22 21:25:29 -07:00
Richard Liaw
b0ca70f628
[tune+core] tune lifecycle and starting ray guide ( #10813 )
2020-09-21 11:27:50 -07:00
rkube
cd7351f6a3
Streamlined slurm script and removed references to redis_password ( #10827 )
...
Co-authored-by: Ralph Kube <ralph.kube@uit.not>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-09-18 14:55:56 -07:00
Keqiu Hu
8a77cf925a
[cli][ray] update ray cli message ( #10823 )
2020-09-17 09:26:55 -07:00