184 commits

Author SHA1 Message Date
Dmitri Gekhtman
6cf263838f
[docs][touch-up] Add ephemeral storage to Ray-on-K8s example. (#27916) 2022-08-18 11:29:55 -07:00
Ian Rodney
24508db920
[Docs][GCP] Configuring ServiceAccounts for worker (#27915)
Enables better usage with GCP.

By default, the head node runs with the ray-autoscaler-sa-v1 service account, but workers do not. To run workers with this service account, copy and uncomment L114->L117 from example-full.


Signed-off-by: Ian <ian.rodney@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-08-16 13:13:27 -07:00
Dmitri Gekhtman
bceef503b2
[Kubernetes][docs] Restore legacy Ray operator migration discussion (#27841)
This PR restores notes for migration from the legacy Ray operator to the new KubeRay operator.

To avoid disrupting the flow of the Ray documentation, these notes are placed in a README accompanying the old operator's code.

These notes are linked from the new docs.

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2022-08-16 08:46:31 -07:00
Chen Shen
f05c744a65
[Doc] minor fix on accessing AWS/S3
update the doc.
2022-08-15 16:53:31 -07:00
Eric Liang
52f7b89865
[docs] Editing pass on clusters docs, removing legacy material and fixing style issues (#27816) 2022-08-12 00:15:03 -07:00
Stephanie Wang
043eac06ac
[docs] Revamp clusters section on job submission (#27756)
Page structure changes:

    Deploying a Ray Cluster on Kubernetes
        Getting Started -> links to jobs
    Deploying a Ray Cluster on VMs
        Getting started -> links to jobs
        User Guides
            Autoscaling (moved more content here in favor of the Getting started page)
    Running Applications on Ray Clusters
        Ray Jobs
            Quickstart Using the Ray Jobs CLI
            Python SDK
            REST API
            Ray Job Submission API Reference
            Ray Client

Content changes:

    Modified the "Deploying a Ray Cluster ..." quickstart pages to briefly summarize ad-hoc command execution, then link to jobs.
    Made the Ray Jobs example more incremental: it starts with a simple example, then shows a long-running script, then shows an example with a runtime env, instead of presenting all of them at once.
    Centered the Ray Jobs quickstart around the CLI, with minor changes to the Python SDK page to match.
    Removed "Ray Jobs Architecture".
    Moved "Autoscaling" content from the Kubernetes "Getting started" page into its own user guide, because it is too complicated for "Getting Started". No content cuts.
    Cut "Viewing the dashboard" and "Ray Client" from the Kubernetes "Getting started" page.

Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
2022-08-10 20:15:55 -07:00
Chen Shen
ddca52d2ca
[cluster doc] Promote new doc and deprecate the old (#27759)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-08-10 17:41:56 -07:00
Stephanie Wang
54a9b1d2d0
[docs] Revamp docs on observability for ray cluster apps (#27724)
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>

Various cleanups around docs on Ray cluster "Monitoring and observability". After #27723, we will move these to a common page outside of VMs/k8s subsections:

    Add links to the more comprehensive observability section.
    Move and clean up cluster-specific content from Prometheus metrics to the new Ray Cluster page. I also modified a bunch of text here because previously we were not very clear about what the recommended approach was.
    Include more specific instructions about setting up observability tools for VMs vs k8s.
2022-08-10 15:06:28 -07:00
Jiajun Yao
fe4f2b5b07
[Doc] Add a cluster xgboost example for vm stack (#27732)
This is adapted from the same example of the k8s stack.

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
2022-08-10 12:36:16 -07:00
Chen Shen
a1d80dc195
[Cluster-launcher doc] revamp the vm part (#27431) 2022-08-10 02:43:28 -07:00
Cade Daniel
03d835e4e2
[Ray Clusters][docs] Create new Running Apps on Ray Clusters section (#27723)
This adds the structure described here, namely adding a new section under Ray Clusters which is focused on running applications on Ray clusters.

Signed-off-by: Cade Daniel <cade@anyscale.com>

Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
2022-08-09 21:01:47 -07:00
Cade Daniel
8826646303
[Ray Clusters][docs] Restructuring Clusters API reference (#27679)
This PR:

Copies the existing clusters API reference to the new structure. The reference docs are split out into Ray Clusters (common between vms and k8s) and Ray Clusters on VMs (specific to vms). Notably, there is also a reference section for k8s, but not in this PR.
Move the three job submission user guides back into a single one. Jules had suggested that we break them out into rest/sdk/cli, but that's not P0 right now.
Fix some bugs in the left navigation bar. There should be less duplication of TOC entries. I'll keep working on related fixes in a different PR.

Signed-off-by: Cade Daniel <cade@anyscale.com>
2022-08-09 15:33:09 -07:00
Richard Liaw
93a3cc222b
[docs/air] remove xgboost/lightgbm references and move AIR toc (#27687) 2022-08-09 12:49:44 -07:00
Cade Daniel
13f43b939a
[docs][Ray Clusters] Key Concepts page (#27510) 2022-08-09 10:01:05 -07:00
Richard Liaw
bb5e8c3536
fix-link-check (#27703) 2022-08-09 08:57:49 -07:00
Dmitri Gekhtman
3293317c40
[kubernetes][docs] Logging guide, networking info, migration guide, fixes. (#27607)
This PR

Adds notes and example on logging for Ray/K8s.
Implements an API Reference page pointing to the configuration guide and the RayCluster CR definition.
Takes managed K8s services out of the tabbed structure, to make that page look less sad.
Adds a comparison of the KubeRay operator and legacy K8s operator
Adds an architecture diagram for the autoscaling sections
Fixes some other minor items
Adds some info about networking to the configuration guide, removes the previously planned networking page

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2022-08-09 00:38:05 -07:00
clarng
098628d9bf
[doc] update autoscaler config (VM) page (#27539)
Update autoscaler configuration docs for VM stack.
Removed the video; on review, it fits better in the overview and is possibly outdated.

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2022-08-05 17:03:06 -07:00
Dmitri Gekhtman
06f7f33a4e
[docs] KubeRay config guide and autoscaling discussion (#27504)
This PR adds a guide on RayCluster configuration and a page of discussion about autoscaling.

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2022-08-05 13:11:28 -07:00
Cade Daniel
f94a2fe166
[docs][Ray Clusters] New Ray Clusters getting started page. (#27391)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-08-05 10:21:56 -07:00
Philipp Moritz
64fc1155b7
[docs] K8s docs intro polish and KubeRay architecture diagram (#27488)
* Save work
* Update
* consistency
* update
* fixes
* simplify
* update
* fix
* update
* wording
* update

Signed-off-by: Philipp Moritz <pcmoritz@gmail.com>
2022-08-04 10:07:15 -07:00
Cade Daniel
99ad0667a5
[docs][Ray Clusters] Migrate Community Supported Cluster Launcher to new structure. (#27376)
This PR migrates the old Community Supported Cluster Launcher docs to the new Ray Clusters doc structure.

Signed-off-by: Cade Daniel <cade@anyscale.com>
2022-08-03 11:07:10 -07:00
Dmitri Gekhtman
4d87e8112a
[docs][kubernetes] GPU user guide (#27360)
Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

This PR

adds a page of guidance on GPU deployment with Ray/K8s. This page is a modified and slightly expanded version of the existing page https://docs.ray.io/en/latest/cluster/kubernetes-gpu.html
moves managed K8s service intro links to their own page
2022-08-02 15:58:23 -07:00
Dmitri Gekhtman
6efca71c35
[docs][kubernetes] XGBoost ML example (#27313)
Adds a guide on running an XGBoost-Ray workload using KubeRay.

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2022-08-01 19:30:41 -07:00
Dmitri Gekhtman
059895ab5b
[docs][kubernetes] Shift docs into new structure (#27239)
This PR shifts KubeRay docs into the structure introduced in #27036.
There are no content changes.
2022-07-29 14:19:51 -07:00
Cade Daniel
db26c779a0
[Ray clusters] [docs] Copying all Ray Clusters doc content to new structure (#27062) 2022-07-27 14:22:44 -07:00
Cade Daniel
7a817ad364
Moving Ray Clusters restructuring section to be subpage under existing Ray Clusters. (#27036)
This PR puts the Ray Clusters (under construction) docs section (see #26754) under Ray Clusters as a subpage.

This keeps the master branch docs clean and presentable for users.
Ray Clusters doc writers can use the existing CI to iterate on the docs, without needing one massive PR once the work is done.

Signed-off-by: Cade Daniel <cade@anyscale.com>
2022-07-26 15:52:06 -07:00
Dmitri Gekhtman
a70ada7341
[kubernetes][docs] Implement landing page and getting started guide (#26912)
Implements a landing page for the new KubeRay-based deployment guide.
Implements a "Getting started" Jupyter notebook
2022-07-26 00:41:56 -07:00
Archit Kulkarni
084f06f49a
[Doc] [Job submission] [Dashboard] Add tip for long runtime_env installation and improve error (#26911)
# Why are these changes needed?
The dashboard can display the message "<actor> cannot be created because the Ray cluster cannot satisfy its resource requirements" when the runtime env setup is stalled. This PR updates this message to mention that the runtime env setup may have failed.
This PR also adds a tip to the Job Submission doc saying that if a job is stalled in PENDING, the runtime env setup may have stalled. The tip points to the log files, which should have more information.
The runtime env cannot stall forever: setup fails after 10 minutes. This is a new feature added after the Ray 1.13 branch cut. In Ray <= 1.13, the runtime env can still stall forever.

# Related issue number
Closes #26332
2022-07-25 23:32:27 -07:00
Stephanie Wang
55a0f7bb2d
[core] ray.init defaults to an existing Ray instance if there is one (#26678)
ray.init() currently starts a new Ray instance even if one already exists, which is very confusing for a new user trying to go from local development to a cluster. This PR changes it so that, when no address is specified, we first try to find an existing Ray cluster that was created through `ray start`. If none is found, we start a new one.

This makes two changes to the ray.init() resolution order:
1. When `ray start` is called, the address of the started cluster is written to a file called `/tmp/ray/ray_current_cluster` (this was already the case). For ray.init() and ray.init(address="auto"), we first check this local file for an existing cluster address. The file is deleted on `ray stop`. If the file is empty, we autodetect any running cluster (legacy behavior) if address="auto", or start a new local Ray instance if address=None.
2. When ray.init(address="local") is called, we create a new local Ray instance, even if one already exists. This behavior seems to be necessary mainly for `ray.client` use cases.
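
The resolution order above can be sketched as a small decision function. This is only an illustration under stated assumptions, not Ray's implementation: `resolve_init_address`, its return values, and the `recorded_address` parameter (standing in for the contents of `/tmp/ray/ray_current_cluster`) are hypothetical names, and the handling of an explicit address is assumed rather than described in this commit message.

```python
from typing import Optional, Tuple

# Path named in the commit message; `ray start` records the cluster address here.
CLUSTER_ADDRESS_FILE = "/tmp/ray/ray_current_cluster"


def resolve_init_address(
    address: Optional[str], recorded_address: Optional[str]
) -> Tuple[str, Optional[str]]:
    """Hypothetical helper (not part of Ray): pick the action for ray.init(address=...).

    `recorded_address` stands in for the contents of CLUSTER_ADDRESS_FILE,
    or None / "" when the file is missing or empty.
    Returns an (action, target) pair.
    """
    if address == "local":
        # Always start a fresh local instance, even if a cluster is recorded.
        return ("start_local", None)
    if address is None or address == "auto":
        if recorded_address:
            # A cluster started through `ray start` was found in the file.
            return ("connect", recorded_address)
        if address == "auto":
            # Legacy behavior: try to autodetect any running cluster.
            return ("autodetect", None)
        # No address given and nothing recorded: start a new local instance.
        return ("start_local", None)
    # Assumption: any other value is treated as an explicit cluster address.
    return ("connect", address)
```

For example, `resolve_init_address(None, "10.0.0.1:6379")` yields a connection to the recorded address, while `resolve_init_address("local", "10.0.0.1:6379")` still starts a new local instance.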

This also surfaces the logs about which Ray instance we are connecting to. Previously these were hidden because we didn't set up the log until after connecting to Ray. So now Ray will log one of the following messages during ray.init:
```
(Connecting to existing Ray cluster at address: <IP>...)
...connection...
(Started a local Ray cluster.| Connected to Ray Cluster.)( View the dashboard at <URL>)
```

Note that the dashboard URL is now printed during `ray.init()` rather than when the dashboard is first started.

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2022-07-23 11:27:22 -07:00
Dmitri Gekhtman
fdd5c53bfd
[KubeRay] Documentation structure and skeleton (#26589)
Adds outline and structure for new KubeRay-based Ray-on-Kubernetes docs.
2022-07-19 13:28:04 -07:00
Jiajun Yao
60dd77a2d3
Enable usage stats collection for ray.init iff nightly wheels (#26461)
For nightly wheels, we want to collect usage stats for local clusters started via ray.init() as well.
2022-07-14 12:14:01 -07:00
Dmitri Gekhtman
3af3269b8e
[KubeRay][docs] Warning about kubectl apply, update feature state wording. (#25685)
This PR

Adds a warning about a known issue to the KubeRay section of the Ray docs.
Updates the description of the feature state of KubeRay integration.
Adds some links to the KubeRay docs.
2022-06-27 14:11:00 -07:00
clarng
ef866d1e49
exclude doc_code from import sorting (#25772)
Skip sorting the imports in doc_code.
2022-06-15 11:34:45 -07:00
Dmitri Gekhtman
e745cd0e7b
[Docs] Note that certain features are community maintained (#25687)
Adds notes explaining that Ray's support on Azure, Aliyun, and SLURM is community-maintained.
Rephrases the mention of K8s support in the intro.

This PR replaces https://github.com/ray-project/ray/pull/25504.
2022-06-13 16:10:32 -07:00
Dmitri Gekhtman
836b08597f
[kuberay][autoscaler] Use new autoscaling fields from the KubeRay operator (#25386)
This PR incorporates recent autoscaler config changes from KubeRay.
2022-06-08 20:09:43 -07:00
Archit Kulkarni
3296345557
Add warning about entrypoint command in quotes (#25519) 2022-06-08 09:38:55 -07:00
G Goswami
7ddc23a8f5
Fixing example (#25524)
Remove quotes from K8s job submission example in docs.
2022-06-06 18:21:19 -04:00
Zhe Zhang
2d74ecc2ec
[Docs] [Clusters] Fix issues in the overview part of Cluster Deployment Guide, and fix a typo (#25473)
* Fix issues in the overview part, and fix a typo

* Addressing comment

Co-authored-by: Alex Wu <alex@anyscale.com>
2022-06-06 14:11:41 -07:00
Zhe Zhang
4cc202585a
[Docs] Document Ray downscaling behavior (#25466) 2022-06-03 17:08:21 -07:00
Eric Liang
c1b2ad112e
Comment out banner (#25369) 2022-06-01 16:36:33 -07:00
Zhe Zhang
52774e8460
Use bold font consistently on landing page (#25318) 2022-06-01 11:44:46 -04:00
javi-redondo
a8fc0c5015
Add landing & key concepts pages for clusters (#24379)
Add landing & key concepts pages for clusters
2022-05-25 10:23:50 -07:00
Simon Mo
c3ac6fcf3f
Bump Ray Version from 2.0.0.dev0 to 3.0.0.dev0 (#24894) 2022-05-17 19:31:05 -07:00
Jiajun Yao
1daad65568
[Doc] Add doc for usage stats collection (#24522) 2022-05-10 17:18:49 -07:00
Dmitri Gekhtman
d68c1ecaf9
[kuberay] Test Ray client and update autoscaler image (#24195)
This PR adds KubeRay e2e testing for Ray client and updates the suggested autoscaler image to one running the merge commit of PR #23883.
2022-04-27 18:02:12 -07:00
Chen Shen
1d981e0cf1
[doc] fix /cluster/config.html #23720
closes #23560
2022-04-22 10:13:12 -07:00
Dmitri Gekhtman
8c5fe44542
[KubeRay] Fix autoscaling with GPUs and custom resources, with e2e tests (#23883)
- Closes #23874 by fixing a typo ("num_gpus" -> "num-gpus").
- Adds end-to-end test logic confirming the fix.
- Adds end-to-end test logic confirming autoscaling with custom resources works.
- Slightly refines developer instructions.
- Deflakes test logic a bit by allowing for the event that the head pod changes its identity as the Ray cluster starts up.
2022-04-21 14:54:37 -07:00
Zyiqin-Miranda
e4a66c0e2e
[doc] Add CloudWatch integration documentation (#22638)
This PR adds documentation for Ray CloudWatch integration.
2022-04-21 09:44:41 -07:00
Philipp Moritz
886cc4d674
Fix broken links in documentation and put linkcheck linter in place on CI (#23340) 2022-03-18 21:02:52 -07:00
Archit Kulkarni
76bb5396c7
[Doc] [jobs] Add links to Job Submission and improve doc (#23209)
- Adds links to Job Submission from existing library tutorials where `ray submit` is used. When Jobs becomes GA, we should fully replace the uses of `ray submit` with Ray job submission and ensure this is tested.
- Adds docstrings for the Jobs SDK, which automatically show up in the API reference.
- Improves the Job Submission main page.
- Adds a "Deployment Guide" landing page explaining when to use Ray Client vs Ray Jobs.

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2022-03-18 12:52:13 -05:00