Enables better usage with GCP.
The default behavior is that the head runs with the ray-autoscaler-sa-v1 service Account, but workers do not. Workers can run with this service account by copying & uncommenting L114->L117 from example-full
Signed-off-by: Ian <ian.rodney@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
We currently measure end-to-end training time in our benchmarks, which includes setup overhead. This is an unequal comparison, as setup overhead for vanilla training cannot be accurately expressed and was instead just disregarded.
By comparing the raw training times in the actual training loop, we will get a more accurate expression of any potential overhead or benefit in using Ray vs. vanilla tensorflow/torch.
Signed-off-by: Kai Fricke <kai@anyscale.com>
This PR restores notes for migration from the legacy Ray operator to the new KubeRay operator.
To avoid disrupting the flow of the Ray documentation, these notes are placed in a README accompanying the old operator's code.
These notes are linked from the new docs.
Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
Went through https://docs.ray.io/en/master/data/examples/nyc_taxi_basic_processing.html, and doing some minor fix here.
Fix the size_bytes() result (before this PR it was using Parquet sampling, but we disasble it later)
Change one size_bytes() call to count() call as it was meant to use count() with followed wording That’s a lot of rows in doc.
Changed places are as followed in screenshots:
# Why are these changes needed?
- Promote APIs to PublicAPI(alpha)
- Change pre-alpha -> alpha
- Fix a bug ray_logs is displayed to ray --help
Release test result: #26610
Some APIs are subject to change at the beta stage (e.g., ray list jobs or ray logs).
Adds a page describing a development workflow for Serve applications.
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
The "Monitoring Ray Serve" page explains how to inspect your Ray Serve applications. This change updates the page to remove outdated metrics that Serve no longer exposes and to upgrade code samples to use 2.0 APIs. It also improves the content's readability and organization.
Link to updated "Monitoring Ray Serve" page: https://ray--27777.org.readthedocs.build/en/27777/serve/monitoring.html
Refactor Datasets API docs for easier navigation: [Ray Datasets API](https://ray--27592.org.readthedocs.build/en/27592/data/api/api.html)
### Changes
1. Create a new Datasets API base page.
2. Split existing APIs into separate pages.
3. Split `Dataset` and `DatasetPipeline` methods into separate sections.
1. Used `autosummary` to generate overview tables at the top of each of these pages. Open to other suggestions e.g. moving the summary to the top of each section instead.
2. **Note:** Every time we add a new method we need to explicitly add it here as well.
4. Add Input/Output APIs.
1. I chose to split these primarily by data format rather than type, since it's easier to navigate, and the existing [Creating Datasets](https://docs.ray.io/en/master/data/creating-datasets.html) User Guide already does the latter.
6. Add `Block` and `DataBatch` (should we add these aliases?)
7. Remove existing `package-ref`.
Page structure changes:
Deploying a Ray Cluster on Kubernetes
Getting Started -> links to jobs
Deploying a Ray Cluster on VMs
Getting started -> links to jobs
User Guides
Autoscaling (moved more content here in favor of the Getting started page)
Running Applications on Ray Clusters
Ray Jobs
Quickstart Using the Ray Jobs CLI
Python SDK
REST API
Ray Job Submission API Reference
Ray Client
Content changes:
modified "Deploying a Ray Cluster ..." quickstart pages to briefly summarize ad-hoc command execution, then link to jobs
modified Ray Jobs example to be more incremental - start with a simple example, then show long-running script, then show example with a runtime env, instead of all of them at once
center Ray Jobs quickstart around using the CLI. Made some minor changes to the Python SDK page to match it
remove "Ray Jobs Architecture"
moved "Autoscaling" content away from Kubernetes "Getting started" page into its own user guide. I think it's too complicated for "Getting Started". No content cuts.
Cut "Viewing the dashboard" and "Ray Client" from Kubernetes "Getting started" page.
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
A new feature was recently added, where Serve replicas are not restarted if only `num_replicas`, `autoscaling_config`, and/or `user_config` is updated in the config file that's redeployed. Updating docs to talk about this feature.
Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
Signed-off-by: Stephanie Wang swang@cs.berkeley.edu
Various cleanups around docs on Ray cluster "Monitoring and observability". After #27723, we will move these to a common page outside of VMs/k8s subsections:
Add links to the more comprehensive observability section.
Move and clean up cluster-specific content from Prometheus metrics to the new Ray Cluster page. I also modified a bunch of text here because previously we were not very clear about what the recommended approach was.
Include more specific instructions about setting up observability tools for VMs vs k8s.
This adds the structure described here, namely adding a new section under Ray Clusters which is focused on running applications on Ray clusters.
Signed-off-by: Cade Daniel <cade@anyscale.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>