Adds a page describing a development workflow for Serve applications.
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
The "Monitoring Ray Serve" page explains how to inspect your Ray Serve applications. This change updates the page to remove outdated metrics that Serve no longer exposes and to upgrade code samples to the Ray 2.0 APIs. It also improves the content's readability and organization.
Link to updated "Monitoring Ray Serve" page: https://ray--27777.org.readthedocs.build/en/27777/serve/monitoring.html
Refactor Datasets API docs for easier navigation: [Ray Datasets API](https://ray--27592.org.readthedocs.build/en/27592/data/api/api.html)
### Changes
1. Create a new Datasets API base page.
2. Split existing APIs into separate pages.
3. Split `Dataset` and `DatasetPipeline` methods into separate sections.
   1. Used `autosummary` to generate overview tables at the top of each of these pages. Open to other suggestions, e.g., moving the summary to the top of each section instead.
   2. **Note:** Every time we add a new method, we need to explicitly add it here as well.
4. Add Input/Output APIs.
   1. I chose to split these primarily by data format rather than by type, since that's easier to navigate and the existing [Creating Datasets](https://docs.ray.io/en/master/data/creating-datasets.html) User Guide already does the latter.
5. Add `Block` and `DataBatch` (should we add these aliases?).
6. Remove the existing `package-ref`.
Page structure changes:
- Deploying a Ray Cluster on Kubernetes
  - Getting Started -> links to jobs
- Deploying a Ray Cluster on VMs
  - Getting started -> links to jobs
  - User Guides
    - Autoscaling (moved more content here in favor of the Getting started page)
- Running Applications on Ray Clusters
  - Ray Jobs
    - Quickstart Using the Ray Jobs CLI
    - Python SDK
    - REST API
    - Ray Job Submission API Reference
  - Ray Client
Content changes:
- Modified the "Deploying a Ray Cluster ..." quickstart pages to briefly summarize ad-hoc command execution, then link to Ray Jobs.
- Made the Ray Jobs example more incremental: start with a simple example, then show a long-running script, then show an example with a runtime env, instead of presenting all of them at once.
- Centered the Ray Jobs quickstart around the CLI, and made some minor changes to the Python SDK page to match it.
- Removed "Ray Jobs Architecture".
- Moved the "Autoscaling" content out of the Kubernetes "Getting started" page into its own user guide. I think it's too complicated for "Getting Started". No content was cut.
- Cut "Viewing the dashboard" and "Ray Client" from the Kubernetes "Getting started" page.
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
A recently added feature lets Serve skip restarting replicas when the only fields changed in a redeployed config file are `num_replicas`, `autoscaling_config`, and/or `user_config`. This change updates the docs to describe that feature.
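The restart rule described above can be illustrated with a small Python sketch. It assumes each deployment's config entry is available as a plain dict and uses a hypothetical helper, `requires_replica_restart`; this is not Serve's internal implementation, only a model of the rule: if anything outside the three in-place fields differs, replicas would be restarted.

```python
from typing import Any, Dict

# Fields that, per the description above, can change without restarting replicas.
IN_PLACE_FIELDS = {"num_replicas", "autoscaling_config", "user_config"}


def requires_replica_restart(old: Dict[str, Any], new: Dict[str, Any]) -> bool:
    """Return True if the redeployed entry changes any field outside IN_PLACE_FIELDS.

    `old` and `new` are hypothetical plain-dict views of one deployment's entry
    in the config file; this only models the documented rule.
    """
    # Every key whose value differs, including keys that were added or removed.
    changed = {key for key in old.keys() | new.keys() if old.get(key) != new.get(key)}
    # A restart is needed only if something other than the in-place fields changed.
    return bool(changed - IN_PLACE_FIELDS)


old = {
    "name": "Model",
    "num_replicas": 1,
    "user_config": {"threshold": 0.5},
    "ray_actor_options": {"num_cpus": 1},
}
# Scaling up and tweaking user_config: replicas keep running.
scaled = {**old, "num_replicas": 3, "user_config": {"threshold": 0.8}}
print(requires_replica_restart(old, scaled))  # False
# Changing actor resources falls outside the in-place fields, so this would restart.
resized = {**old, "ray_actor_options": {"num_cpus": 2}}
print(requires_replica_restart(old, resized))  # True
```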
Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
Various cleanups to the Ray cluster "Monitoring and observability" docs. After #27723, we will move these to a common page outside the VMs/k8s subsections:
- Add links to the more comprehensive observability section.
- Move cluster-specific content from the Prometheus metrics page to the new Ray Cluster page and clean it up. I also rewrote much of this text because it previously wasn't clear about the recommended approach.
- Include more specific instructions for setting up observability tools on VMs vs. k8s.
This adds the structure described here: a new section under Ray Clusters focused on running applications on Ray clusters.
Signed-off-by: Cade Daniel <cade@anyscale.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
This PR:
- Copies the existing clusters API reference to the new structure. The reference docs are split into Ray Clusters (common to VMs and k8s) and Ray Clusters on VMs (specific to VMs). There is also a reference section for k8s, but it is not part of this PR.
- Merges the three job submission user guides back into a single guide. Jules had suggested breaking them out into REST/SDK/CLI guides, but that's not P0 right now.
- Fixes some bugs in the left navigation bar, so there should be less duplication of TOC entries. I'll keep working on related fixes in a separate PR.
Signed-off-by: Cade Daniel <cade@anyscale.com>
This PR is an edit pass on the Performance Tuning page after reading it with fresh eyes. None of the content was out of date, so the changes are mostly nits and rewordings of parts that were slightly confusing.
This PR:
- Adds notes and an example on logging for Ray on K8s.
- Implements an API Reference page pointing to the configuration guide and the RayCluster CR definition.
- Takes managed K8s services out of the tabbed structure to make that page look less sad.
- Adds a comparison of the KubeRay operator and the legacy K8s operator.
- Adds an architecture diagram for the autoscaling sections.
- Fixes some other minor items.
- Adds some info about networking to the configuration guide and removes the previously planned networking page.
Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
The tensor extension import is somewhat expensive because it goes through Arrow's and pandas' extension type registration logic. This PR delays the tensor extension type import until Parquet reading, which is the only case where we need to register the type explicitly.
I have confirmed that the Parquet reading in doc/source/data/doc_code/tensor.py passes with this change.
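The deferral described above follows a common lazy-import pattern: move the expensive step out of module import time and into the one code path that needs it. The Python sketch below is a hypothetical illustration, not Ray's actual code. `_ensure_tensor_extension_registered`, `read_csv`, and `read_parquet` are made-up names, and a counter stands in for the real Arrow/pandas registration so the deferral is observable.

```python
import functools

# Counts how often the (simulated) expensive registration actually runs.
registration_calls = 0


@functools.lru_cache(maxsize=None)
def _ensure_tensor_extension_registered() -> None:
    """Hypothetical stand-in for importing/registering the tensor extension type.

    In a real codebase this is where the costly extension-type import would
    happen; here it only bumps a counter. lru_cache makes it run at most once.
    """
    global registration_calls
    registration_calls += 1


def read_csv(path: str) -> str:
    # Non-Parquet readers never pay the registration cost.
    return f"csv:{path}"


def read_parquet(path: str) -> str:
    # Parquet is the only path that needs the type registered explicitly,
    # so the expensive step is deferred until here.
    _ensure_tensor_extension_registered()
    return f"parquet:{path}"


read_csv("a.csv")
print(registration_calls)  # 0
read_parquet("a.parquet")
read_parquet("b.parquet")
print(registration_calls)  # 1
```

In practice, a function-local `import` gives the same run-once behavior, since Python caches imported modules in `sys.modules`; the explicit cache here just makes the single call easy to see.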