hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Archit Kulkarni	084f06f49a	[Doc] [Job submission] [Dashboard] Add tip for long runtime_env installation and improve error (#26911 ) # Why are these changes needed? The dashboard can display the message <actor> cannot be created because the Ray cluster cannot satisfy its resource requirements in the case where the runtime env setup is stalled. This PR updates this message to include the possibility of the runtime env setup failing. This PR adds a tip to the Job Submission doc saying that if a job is stalled in PENDING, the runtime env setup may have stalled. It adds a pointer to the log files which should have more information. The runtime env cannot stall forever, it fails after 10 minutes. This is a new feature added after the Ray 1.13 branch cut. In Ray <= 1.13, the runtime env can still stall forever. # Related issue number Closes #26332	2022-07-25 23:32:27 -07:00
Stephanie Wang	55a0f7bb2d	[core] ray.init defaults to an existing Ray instance if there is one (#26678 ) ray.init() will currently start a new Ray instance even if one is already existing, which is very confusing if you are a new user trying to go from local development to a cluster. This PR changes it so that, when no address is specified, we first try to find an existing Ray cluster that was created through `ray start`. If none is found, we will start a new one. This makes two changes to the ray.init() resolution order: 1. When `ray start` is called, the started cluster address was already written to a file called `/tmp/ray/ray_current_cluster`. For ray.init() and ray.init(address="auto"), we will first check this local file for an existing cluster address. The file is deleted on `ray stop`. If the file is empty, autodetect any running cluster (legacy behavior) if address="auto", or we will start a new local Ray instance if address=None. 2. When ray.init(address="local") is called, we will create a new local Ray instance, even if one is already existing. This behavior seems to be necessary mainly for `ray.client` use cases. This also surfaces the logs about which Ray instance we are connecting to. Previously these were hidden because we didn't set up the log until after connecting to Ray. So now Ray will log one of the following messages during ray.init: ``` (Connecting to existing Ray cluster at address: <IP>...) ...connection... (Started a local Ray cluster.\| Connected to Ray Cluster.)( View the dashboard at <URL>) ``` Note that this changes the dashboard URL to be printed with `ray.init()` instead of when the dashboard is first started. Co-authored-by: Eric Liang <ekhliang@gmail.com>	2022-07-23 11:27:22 -07:00
Dmitri Gekhtman	fdd5c53bfd	[KubeRay] Documentation structure and skeleton (#26589 ) Adds outline and structure for new KubeRay-based Ray-on-Kubernetes docs.	2022-07-19 13:28:04 -07:00
Jiajun Yao	60dd77a2d3	Enable usage stats collection for ray.init iff nightly wheels (#26461 ) For nightly wheels, we want to collect usage stats for local clusters started via ray.init() as well.	2022-07-14 12:14:01 -07:00
Dmitri Gekhtman	3af3269b8e	[KubeRay][docs] Warning about kubectl apply, update feature state wording. (#25685 ) This PR Adds a warning about a known issue to the KubeRay section of the Ray docs. Updates the description of the feature state of KubeRay integration. Adds some links to the KubeRay docs.	2022-06-27 14:11:00 -07:00
clarng	ef866d1e49	exclude doc_code from import sorting (#25772 ) Skip sorting the imports in doc_code.	2022-06-15 11:34:45 -07:00
Dmitri Gekhtman	e745cd0e7b	[Docs] Note that certain features are community maintained (#25687 ) Adds notes explaining that Ray's support on Azure, Aliyun, and SLURM is community-maintained. Rephrases the mention of K8s support in the intro. This PR replaces https://github.com/ray-project/ray/pull/25504.	2022-06-13 16:10:32 -07:00
Dmitri Gekhtman	836b08597f	[kuberay][autoscaler] Use new autoscaling fields from the KubeRay operator (#25386 ) This PR incorporates recent autoscaler config changes from KubeRay.	2022-06-08 20:09:43 -07:00
Archit Kulkarni	3296345557	Add warning about entrypoint command in quotes (#25519 )	2022-06-08 09:38:55 -07:00
G Goswami	7ddc23a8f5	Fixing example (#25524 ) Remove quotes from K8s job submission example in docs.	2022-06-06 18:21:19 -04:00
Zhe Zhang	2d74ecc2ec	[Docs] [Clusters] Fix issues in the overview part of Cluster Deployment Guide, and fix a typo (#25473 ) * Fix issues in the overview part, and fix a typo * Addressing comment Co-authored-by: Alex Wu <alex@anyscale.com>	2022-06-06 14:11:41 -07:00
Zhe Zhang	4cc202585a	[Docs] Document Ray downscaling behavior (#25466 )	2022-06-03 17:08:21 -07:00
Eric Liang	c1b2ad112e	Comment our banner (#25369 )	2022-06-01 16:36:33 -07:00
Zhe Zhang	52774e8460	Use bold font consistently on landing page (#25318 )	2022-06-01 11:44:46 -04:00
javi-redondo	a8fc0c5015	Add landing & key concepts pages for clusters (#24379 ) Add landing & key concepts pages for clusters	2022-05-25 10:23:50 -07:00
Simon Mo	c3ac6fcf3f	Bump Ray Version from 2.0.0.dev0 to 3.0.0.dev0 (#24894 )	2022-05-17 19:31:05 -07:00
Jiajun Yao	1daad65568	[Doc] Add doc for usage stats collection (#24522 )	2022-05-10 17:18:49 -07:00
Dmitri Gekhtman	d68c1ecaf9	[kuberay] Test Ray client and update autoscaler image (#24195 ) This PR adds KubeRay e2e testing for Ray client and updates the suggested autoscaler image to one running the merge commit of PR #23883 .	2022-04-27 18:02:12 -07:00
Chen Shen	1d981e0cf1	[doc] fix /cluster/config.html #23720 closes #23560	2022-04-22 10:13:12 -07:00
Dmitri Gekhtman	8c5fe44542	[KubeRay] Fix autoscaling with GPUs and custom resources, with e2e tests (#23883 ) - Closes #23874 by fixing a typo ("num_gpus" -> "num-gpus"). - Adds end-to-end test logic confirming the fix. - Adds end-to-end test logic confirming autoscaling with custom resources works. - Slightly refines developer instructions. - Deflakes test logic a bit by allowing for the event that the head pod changes its identity as the Ray cluster starts up.	2022-04-21 14:54:37 -07:00
Zyiqin-Miranda	e4a66c0e2e	[doc] Add CloudWatch integration documentation (#22638 ) This PR adds documentation for Ray CloudWatch integration.	2022-04-21 09:44:41 -07:00
Philipp Moritz	886cc4d674	Fix broken links in documentation and put linkcheck linter in place on CI (#23340 )	2022-03-18 21:02:52 -07:00
Archit Kulkarni	76bb5396c7	[Doc] [jobs] Add links to Job Submission and improve doc (#23209 ) - Adds links to Job Submission from existing library tutorials where `ray submit` is used. When Jobs becomes GA, we should fully replace the uses of `ray submit` with Ray job submission and ensure this is tested. - Adds docstrings for the Jobs SDK, which automatically show up in the API reference - Improve the Job Submission main page - Add a "Deployment Guide" landing page explaining when to use Ray Client vs Ray Jobs Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>	2022-03-18 12:52:13 -05:00
Jiaxin Shan	158ff3394f	[Job submission] Improve job submission docs (#23115 ) I am following job submission docs here https://docs.ray.io/en/latest/cluster/job-submission.html and run some examples. I notice there're few minor issues. 1. some required libraries are not imported in any code snippets 2. Get job api returns `{'status': 'SUCCEEDED'}` instead of `job_status` so code snippet here doesn't work https://docs.ray.io/en/latest/cluster/job-submission.html#rest-api	2022-03-15 21:20:33 -05:00
Max Pumperla	ad30123339	[docs] fix includes for md files (#23180 ) the include of content for md files like our central getting started page didn't render. fixed here. Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>	2022-03-15 11:09:18 +00:00
Pamphile Roy	81b17669a4	[core][docs] Document port/IP binding and slurm concerns (#22663 ) Using Ray on SLURM systems is documented, but the docs omit some networking pitfalls. This PR adds some information about port binding and address binding (I will open a feature request with more and link it here later). I did not put any real recommendation on this last point since `--address` did not work. I hit a "cannot resolve" issue after setting an internal IP, although it is reachable.	2022-03-15 01:43:46 -07:00
Scott Graham	f673acb0ad	Scgraham/azure docs (#22296 ) Fixes potential error if function not found in azure sdk when deploying ray cluster on azure Adds additional python package needed to deploy ray cluster on azure in docs Co-authored-by: Scott Graham <scgraham@microsoft.com>	2022-03-13 18:08:08 -07:00
Archit Kulkarni	52a722ffe7	[jobs] Make local pip/conda requirements files work with jobs (#22849 )	2022-03-10 15:15:16 -06:00
Max Pumperla	11c40e363d	[docs] external promo content (#22823 )	2022-03-10 11:39:44 -08:00
Dmitri Gekhtman	413fe08f87	Move KubeRay autoscaler files into Ray autoscaler directory, add an entry-point. (#22847 ) This PR consists of the following clean-up items for KubeRay autoscaler integration: Remove the docker/kuberay directory Move the Python files formerly in docker/kuberay to the autoscaler directory. Use a rayproject/ray image for the autoscaler. Add an entry point for the kuberay autoscaler to scripts.py. Use the entry point in the example config. Slightly simplify the code that starts the autoscaler. Ray versions are updated to Ray 1.11.0, which will be officially released within the next couple of days. By default, Ray >= 1.11.0 runs without Redis. References to Redis are removed from the example config. Add the autoscaler configuration test to the CI. Update development documentation to reflect the changes in this PR.	2022-03-09 18:26:57 -08:00
Max Pumperla	b609bdf898	[docs] Improve connection between library references and their APIs (#22800 ) Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>	2022-03-04 16:48:03 +01:00
Archit Kulkarni	1752f17c6d	[Job submission] Add `list_jobs` API (#22679 ) Adds an API to the REST server, the SDK, and the CLI for listing all jobs that have been submitted, along with their information. Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>	2022-03-01 21:27:09 -06:00
Jiaxin Shan	32829ff9ad	[KubeRay] Provide a new Dockerfile for fast build (#22689 ) Adds a new Dockerfile for fast build and development of KubeRay.	2022-02-28 17:09:16 -08:00
Archit Kulkarni	85657b1377	[Doc] [Jobs] add CLI and SDK reference to docs (#22680 )	2022-02-28 17:57:46 -06:00
Archit Kulkarni	87f7bfe4cd	[doc] [job submission] Add k8s instructions and a comment about ports (#22598 )	2022-02-23 16:32:37 -06:00
mwtian	9a157dfe82	[GCS-Ray] update doc and error message for GCS-Ray (#22528 ) Update documentation to reflect that Ray no longer starts Redis by default.	2022-02-22 17:56:30 -08:00
Dmitri Gekhtman	a402e956a4	[KubeRay] Format autoscaling config based on RayCluster CR (#22348 ) Closes #21655. At the start of each autoscaler iteration, we read the Ray Cluster CR from K8s and use it to extract the autoscaling config.	2022-02-22 11:06:37 -08:00
Archit Kulkarni	df581c584a	[Job] [Dashboard] Add Job Submission data to cluster snapshot (#22225 ) The existing Job info in the cluster snapshot uses the old definition of Job, which is a single Ray driver (a single `ray.init()` connection). In the new Job Submission protocol, a Job just specifies an entrypoint which can be any shell command. As such a Job can have zero or multiple Ray drivers. This means we should add a new snapshot entry corresponding to new jobs. We'll leave the old snapshot in place for legacy jobs. - Also fixes `get_all_jobs` by using the appropriate KV namespace, and stripping the job key KV prefix from the job ID. It wasn't working before. - This PR also unifies the datatype used by the GET jobs/ endpoint to be the same as the one used by the new jobs cluster snapshot. For backwards compatibility, the `status` and `message` fields are preserved.	2022-02-18 09:54:37 -06:00
Alex Wu	276ff2b7ed	[docs][autoscaler] Add maintainers for node providers (#22237 ) This PR adds documentation for the maintainers of the various node providers. Co-authored-by: Alex Wu <alex@anyscale.com>	2022-02-12 11:31:32 -08:00
Archit Kulkarni	a65f35b867	[Doc] [Jobs] Add `ray dashboard` docs to jobs doc (#22222 ) To use Jobs on a remote cluster, you need to set up port forwarding. When using the cluster launcher, the `ray dashboard` command provides this automatically. This PR adds a how-to to the docs for this feature. Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>	2022-02-11 11:01:37 -06:00
Archit Kulkarni	54b2e143e4	[Doc] [Jobs] Add size limit and recommendations for working_dir (#22219 ) Previously it wasn't obvious which working_dir option was recommended, and the size limit for local working_dir didn't appear on the Jobs page. (The user would have had to go to the runtime_env API reference to see the size limit.). This PR makes this information more prominent.	2022-02-09 13:56:02 -06:00
Archit Kulkarni	50e2bef9d0	[Jobs] Hide `dashboard` from Job Submission import path (#22223 ) For public SDK APIs, change the import path from ```python from ray.dashboard.modules.job.common import JobStatus, JobStatusInfo from ray.dashboard.modules.job.sdk import JobSubmissionClient ``` to ```python from ray.job_submission import JobStatus, JobSubmissionClient ``` `JobStatus`, `JobStatusInfo` and `JobSubmissionClient` were the only names referenced in the SDK doc so far, but we can add more later as they appear.	2022-02-09 13:55:32 -06:00
Alex Wu	c9a419ac76	[Autoscaler] Remove staroid node provider (#22236 ) The Staroid node provider has been abandoned and unmaintained for quite some time now. Due to the fact that there are no active maintainers, the original contributors cannot be reached, and there is no clear interest, we are no longer officially endorsing or supporting the node provider. Co-authored-by: Alex Wu <alex@anyscale.com>	2022-02-09 09:18:18 -08:00
Balaji Veeramani	7f1bacc7dc	[CI] Format Python code with Black (#21975 ) See #21316 and #21311 for the motivation behind these changes.	2022-01-29 18:41:57 -08:00
Max Pumperla	b34099e764	[docs] landing page (fixes #21750 ) (#21859 ) Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-01-26 17:14:25 -08:00
isaac-vidas	236fe58259	[Doc] Update requests calls to ray job submission api (#21802 )	2022-01-24 17:44:31 -08:00
Max Pumperla	7953c9ca57	[docs] integrate algolia docsearch, move to sphinx panels (#21814 )	2022-01-24 17:00:41 -08:00
Max Pumperla	f9b71a8bf6	[docs] new structure (#21776 ) This PR consolidates both #21667 and #21759 (look there for features), but improves on them in the following way: - [x] we reverted renaming of existing projects `tune`, `rllib`, `train`, `cluster`, `serve`, `raysgd` and `data` so that links won't break. I think my consolidation efforts with the `ray-` prefix were a little overeager in that regard. It's better like this. Only the creation of `ray-core` was a necessity, and some files moved into the `rllib` folder, so that should be relatively benign. - [x] Additionally, we added Algolia `docsearch`, screenshot below. This is _much_ better than our current search. Caveat: there's a sphinx dependency that needs to be replaced (`sphinx-tabs`) by another, newer one (`sphinx-panels`), as the former prevents loading of the `algolia.js` library. Will follow-up in the next PR (hoping this one doesn't get re-re-re-re-reverted).	2022-01-21 15:42:05 -08:00
xwjiang2010	9af8f11191	Revert "[docs] Clean up doc structure (first part) (#21667 )" (#21763 ) This reverts commit `38e46c9fb3`.	2022-01-20 15:30:56 -08:00
Max Pumperla	38e46c9fb3	[docs] Clean up doc structure (first part) (#21667 )	2022-01-20 16:19:04 +01:00
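
The `ray.init()` address resolution order described in commit 55a0f7bb2d (#26678) can be sketched as a small pure function. This is only an illustrative model of the commit message, not Ray's implementation: `resolve_init_address`, `START_NEW_LOCAL`, `recorded_address` (standing in for the contents of `/tmp/ray/ray_current_cluster`) and the `autodetect` callback are all hypothetical names, and the example addresses are made up.

```python
from typing import Callable, Optional

# Hypothetical sentinel meaning "start a new local Ray instance".
START_NEW_LOCAL = "<start new local instance>"


def resolve_init_address(
    address: Optional[str],
    recorded_address: Optional[str],
    autodetect: Callable[[], Optional[str]] = lambda: None,
) -> Optional[str]:
    """Model the resolution order from the commit message.

    address:          the value passed to ray.init(address=...).
    recorded_address: contents of the cluster-address file written by
                      `ray start` (None or "" if missing or empty).
    autodetect:       stand-in for the legacy "find any running cluster" step.
    """
    # address="local" always starts a new instance, even if one exists.
    if address == "local":
        return START_NEW_LOCAL

    if address is None or address == "auto":
        # Both cases check the recorded cluster address first.
        if recorded_address:
            return recorded_address
        if address == "auto":
            # Legacy autodetection. The commit message does not say what
            # happens when nothing is found, so the result is passed through.
            return autodetect()
        # address=None and no recorded cluster: start a new local instance.
        return START_NEW_LOCAL

    # Any other value is treated here as an explicit address (assumption).
    return address


if __name__ == "__main__":
    print(resolve_init_address(None, "10.0.0.1:6379"))
    print(resolve_init_address(None, None))
    print(resolve_init_address("local", "10.0.0.1:6379"))
```

Modelling the file contents and autodetection as plain arguments keeps the sketch free of any dependency on Ray or the filesystem, so the ordering of the checks is the only thing it demonstrates.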

157 commits