# Why are these changes needed?
The dashboard can display the message <actor> cannot be created because the Ray cluster cannot satisfy its resource requirements in the case where the runtime env setup is stalled. This PR updates this message to include the possibility of the runtime env setup failing.
This PR adds a tip to the Job Submission doc saying that if a job is stalled in PENDING, the runtime env setup may have stalled. It adds a pointer to the log files which should have more information.
The runtime env cannot stall forever, it fails after 10 minutes. This is a new feature added after the Ray 1.13 branch cut. In Ray <= 1.13, the runtime env can still stall forever.
# Related issue number
Closes#26332
- Adds links to Job Submission from existing library tutorials where `ray submit` is used. When Jobs becomes GA, we should fully replace the uses of `ray submit` with Ray job submission and ensure this is tested.
- Adds docstrings for the Jobs SDK, which automatically show up in the API reference
- Improve the Job Submission main page
- Add a "Deployment Guide" landing page explaining when to use Ray Client vs Ray Jobs
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Adds an API to the REST server, the SDK, and the CLI for listing all jobs that have been submitted, along with their information.
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
The existing Job info in the cluster snapshot uses the old definition of Job, which is a single Ray driver (a single `ray.init()` connection).
In the new Job Submission protocol, a Job just specifies an entrypoint which can be any shell command. As such a Job can have zero or multiple Ray drivers. This means we should add a new snapshot entry corresponding to new jobs. We'll leave the old snapshot in place for legacy jobs.
- Also fixes `get_all_jobs` by using the appropriate KV namespace, and stripping the job key KV prefix from the job ID. It wasn't working before.
- This PR also unifies the datatype used by the GET jobs/ endpoint to be the same as the one used by the new jobs cluster snapshot. For backwards compatibility, the `status` and `message` fields are preserved.
To use Jobs on a remote cluster, you need to set up port forwarding. When using the cluster launcher, the `ray dashboard` command provides this automatically. This PR adds a how-to to the docs for this feature.
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Previously it wasn't obvious which working_dir option was recommended, and the size limit for local working_dir didn't appear on the Jobs page. (The user would have had to go to the runtime_env API reference to see the size limit.). This PR makes this information more prominent.
For public SDK APIs, change the import path from
```python
from ray.dashboard.modules.job.common import JobStatus, JobStatusInfo
from ray.dashboard.modules.job.sdk import JobSubmissionClient
```
to
```python
from ray.job_submission import JobStatus, JobSubmissionClient
```
`JobStatus`, `JobStatusInfo` and `JobSubmissionClient` were the only names referenced in the SDK doc so far, but we can add more later as they appear.
This PR consolidates both #21667 and #21759 (look there for features), but improves on them in the following way:
- [x] we reverted renaming of existing projects `tune`, `rllib`, `train`, `cluster`, `serve`, `raysgd` and `data` so that links won't break. I think my consolidation efforts with the `ray-` prefix were a little overeager in that regard. It's better like this. Only the creation of `ray-core` was a necessity, and some files moved into the `rllib` folder, so that should be relatively benign.
- [x] Additionally, we added Algolia `docsearch`, screenshot below. This is _much_ better than our current search. Caveat: there's a sphinx dependency that needs to be replaced (`sphinx-tabs`) by another, newer one (`sphinx-panels`), as the former prevents loading of the `algolia.js` library. Will follow-up in the next PR (hoping this one doesn't get re-re-re-re-reverted).
2022-01-21 15:42:05 -08:00
Renamed from doc/source/ray-job-submission/overview.rst (Browse further)