Commit graph

44 commits

Author SHA1 Message Date
Alan Guo
5d6bc5360d
Fix the jobs tab in the beta dashboard and fill it with data from both "submission" jobs and "driver" jobs (#25902)
## Why are these changes needed?
- Fixes the jobs tab in the new dashboard. Previously it didn't load.
- Combines the old job concept ("driver jobs") and the new job submission concept into a single concept called "jobs". The jobs tab shows information about both kinds of jobs.

- Updates all job APIs: They now return both submission jobs and driver jobs. They also contain additional data in the response, including "id", "job_id", "submission_id", and "driver". They also accept either job_id or submission_id as input.

- Job ID is the same as the "ray core job id" concept. It is in the form of "0100000" and is the primary id to represent jobs.
- Submission ID is an ID that is generated for each ray job submission. It is in the form of "raysubmit_12345...". It is a secondary id that can be used if a client needs to provide a self-generated id, or if the job id doesn't exist (e.g., if the submission job doesn't create a ray driver).
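
As an illustration of the two ID forms described above, here is a minimal sketch that tells them apart by prefix. The helper name `classify_id` is hypothetical and is not part of the Ray API; the only detail taken from this message is the `raysubmit_` prefix.

```python
SUBMISSION_ID_PREFIX = "raysubmit_"  # prefix described in this commit message


def classify_id(identifier: str) -> str:
    """Return "submission_id" or "job_id" for the given identifier.

    Hypothetical helper for illustration only: anything that does not
    carry the submission prefix is treated as a ray core job id.
    """
    if identifier.startswith(SUBMISSION_ID_PREFIX):
        return "submission_id"
    return "job_id"
```

A client could use a check like this to decide which field of a job record to compare against.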

This PR has 2 deprecations:
- The `submit_job` sdk now accepts a new kwarg `submission_id`. `job_id` is deprecated.
- The `ray job submit` CLI now accepts `--submission-id`. `--job-id` is deprecated.

**This PR has 4 backwards incompatible changes:**
- list_jobs sdk now returns a list instead of a dictionary
- the `ray job list` CLI now prints a list instead of a dictionary
- The `/api/jobs` endpoint returns a list instead of a dictionary
- The `POST /api/jobs` endpoint (submit job) now returns a JSON object with a `submission_id` field instead of `job_id`.
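
Callers that must work across the dictionary-to-list change could normalize the response first. The sketch below is an assumption-laden illustration: `normalize_jobs` is a hypothetical helper, and it assumes the old dictionary was keyed by the ID that is now called `submission_id`.

```python
from typing import Any, Dict, List, Union

JobRecord = Dict[str, Any]


def normalize_jobs(
    response: Union[List[JobRecord], Dict[str, JobRecord]]
) -> List[JobRecord]:
    """Return a list of job records for either response shape."""
    if isinstance(response, list):
        # New shape: already a list of records.
        return response
    # Old shape (assumed): {id: record}. Copy the key into each record
    # so the identifier is not lost when the dict is flattened.
    return [{**record, "submission_id": key} for key, record in response.items()]
```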
2022-07-27 02:39:52 -07:00
brucez-anyscale
f76d7b23f2
Revert "Revert "[Dashboard][Serve] Move Serve related endpoints to dashboard agent"" (#26336) 2022-07-06 19:37:30 -07:00
Yi Cheng
12d147ff1f
Revert "[Dashboard][Serve] Move Serve related endpoints to dashboard agent (#26107)" (#26333)
This reverts commit 84166ccb04.
2022-07-06 13:30:33 -07:00
brucez-anyscale
84166ccb04
[Dashboard][Serve] Move Serve related endpoints to dashboard agent (#26107)
In Ray 2.0, we want to achieve API server HA.
Originally, the Serve endpoints were on the head node.
This PR moves the Serve endpoints to the dashboard agents, so they become HA thanks to the multiple replicas of the dashboard agent.
2022-07-06 10:58:00 -07:00
Eric Liang
43aa2299e6
[api] Annotate as public / move ray-core APIs to _private and add enforcement rule (#25695)
Enable checking of the ray core module, excluding serve, workflows, and tune, in ./ci/lint/check_api_annotations.py. This required moving many files to ray._private and associated fixes.
2022-06-21 15:13:29 -07:00
Archit Kulkarni
6d2806f951
[Jobs] [Test] Add integration tests to cover runtime_env inheritance with working_dir and with Tune (#25562)
The current inheritance behavior for runtime_envs enables the following workflow for Jobs:  A working_dir can be set in the Jobs API, and then inside the driver script, if a new per-task runtime_env is defined, it will automatically inherit the driver's working_dir.

There is an ongoing discussion about the best approach for runtime_env inheritance going forward: https://github.com/ray-project/ray/issues/25484, in which we noted that there were no tests covering this behavior.

This PR adds integration tests for the above behavior. If we ultimately decide to abandon the current inheritance behavior and instead have child runtime envs completely overwrite the parent runtime env, this test will fail, reminding us to do the following:

- Update the internal runtime_env usage in Ray Tune to use the `ray.get_runtime_context().runtime_env.update` API
- Update the documentation for Ray Jobs telling users to use `ray.get_runtime_context().runtime_env.update` and update this test
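
The two behaviors being compared can be modeled with plain dictionaries. This is a simplified sketch, not Ray's actual merge logic; both function names and the sample values are hypothetical.

```python
from typing import Any, Dict


def inherit_runtime_env(parent: Dict[str, Any], child: Dict[str, Any]) -> Dict[str, Any]:
    """Current behavior (simplified): child fields win, missing fields come from the parent."""
    return {**parent, **child}


def overwrite_runtime_env(parent: Dict[str, Any], child: Dict[str, Any]) -> Dict[str, Any]:
    """Alternative under discussion: the child env replaces the parent env entirely."""
    return dict(child)


driver_env = {"working_dir": "/tmp/project"}  # set through the Jobs API
task_env = {"env_vars": {"FOO": "bar"}}       # per-task env defined in the driver script

inherited = inherit_runtime_env(driver_env, task_env)
overwritten = overwrite_runtime_env(driver_env, task_env)
```

Under inheritance the task keeps the driver's `working_dir`; under overwrite it does not, which is the difference the integration tests are meant to detect.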
2022-06-08 13:54:06 -07:00
Edward Oakes
cb7bcbd651
[job submission] Fix address defaulting behavior (#24970)
Per the discussion in https://github.com/ray-project/ray/issues/24858:

- If an address without a port is provided, don't append a port.
- Default to `http://localhost:8265` if nothing is provided.
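
A minimal sketch of the two rules above, assuming a hypothetical helper named `resolve_address`; it is not the function used in the Ray codebase.

```python
from typing import Optional

DEFAULT_ADDRESS = "http://localhost:8265"  # default stated in this commit message


def resolve_address(address: Optional[str] = None) -> str:
    """Apply the defaulting rules: fall back only when nothing is provided."""
    if not address:
        return DEFAULT_ADDRESS
    # An explicit address is returned unchanged, even if it has no port.
    return address
```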
2022-05-20 14:10:36 -05:00
Philipp Moritz
27917f570d
[runtime_env] Extend runtime_env hook to also cover jobs (#24328)
This extends https://github.com/ray-project/ray/pull/24036 to also cover job submission.

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2022-04-30 09:15:51 -07:00
shrekris-anyscale
75b7465ba4
[serve] Reject Ray client addresses when submitting via Dashboard (#23339)
Some commands in the Serve CLI use Ray client and some commands ping the Ray dashboard; however, all commands read `RAY_ADDRESS` to get the address. This change raises a nice exception if the user accidentally passes a Ray client address as the Ray Dashboard address.
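
The kind of check described here might look like the sketch below. It assumes that Ray client addresses are recognized by their `ray://` scheme; the function name and error text are hypothetical.

```python
RAY_CLIENT_SCHEME = "ray://"  # scheme used by Ray client addresses


def check_dashboard_address(address: str) -> str:
    """Return the address unchanged, or raise if it looks like a Ray client address."""
    if address.startswith(RAY_CLIENT_SCHEME):
        raise ValueError(
            f"Got Ray client address {address!r}, but a Ray dashboard "
            "address (for example http://localhost:8265) is required here."
        )
    return address
```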
2022-03-21 11:17:51 -05:00
Archit Kulkarni
77090144a2
[jobs] Add entrypoint field to JobInfo (#23253) 2022-03-16 22:02:22 -05:00
Archit Kulkarni
8707eb6288
[runtime env] Support .whl files in py_modules (#22368)
The `py_modules` field of runtime_env supports uploading local Python modules for use on the Ray cluster. One gap is the case where the local Python module is a wheel (a `.whl` file). This PR adds the missing support for uploading and installing the `.whl` file.
2022-03-16 16:37:10 -05:00
Jialing He
39a6c054d3
[runtime env][feature] introduce pip_check_enable and pip_version (#22826) 2022-03-14 23:41:19 +08:00
Archit Kulkarni
52a722ffe7
[jobs] Make local pip/conda requirements files work with jobs (#22849) 2022-03-10 15:15:16 -06:00
Archit Kulkarni
c78bd809ce
[job submission] Support local py_modules in jobs (#22843) 2022-03-10 11:42:25 -06:00
shrekris-anyscale
bc82e2d5c4
[serve] Restore "[serve] Support working_dir in serve run (#22760)" (#22971) 2022-03-09 21:31:23 -08:00
Kai Fricke
15601ed79b
Revert "[serve] Support working_dir in serve run (#22760)" (#22956)
This reverts commit ab2741d64b.

The PR breaks ray job submission for anyscale:// URLs
2022-03-09 17:04:46 +00:00
shrekris-anyscale
ab2741d64b
[serve] Support working_dir in serve run (#22760)
#22714 added `serve run` to the Serve CLI. This change allows the user to specify a local or remote `working_dir` in `serve run`.
2022-03-08 13:18:41 -06:00
Archit Kulkarni
1752f17c6d
[Job submission] Add list_jobs API (#22679)
Adds an API to the REST server, the SDK, and the CLI for listing all jobs that have been submitted, along with their information.

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2022-03-01 21:27:09 -06:00
Edward Oakes
58e5f0140d
[jobs] Rename JobData -> JobInfo (#22499)
`JobData` could be confused with the actual output data of a job; `JobInfo` makes it clearer that this is status information + metadata.
2022-02-22 16:18:16 -06:00
Archit Kulkarni
df581c584a
[Job] [Dashboard] Add Job Submission data to cluster snapshot (#22225)
The existing Job info in the cluster snapshot uses the old definition of Job, which is a single Ray driver (a single `ray.init()` connection).  

In the new Job Submission protocol, a Job just specifies an entrypoint, which can be any shell command. As such, a Job can have zero or multiple Ray drivers. This means we should add a new snapshot entry corresponding to new jobs. We'll leave the old snapshot in place for legacy jobs.

- Also fixes `get_all_jobs` by using the appropriate KV namespace, and stripping the job key KV prefix from the job ID.  It wasn't working before.

- This PR also unifies the datatype used by the GET jobs/ endpoint to be the same as the one used by the new jobs cluster snapshot.  For backwards compatibility, the `status` and `message` fields are preserved.
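
The prefix-stripping part of the `get_all_jobs` fix can be sketched as follows. The prefix value `job_info_` and the helper name are hypothetical placeholders; the real key prefix and namespace are not given in this message.

```python
from typing import Iterable, List

JOB_KEY_PREFIX = "job_info_"  # hypothetical KV key prefix, for illustration only


def job_ids_from_keys(keys: Iterable[str], prefix: str = JOB_KEY_PREFIX) -> List[str]:
    """Keep only job keys and strip the prefix so that bare job IDs remain."""
    return [key[len(prefix):] for key in keys if key.startswith(prefix)]
```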
2022-02-18 09:54:37 -06:00
Archit Kulkarni
50e2bef9d0
[Jobs] Hide dashboard from Job Submission import path (#22223)
For public SDK APIs, change the import path from 

```python
from ray.dashboard.modules.job.common import JobStatus, JobStatusInfo
from ray.dashboard.modules.job.sdk import JobSubmissionClient
```

to 
```python
from ray.job_submission import JobStatus, JobSubmissionClient
```

`JobStatus`, `JobStatusInfo` and `JobSubmissionClient` were the only names referenced in the SDK doc so far, but we can add more later as they appear.
2022-02-09 13:55:32 -06:00
SangBin Cho
d7fc7d2e9d
[Runtime Env] Plumbing runtime env failure error message to the exception: Task [1/3] (#22032)
This PR is the first of three that improve runtime env exceptions. After all 3 PRs are merged, we can entirely turn off the runtime env logs streamed to drivers.

This first PR handles only task exceptions.

TODO
- [x] Task (this PR)
- [ ] Actor
- [ ] Turn off runtime env logs & improve error msgs
2022-02-03 16:47:04 -08:00
Edward Oakes
e85bbfb338
[jobs] Enable default port in http:// addresses (#22014)
Closes https://github.com/ray-project/ray/issues/22012
2022-02-02 14:34:34 -06:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black (#21975)
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Jiao
ed34434131
[Jobs] Add log streaming for jobs (#20976)
The current logs API simply returns a str, which was enough to unblock development and integration. We should add proper log streaming for better UX and external job manager integration.
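
One way to picture the difference between returning a whole string and streaming is an offset-based reader that returns only what was appended since the last call. This is a generic sketch, not the mechanism implemented in this PR; `read_new_logs` is a hypothetical helper.

```python
from typing import Tuple


def read_new_logs(path: str, offset: int = 0) -> Tuple[str, int]:
    """Return the text appended to `path` since byte `offset`, plus the new offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Decode leniently so a partially written multi-byte character does not raise.
    return data.decode("utf-8", errors="replace"), offset + len(data)
```

A caller would invoke this repeatedly, passing back the returned offset, and forward each non-empty chunk to the user.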

Co-authored-by: Sven Mika <sven@anyscale.io>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
Co-authored-by: Ed Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Avnish Narayan <38871737+avnishn@users.noreply.github.com>
Co-authored-by: Jiao Dong <jiaodong@anyscale.com>
2021-12-14 17:01:53 -08:00
Jiao
1e67bdfcec
[jobs] Add headers field to JobSubmissionClient and apply to all requests (#20663) 2021-12-03 18:44:30 -06:00
Edward Oakes
39b2c3927c
[jobs] Add /api/version endpoint (#20622) 2021-11-22 15:11:04 -06:00
Edward Oakes
d26c9e67e8
[job submission] Add a message to the JobStatus to return more detailed errors (#20491) 2021-11-18 10:15:23 -06:00
Edward Oakes
eae523159f
[job submission] Prefix job ID with raysubmit_ and pass job_name metadata (#20490) 2021-11-17 21:48:22 -06:00
shrekris-anyscale
c0aeb4a236
[runtime_env] Support working_dir and py_modules from HTTPS and Google Cloud Storage (#20280) 2021-11-14 02:16:45 -08:00
Edward Oakes
6c3bad52b6
[job submission] Better validation + tests for input types, refactor API (#20332) 2021-11-13 22:54:01 -08:00
Edward Oakes
07add6f7f2
Revert "Revert "[job submission] Use ray.init format addresses for Jo… (#20328) 2021-11-13 16:24:02 -08:00
Eric Liang
567e955810
Revert "[job submission] Use ray.init format addresses for JobSubmissionClient (#20245)" (#20314)
This reverts commit adc15a0fb0.
2021-11-12 21:11:24 -08:00
Nikita Vemuri
adc15a0fb0
[job submission] Use ray.init format addresses for JobSubmissionClient (#20245) 2021-11-12 13:52:43 -08:00
Edward Oakes
81f036d078
[job submission] Move job_manager to dashboard module, common parts to common.py (#20209) 2021-11-10 14:14:55 -08:00
Edward Oakes
5475bb054c
[job submission] Redirect stdout + stderr to a single log file (#20208) 2021-11-09 22:34:12 -08:00
Edward Oakes
50f2cf8a74
[job submission] Allow passing job_id, return DOES_NOT_EXIST when applicable (#20164) 2021-11-08 23:10:27 -08:00
Jiao
9ef75b27ac
[Job Submission] Add stop API to http & sdk, with better status code + stacktrace (#20094) 2021-11-06 12:37:54 -05:00
Jiao
6cfb52ff1d
[job submission] Add stop API + subprocess cleanup (#19860) 2021-11-04 13:59:47 -05:00
Edward Oakes
b2ddea255d
[job submission] Add job submission ID + status to /api/snapshot (#19994) 2021-11-03 09:49:28 -05:00
Edward Oakes
f8a6cad0b7
[job submission] SDK prototype w/ dynamic working_dir uploads (#19843) 2021-11-02 16:01:54 -05:00
Edward Oakes
bf23a31017
[job submission] Always generate and return job_id (#19851) 2021-10-29 09:09:54 -05:00
Edward Oakes
42ac906313
[job submission] Support passing metadata to the JobConfig (#19845) 2021-10-28 16:40:03 -05:00
Jiao
e53fecfbd5
[jobs] Initial http jobs server on head node (#19657) 2021-10-23 12:48:16 -05:00