hiro/ray - Forgejo: Beyond coding. We Forge.


mirror of https://github.com/vale981/ray synced 2025-03-10 13:26:39 -04:00

Author	SHA1	Message	Date
Alan Guo	adedfdb0ba	Add back job_id to submit_job API to maintain backwards-compatibility (#27110 ) (#27202 ) Fix for an unintentional backwards-compatibility breakage from #25902. The job submit API should still accept job_id as a parameter. Signed-off-by: Alan Guo aguo@anyscale.com	2022-07-28 14:27:48 -07:00
Alan Guo	5d6bc5360d	Fix the jobs tab in the beta dashboard and fill it with data from both "submission" jobs and "driver" jobs (#25902 ) ## Why are these changes needed? - Fixes the jobs tab in the new dashboard. Previously, it didn't load. - Combines the old job concept, "driver jobs", and the new job submission concept into a single concept called "jobs". The jobs tab shows information about both kinds of jobs. - Updates all job APIs: they now return both submission jobs and driver jobs. They also contain additional data in the response, including "id", "job_id", "submission_id", and "driver", and they accept either job_id or submission_id as input. - Job ID is the same as the "Ray Core job ID" concept. It has the form "01000000" and is the primary ID used to represent jobs. - Submission ID is an ID generated for each Ray job submission. It has the form "raysubmit_12345...". It is a secondary ID that can be used if a client needs to provide a self-generated ID, or if the job ID doesn't exist (e.g., if the submission job doesn't create a Ray driver). This PR has 2 deprecations: - The `submit_job` SDK now accepts a new kwarg `submission_id`; `job_id` is deprecated. - The `ray job submit` CLI now accepts `--submission-id`; `--job-id` is deprecated. This PR has 4 backwards-incompatible changes: - The `list_jobs` SDK now returns a list instead of a dictionary. - The `ray job list` CLI now prints a list instead of a dictionary. - The `/api/jobs` endpoint returns a list instead of a dictionary. - The `POST /api/jobs` endpoint (submit job) now returns JSON with a `submission_id` field instead of `job_id`.	2022-07-27 02:39:52 -07:00
Alan Guo	e8222ff600	[dashboard] Update cluster_activities endpoint to use pydantic. (#26609 ) Update cluster_activities endpoint to use pydantic so we have better data validation. Make timestamp a required field. Add pydantic to ray[default] requirements	2022-07-25 10:54:22 -07:00
brucez-anyscale	f76d7b23f2	Revert "Revert "[Dashboard][Serve] Move Serve related endpoints to dashboard agent"" (#26336 )	2022-07-06 19:37:30 -07:00
Yi Cheng	12d147ff1f	Revert "[Dashboard][Serve] Move Serve related endpoints to dashboard agent (#26107 )" (#26333 ) This reverts commit `84166ccb04`.	2022-07-06 13:30:33 -07:00
brucez-anyscale	84166ccb04	[Dashboard][Serve] Move Serve related endpoints to dashboard agent (#26107 ) In Ray 2.0, we want to achieve API server high availability (HA). Originally, the Serve endpoints were on the head node. This PR moves them to the dashboard agents, so they will be HA because there are multiple replicas of the dashboard agent.	2022-07-06 10:58:00 -07:00
Eric Liang	43aa2299e6	[api] Annotate as public / move ray-core APIs to _private and add enforcement rule (#25695 ) Enable checking of the ray core module, excluding serve, workflows, and tune, in ./ci/lint/check_api_annotations.py. This required moving many files to ray._private and making associated fixes.	2022-06-21 15:13:29 -07:00
Archit Kulkarni	27e7c284ee	[Jobs] Change jobs start_time end_time from seconds to ms for consistency (#24123 ) In the snapshot, all timestamps are given in ms except for Jobs: ``` wget -q -O - http://127.0.0.1:8265/api/snapshot { "result":true, "msg":"hello", "data":{ "snapshot":{ "jobs":{ "01000000":{ "status":null, "statusMessage":null, "isDead":false, "startTime":1650315791249, "endTime":0, "config":{ "namespace":"_ray_internal_dashboard", "metadata":{ }, "runtimeEnv":{ } } } }, "jobSubmission":{ "raysubmit9Bsej1Rtxqqetxup":{ "status":"SUCCEEDED", "message":"Job finished successfully.", "errorType":null, "startTime":1650315925, "endTime":1650315926, "metadata":{ "creatorId":"usr_f6tgCaaFBJC6tZz1ZVzzAVf4" }, "runtimeEnv":{ "workingDir":"gcs://_ray_pkg_6068c19fb3b8530f.zip" }, "entrypoint":"ls" }, "raysubmitEibragqkyg16Hpcj":{ "status":"SUCCEEDED", "message":"Job finished successfully.", "errorType":null, "startTime":1650316039, "endTime":1650316041, "metadata":{ "creatorId":"usr_f6tgCaaFBJC6tZz1ZVzzAVf4" }, "runtimeEnv":{ "workingDir":"gcs://_ray_pkg_6068c19fb3b8530f.zip" }, "entrypoint":"echo hi" }, "raysubmitSh1U7Grdsbqrf6Je":{ "status":"SUCCEEDED", "message":"Job finished successfully.", "errorType":null, "startTime":1650316354, "endTime":1650316355, "metadata":{ "creatorId":"usr_f6tgCaaFBJC6tZz1ZVzzAVf4" }, "runtimeEnv":{ "workingDir":"gcs://_ray_pkg_6068c19fb3b8530f.zip" }, "entrypoint":"echo hi" } }, "actors":{ "8c8e28e642ba2cfd0457d45e01000000":{ "jobId":"01000000", "state":"DEAD", "name":"_ray_internal_job_actor_raysubmit_9BSeJ1rTXQqEtXuP", "namespace":"_ray_internal_dashboard", "runtimeEnv":{ "uris":{ "workingDirUri":"gcs://_ray_pkg_6068c19fb3b8530f.zip" }, "workingDir":"gcs://_ray_pkg_6068c19fb3b8530f.zip" }, "startTime":1650315926620, "endTime":1650315927499, "isDetached":true, "resources":{ "node:172.31.73.39":0.001 }, "actorClass":"JobSupervisor", "currentWorkerId":"9628b5eb54e98353601413845fbca0a8c4e5379d1469ce95f3dfbace", 
"currentRayletId":"61ab3958258c82266b222f4691a53e71b6315e312408a21cb3350bc7", "ipAddress":"172.31.73.39", "port":10003, "metadata":{ } }, "a7fd8354567129910c44298401000000":{ "jobId":"01000000", "state":"DEAD", "name":"_ray_internal_job_actor_raysubmit_sh1u7grDsBQRf6je", "namespace":"_ray_internal_dashboard", "runtimeEnv":{ "uris":{ "workingDirUri":"gcs://_ray_pkg_6068c19fb3b8530f.zip" }, "workingDir":"gcs://_ray_pkg_6068c19fb3b8530f.zip" }, "startTime":1650316355718, "endTime":1650316356620, "isDetached":true, "resources":{ "node:172.31.73.39":0.001 }, "actorClass":"JobSupervisor", "currentWorkerId":"f07fd7a393898bf7d9027a5de0b0f566bb64ae80c0fcbcc107185505", "currentRayletId":"61ab3958258c82266b222f4691a53e71b6315e312408a21cb3350bc7", "ipAddress":"172.31.73.39", "port":10005, "metadata":{ } }, "19ca9ad190f47bae963592d601000000":{ "jobId":"01000000", "state":"DEAD", "name":"_ray_internal_job_actor_raysubmit_eibRAGqKyG16HpCj", "namespace":"_ray_internal_dashboard", "runtimeEnv":{ "uris":{ "workingDirUri":"gcs://_ray_pkg_6068c19fb3b8530f.zip" }, "workingDir":"gcs://_ray_pkg_6068c19fb3b8530f.zip" }, "startTime":1650316041089, "endTime":1650316041978, "isDetached":true, "resources":{ "node:172.31.73.39":0.001 }, "actorClass":"JobSupervisor", "currentWorkerId":"50b8e7e9a6981fe0270afd7f6387bc93788356822c9a664c2988f5ba", "currentRayletId":"61ab3958258c82266b222f4691a53e71b6315e312408a21cb3350bc7", "ipAddress":"172.31.73.39", "port":10004, "metadata":{ } } }, "deployments":{ }, "sessionName":"session_2022-04-18_13-49-44_814862_139", "rayVersion":"1.12.0", "rayCommit":"f18fc31c7562990955556899090f8e8656b48d2d" } } } ``` This PR fixes the inconsistency by changing Jobs start/end timestamps to ms.	2022-04-26 08:37:41 -07:00
Archit Kulkarni	76bb5396c7	[Doc] [jobs] Add links to Job Submission and improve doc (#23209 ) - Adds links to Job Submission from existing library tutorials where `ray submit` is used. When Jobs becomes GA, we should fully replace the uses of `ray submit` with Ray job submission and ensure this is tested. - Adds docstrings for the Jobs SDK, which automatically show up in the API reference - Improve the Job Submission main page - Add a "Deployment Guide" landing page explaining when to use Ray Client vs Ray Jobs Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>	2022-03-18 12:52:13 -05:00
Archit Kulkarni	77090144a2	[jobs] Add `entrypoint` field to JobInfo (#23253 )	2022-03-16 22:02:22 -05:00
Archit Kulkarni	8707eb6288	[runtime env] Support `.whl` files in `py_modules` (#22368 ) The `py_modules` field of runtime_env supports uploading local Python modules for use on the Ray cluster. One gap is when the local Python module is packaged as a wheel (a `.whl` file). This PR adds the missing support for uploading and installing the `.whl` file.	2022-03-16 16:37:10 -05:00
Edward Oakes	58e5f0140d	[jobs] Rename JobData -> JobInfo (#22499 ) `JobData` could be confused with the actual output data of a job, `JobInfo` makes it more clear that this is status information + metadata.	2022-02-22 16:18:16 -06:00
Archit Kulkarni	df581c584a	[Job] [Dashboard] Add Job Submission data to cluster snapshot (#22225 ) The existing Job info in the cluster snapshot uses the old definition of Job, which is a single Ray driver (a single `ray.init()` connection). In the new Job Submission protocol, a Job just specifies an entrypoint, which can be any shell command. As such, a Job can have zero or multiple Ray drivers. This means we should add a new snapshot entry corresponding to new jobs. We'll leave the old snapshot in place for legacy jobs. - Also fixes `get_all_jobs` by using the appropriate KV namespace and stripping the job key KV prefix from the job ID. It wasn't working before. - This PR also unifies the datatype used by the GET jobs/ endpoint to be the same as the one used by the new jobs cluster snapshot. For backwards compatibility, the `status` and `message` fields are preserved.	2022-02-18 09:54:37 -06:00
Edward Oakes	8806b2d5c4	[jobs] Monitor jobs in the background to avoid requiring clients to poll (#22180 )	2022-02-07 15:25:25 -06:00
Balaji Veeramani	7f1bacc7dc	[CI] Format Python code with Black (#21975 ) See #21316 and #21311 for the motivation behind these changes.	2022-01-29 18:41:57 -08:00
Jiao	ed34434131	[Jobs] Add log streaming for jobs (#20976 ) The current logs API simply returns a str to unblock development and integration. We should add proper log streaming for better UX and external job manager integration. Co-authored-by: Sven Mika <sven@anyscale.io> Co-authored-by: sven1977 <svenmika1977@gmail.com> Co-authored-by: Ed Oakes <ed.nmi.oakes@gmail.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu> Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Avnish Narayan <38871737+avnishn@users.noreply.github.com> Co-authored-by: Jiao Dong <jiaodong@anyscale.com>	2021-12-14 17:01:53 -08:00
Edward Oakes	39b2c3927c	[jobs] Add /api/version endpoint (#20622 )	2021-11-22 15:11:04 -06:00
Edward Oakes	d26c9e67e8	[job submission] Add a `message` to the JobStatus to return more detailed errors (#20491 )	2021-11-18 10:15:23 -06:00
Edward Oakes	eae523159f	[job submission] Prefix job ID with `raysubmit_` and pass `job_name` metadata (#20490 )	2021-11-17 21:48:22 -06:00
Edward Oakes	48bc1af2da	[job submission] Remove DOES_NOT_EXIST status (#20354 )	2021-11-15 16:57:32 -08:00
Edward Oakes	6c3bad52b6	[job submission] Better validation + tests for input types, refactor API (#20332 )	2021-11-13 22:54:01 -08:00
Yi Cheng	e54d3117a4	[gcs] Update all redis kv usage in python except function table (#20014 ) ## Why are these changes needed? This is part of the Redis removal project. This PR removes all direct usage of Redis except for the function table, which will be migrated in the next PR. ## Related issue number #19443	2021-11-10 20:24:53 -08:00
Edward Oakes	81f036d078	[job submission] Move job_manager to dashboard module, common parts to common.py (#20209 )	2021-11-10 14:14:55 -08:00

23 commits