Commit graph

11 commits

Author SHA1 Message Date
shrekris-anyscale
010a3566e6
[Serve] Allow and remove trailing slashes in Ray submission address (#26093) 2022-06-30 16:04:53 -07:00
Archit Kulkarni
1b67e6a8ae
[Jobs] [Dashboard] Add job submission id as field to job snapshot (#24303)
Closes https://github.com/ray-project/ray/issues/24300

Adds a field to the job submission snapshot that matches the job name in the existing snapshot. Before this PR, the job submission name appeared only in camel-cased form, because all snapshot keys are automatically camel-cased. This PR allows jobs from the old job field to be linked to the ones in the new job submission snapshot.
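
To illustrate why camel-casing the keys made linking hard, here is a minimal sketch. It assumes the conversion splits on underscores and title-cases each later part; `to_camel_case`, `camel_case_keys`, and the `job_submission_id` field name are hypothetical and may differ from what the dashboard actually uses. With that assumption, the conversion reproduces the mangled key seen in the sample snapshot elsewhere in this log, while a value stored in a field survives unchanged.

```python
def to_camel_case(snake_str: str) -> str:
    """Hypothetical helper: convert a snake_case key to camelCase."""
    first, *rest = snake_str.split("_")
    # str.title() lowercases every letter that is not at the start of a word,
    # so the original capitalization of the ID cannot be recovered afterwards.
    return first + "".join(part.title() for part in rest)


def camel_case_keys(data: dict) -> dict:
    """Hypothetical helper: camel-case dictionary keys recursively, leaving values alone."""
    return {
        to_camel_case(key): camel_case_keys(value) if isinstance(value, dict) else value
        for key, value in data.items()
    }


submission_id = "raysubmit_9BSeJ1rTXQqEtXuP"

# The ID is used as a key (mangled) and also stored as a value (preserved).
# The field name "job_submission_id" is an assumption for illustration.
snapshot = {submission_id: {"status": "SUCCEEDED", "job_submission_id": submission_id}}
converted = camel_case_keys(snapshot)

mangled_key = next(iter(converted))
preserved_id = converted[mangled_key]["jobSubmissionId"]
```

Here `mangled_key` no longer matches the name embedded in the actor entry (`_ray_internal_job_actor_raysubmit_9BSeJ1rTXQqEtXuP`), whereas `preserved_id` still does.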
2022-04-29 10:10:24 -05:00
Archit Kulkarni
27e7c284ee
[Jobs] Change jobs start_time end_time from seconds to ms for consistency (#24123)
In the snapshot, all timestamps are given in milliseconds except for Jobs, whose `jobSubmission` entries use seconds:

```
wget -q -O - http://127.0.0.1:8265/api/snapshot

{
   "result":true,
   "msg":"hello",
   "data":{
      "snapshot":{
         "jobs":{
            "01000000":{
               "status":null,
               "statusMessage":null,
               "isDead":false,
               "startTime":1650315791249,
               "endTime":0,
               "config":{
                  "namespace":"_ray_internal_dashboard",
                  "metadata":{
                     
                  },
                  "runtimeEnv":{
                     
                  }
               }
            }
         },
         "jobSubmission":{
            "raysubmit9Bsej1Rtxqqetxup":{
               "status":"SUCCEEDED",
               "message":"Job finished successfully.",
               "errorType":null,
               "startTime":1650315925,
               "endTime":1650315926,
               "metadata":{
                  "creatorId":"usr_f6tgCaaFBJC6tZz1ZVzzAVf4"
               },
               "runtimeEnv":{
                  "workingDir":"gcs://_ray_pkg_6068c19fb3b8530f.zip"
               },
               "entrypoint":"ls"
            },
            "raysubmitEibragqkyg16Hpcj":{
               "status":"SUCCEEDED",
               "message":"Job finished successfully.",
               "errorType":null,
               "startTime":1650316039,
               "endTime":1650316041,
               "metadata":{
                  "creatorId":"usr_f6tgCaaFBJC6tZz1ZVzzAVf4"
               },
               "runtimeEnv":{
                  "workingDir":"gcs://_ray_pkg_6068c19fb3b8530f.zip"
               },
               "entrypoint":"echo hi"
            },
            "raysubmitSh1U7Grdsbqrf6Je":{
               "status":"SUCCEEDED",
               "message":"Job finished successfully.",
               "errorType":null,
               "startTime":1650316354,
               "endTime":1650316355,
               "metadata":{
                  "creatorId":"usr_f6tgCaaFBJC6tZz1ZVzzAVf4"
               },
               "runtimeEnv":{
                  "workingDir":"gcs://_ray_pkg_6068c19fb3b8530f.zip"
               },
               "entrypoint":"echo hi"
            }
         },
         "actors":{
            "8c8e28e642ba2cfd0457d45e01000000":{
               "jobId":"01000000",
               "state":"DEAD",
               "name":"_ray_internal_job_actor_raysubmit_9BSeJ1rTXQqEtXuP",
               "namespace":"_ray_internal_dashboard",
               "runtimeEnv":{
                  "uris":{
                     "workingDirUri":"gcs://_ray_pkg_6068c19fb3b8530f.zip"
                  },
                  "workingDir":"gcs://_ray_pkg_6068c19fb3b8530f.zip"
               },
               "startTime":1650315926620,
               "endTime":1650315927499,
               "isDetached":true,
               "resources":{
                  "node:172.31.73.39":0.001
               },
               "actorClass":"JobSupervisor",
               "currentWorkerId":"9628b5eb54e98353601413845fbca0a8c4e5379d1469ce95f3dfbace",
               "currentRayletId":"61ab3958258c82266b222f4691a53e71b6315e312408a21cb3350bc7",
               "ipAddress":"172.31.73.39",
               "port":10003,
               "metadata":{
                  
               }
            },
            "a7fd8354567129910c44298401000000":{
               "jobId":"01000000",
               "state":"DEAD",
               "name":"_ray_internal_job_actor_raysubmit_sh1u7grDsBQRf6je",
               "namespace":"_ray_internal_dashboard",
               "runtimeEnv":{
                  "uris":{
                     "workingDirUri":"gcs://_ray_pkg_6068c19fb3b8530f.zip"
                  },
                  "workingDir":"gcs://_ray_pkg_6068c19fb3b8530f.zip"
               },
               "startTime":1650316355718,
               "endTime":1650316356620,
               "isDetached":true,
               "resources":{
                  "node:172.31.73.39":0.001
               },
               "actorClass":"JobSupervisor",
               "currentWorkerId":"f07fd7a393898bf7d9027a5de0b0f566bb64ae80c0fcbcc107185505",
               "currentRayletId":"61ab3958258c82266b222f4691a53e71b6315e312408a21cb3350bc7",
               "ipAddress":"172.31.73.39",
               "port":10005,
               "metadata":{
                  
               }
            },
            "19ca9ad190f47bae963592d601000000":{
               "jobId":"01000000",
               "state":"DEAD",
               "name":"_ray_internal_job_actor_raysubmit_eibRAGqKyG16HpCj",
               "namespace":"_ray_internal_dashboard",
               "runtimeEnv":{
                  "uris":{
                     "workingDirUri":"gcs://_ray_pkg_6068c19fb3b8530f.zip"
                  },
                  "workingDir":"gcs://_ray_pkg_6068c19fb3b8530f.zip"
               },
               "startTime":1650316041089,
               "endTime":1650316041978,
               "isDetached":true,
               "resources":{
                  "node:172.31.73.39":0.001
               },
               "actorClass":"JobSupervisor",
               "currentWorkerId":"50b8e7e9a6981fe0270afd7f6387bc93788356822c9a664c2988f5ba",
               "currentRayletId":"61ab3958258c82266b222f4691a53e71b6315e312408a21cb3350bc7",
               "ipAddress":"172.31.73.39",
               "port":10004,
               "metadata":{
                  
               }
            }
         },
         "deployments":{
            
         },
         "sessionName":"session_2022-04-18_13-49-44_814862_139",
         "rayVersion":"1.12.0",
         "rayCommit":"f18fc31c7562990955556899090f8e8656b48d2d"
      }
   }
}
```

This PR fixes the inconsistency by changing the Jobs start and end timestamps to milliseconds.
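
As a rough illustration of the mismatch, the sketch below takes two values from the snapshot above: the `startTime` of one `jobSubmission` entry (seconds) and the `startTime` of its `JobSupervisor` actor (milliseconds). The helper name `seconds_to_ms` is hypothetical; it only shows the unit conversion, not how the PR implements it.

```python
def seconds_to_ms(timestamp_s: int) -> int:
    """Hypothetical helper: convert a Unix timestamp in seconds to milliseconds."""
    return timestamp_s * 1000


# Values copied from the snapshot above.
submission_start_s = 1650315925   # jobSubmission startTime, in seconds
actor_start_ms = 1650315926620    # JobSupervisor actor startTime, in milliseconds

# Only after converting to a common unit can the two be compared.
submission_start_ms = seconds_to_ms(submission_start_s)
delta_ms = actor_start_ms - submission_start_ms
```

With both values in milliseconds, the gap between the two timestamps is a plausible 1620 ms; comparing the raw values directly would be off by a factor of 1000.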
2022-04-26 08:37:41 -07:00
Archit Kulkarni
77090144a2
[jobs] Add entrypoint field to JobInfo (#23253) 2022-03-16 22:02:22 -05:00
shrekris-anyscale
8548affdc2
Increase test_failed_job_status timeout in test_job_submission (#22643)
`test_job_submission` has become [flaky](https://flakey-tests.ray.io/) due to timeouts. This change increases the timeout in `test_failed_job_status` from 10 to 25 seconds.
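
For context, tests of this kind usually poll for a status until a deadline passes. The sketch below is a generic polling helper, not the code in `test_job_submission`; `poll_until` and its parameters are hypothetical names chosen for illustration, with the default timeout set to the 25 seconds mentioned above.

```python
import time
from typing import Callable


def poll_until(
    predicate: Callable[[], bool],
    timeout_s: float = 25.0,
    retry_interval_s: float = 0.1,
) -> bool:
    """Hypothetical helper: call `predicate` until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(retry_interval_s)
    return False


# Simulated status source: reports "FAILED" on the third check.
statuses = iter(["PENDING", "RUNNING", "FAILED"])
reached_failed = poll_until(lambda: next(statuses) == "FAILED", timeout_s=5.0)
```

A longer timeout does not slow down passing runs, because the loop returns as soon as the condition holds; it only gives slow runs more room before the test is reported as failed.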
2022-02-25 10:08:55 -08:00
Archit Kulkarni
df581c584a
[Job] [Dashboard] Add Job Submission data to cluster snapshot (#22225)
The existing Job info in the cluster snapshot uses the old definition of Job, which is a single Ray driver (a single `ray.init()` connection).

In the new Job Submission protocol, a Job just specifies an entrypoint, which can be any shell command. As such, a Job can have zero, one, or multiple Ray drivers. This means we should add a new snapshot entry corresponding to new jobs. We'll leave the old snapshot in place for legacy jobs.

- Also fixes `get_all_jobs`, which wasn't working before, by using the appropriate KV namespace and stripping the job key KV prefix from the job ID.

- This PR also unifies the data type used by the GET jobs/ endpoint with the one used by the new jobs cluster snapshot. For backwards compatibility, the `status` and `message` fields are preserved.
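
The prefix-stripping part of the `get_all_jobs` fix can be sketched as follows. This is an assumption-laden illustration: the prefix value `_hypothetical_job_info_` and the function `job_ids_from_keys` are made up for the example and are not taken from the Ray source.

```python
from typing import Iterable, List

# Hypothetical prefix; the real key prefix used in the KV store may differ.
JOB_KEY_PREFIX = "_hypothetical_job_info_"


def job_ids_from_keys(keys: Iterable[str]) -> List[str]:
    """Hypothetical helper: keep only job keys and strip the prefix to recover job IDs."""
    return [
        key[len(JOB_KEY_PREFIX):]
        for key in keys
        if key.startswith(JOB_KEY_PREFIX)
    ]


kv_keys = [
    "_hypothetical_job_info_raysubmit_9BSeJ1rTXQqEtXuP",
    "_hypothetical_job_info_raysubmit_eibRAGqKyG16HpCj",
    "some_unrelated_key",
]
job_ids = job_ids_from_keys(kv_keys)
```

Without the stripping step, callers would receive storage keys rather than the job IDs they submitted.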
2022-02-18 09:54:37 -06:00
Archit Kulkarni
50e2bef9d0
[Jobs] Hide dashboard from Job Submission import path (#22223)
For public SDK APIs, change the import path from 

```python
from ray.dashboard.modules.job.common import JobStatus, JobStatusInfo
from ray.dashboard.modules.job.sdk import JobSubmissionClient
```

to 
```python
from ray.job_submission import JobStatus, JobSubmissionClient
```

`JobStatus`, `JobStatusInfo` and `JobSubmissionClient` were the only names referenced in the SDK doc so far, but we can add more later as they appear.
2022-02-09 13:55:32 -06:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black (#21975)
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
mwtian
e8ce01c525
[Dashboard] offload blocking work to a thread pool (#21762)
Currently, the GCS KV client only has a blocking API. Calling it from the dashboard event loop can block other operations for many seconds, leading to failures such as job submission taking too long (> 2 min) and nightly tests failing (#21699). This PR offloads the blocking work to a separate thread. An async GCS KV API will be implemented in the future.
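
The general technique can be sketched with the standard library alone. This is not the dashboard's code: `blocking_kv_get` is a hypothetical stand-in for a blocking KV call, and `heartbeat` stands in for other work on the event loop. The blocking call is handed to a thread pool with `loop.run_in_executor`, so the loop stays free to run other coroutines in the meantime.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def blocking_kv_get(key: str) -> str:
    """Hypothetical stand-in for a blocking KV client call."""
    time.sleep(0.2)  # simulate a slow, blocking request
    return f"value-for-{key}"


async def heartbeat(ticks: List[float]) -> None:
    """Other event loop work that should keep running during the KV call."""
    for _ in range(3):
        await asyncio.sleep(0.05)
        ticks.append(time.monotonic())


async def main() -> Tuple[str, int]:
    loop = asyncio.get_running_loop()
    ticks: List[float] = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        # run_in_executor returns an awaitable; the blocking function runs in
        # a worker thread instead of on the event loop thread.
        value, _ = await asyncio.gather(
            loop.run_in_executor(pool, blocking_kv_get, "job-1"),
            heartbeat(ticks),
        )
    return value, len(ticks)


result = asyncio.run(main())
```

Calling `blocking_kv_get` directly inside a coroutine would instead stall every other task on the loop for the full duration of the call.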
2022-01-21 17:55:11 -08:00
Edward Oakes
d26c9e67e8
[job submission] Add a message to the JobStatus to return more detailed errors (#20491) 2021-11-18 10:15:23 -06:00
Edward Oakes
b2ddea255d
[job submission] Add job submission ID + status to /api/snapshot (#19994) 2021-11-03 09:49:28 -05:00