ray/dashboard/modules
Archit Kulkarni a67c8a0739
[runtime_env] Add temporary URI reference to prevent URI deletion before job starts (#24719)
Packages for `runtime_env` are uploaded to the GCS and are garbage collected when their refcount drops to zero.

The problem is that the refcount isn't incremented until the job starts, which happens after the package is uploaded. The package's refcount can therefore drop to zero between the upload and the job start, causing the package to be deleted before the job needs it. This is likely the cause of https://github.com/ray-project/ray/issues/23423.

We can't just increment the refcount at the time of upload, because if the script is killed before the job starts (e.g. via Ctrl-C), the reference will never be decremented and the package will never be deleted.

The solution in this PR is to increment the refcount at the time of upload, but automatically decrement after a configurable timeout (default 30s).  This should be enough time for the job to start.  When the job starts, it increments the refcount as usual and decrements it when the job finishes or is killed.
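
A minimal sketch of the temporary-reference idea, for illustration only. It is not the code in this PR: `URIReferenceTable`, its methods, the `expiration_s` parameter, and the example URIs are all hypothetical names, and a plain `threading.Timer` stands in for whatever scheduling mechanism Ray actually uses.

```python
import threading
from collections import defaultdict


class URIReferenceTable:
    """Toy reference table. All names are hypothetical, not Ray's API."""

    def __init__(self, on_delete):
        self._lock = threading.Lock()
        self._counts = defaultdict(int)
        self._on_delete = on_delete  # called with the URI once its count hits zero

    def add_reference(self, uri):
        with self._lock:
            self._counts[uri] += 1

    def remove_reference(self, uri):
        with self._lock:
            self._counts[uri] -= 1
            should_delete = self._counts[uri] <= 0
            if should_delete:
                del self._counts[uri]
        # Run the callback outside the lock so it cannot deadlock the table.
        if should_delete:
            self._on_delete(uri)

    def add_temporary_reference(self, uri, expiration_s=30.0):
        """Hold a reference now and drop it automatically after expiration_s."""
        self.add_reference(uri)
        timer = threading.Timer(expiration_s, self.remove_reference, args=(uri,))
        timer.daemon = True  # do not keep the process alive just for the timer
        timer.start()
        return timer


deleted = []
table = URIReferenceTable(on_delete=deleted.append)

# Case 1: the job starts before the temporary reference expires.
timer = table.add_temporary_reference("gcs://pkg_a.zip", expiration_s=0.05)
table.add_reference("gcs://pkg_a.zip")     # job start takes its own reference
timer.join()                               # temporary reference has expired
still_alive_during_job = deleted == []     # package survives while the job runs
table.remove_reference("gcs://pkg_a.zip")  # job finishes, package is deleted

# Case 2: the script is killed before the job starts, so only the timer cleans up.
timer = table.add_temporary_reference("gcs://pkg_b.zip", expiration_s=0.05)
timer.join()
```

In the first case the package outlives the temporary reference because the job holds its own reference. In the second case nothing ever takes a job reference, so the timer's decrement is what lets the package be collected.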

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2022-05-23 10:25:04 -05:00
actor [Core] Allow accepting gRPC HTTP proxy via env variable (#23526) 2022-05-10 11:30:46 +08:00
event [Core] Allow accepting gRPC HTTP proxy via env variable (#23526) 2022-05-10 11:30:46 +08:00
job [runtime_env] Add temporary URI reference to prevent URI deletion before job starts (#24719) 2022-05-23 10:25:04 -05:00
log [Core] Allow accepting gRPC HTTP proxy via env variable (#23526) 2022-05-10 11:30:46 +08:00
node [Core] Allow accepting gRPC HTTP proxy via env variable (#23526) 2022-05-10 11:30:46 +08:00
reporter [Core] Allow accepting gRPC HTTP proxy via env variable (#23526) 2022-05-10 11:30:46 +08:00
runtime_env [runtime env] [java] Support jars in runtime env for Java (#24170) 2022-05-12 09:34:40 +08:00
serve [Serve] Move deployment clean up under serve.run() api (#24306) 2022-05-02 12:10:11 -05:00
snapshot [Jobs] [Dashboard] Add job submission id as field to job snapshot (#24303) 2022-04-29 10:10:24 -05:00
state [State API] List runtime env API (#24126) 2022-05-02 14:01:00 -07:00
test [CI] Format Python code with Black (#21975) 2022-01-29 18:41:57 -08:00
tests [serve] Reject Ray client addresses when submitting via Dashboard (#23339) 2022-03-21 11:17:51 -05:00
tune [CI] Format Python code with Black (#21975) 2022-01-29 18:41:57 -08:00
usage_stats Don't show usage stats prompt in dashboard if prompt is disabled (#24700) 2022-05-12 07:55:28 -07:00
__init__.py [Dashboard] New dashboard skeleton (#9099) 2020-07-27 11:34:47 +08:00
dashboard_sdk.py [job submission] Fix address defaulting behavior (#24970) 2022-05-20 14:10:36 -05:00