Commit graph

188 commits

Author SHA1 Message Date
Eric Liang
567e955810
Revert "[job submission] Use ray.init format addresses for JobSubmissionClient (#20245)" (#20314)
This reverts commit adc15a0fb0.
2021-11-12 21:11:24 -08:00
Nikita Vemuri
adc15a0fb0
[job submission] Use ray.init format addresses for JobSubmissionClient (#20245) 2021-11-12 13:52:43 -08:00
Edward Oakes
5ae5c1ba28
[job submission] Basic CLI prototype (#20204) 2021-11-11 15:59:13 -08:00
mwtian
0330852baf
[Core][Pubsub] Implement Python GCS publisher and subscriber (#20111)
## Why are these changes needed?
This change adds Python publisher and subscriber in `gcs_utils.py`, and GRPC handler on GCS for publishing iva GCS. Error info is migrated to use the GCS-based pubsub, if feature flag `RAY_gcs_grpc_based_pubsub=true`.

Also, add a `--gcs-address` flag to some Python processes. It is not set anywhere yet, but will be set aftering Redis-less bootstrapping work.

Unit tests are added for the Python publisher and subscriber. Migrated error info publishers and subscribers are tested with existing unit tests, e.g. tests calling `ray._private.test_utils.get_error_message()` to ensure error info is published.

GCS based pubsub has gaps in handling deadline, cancelled requests and GCS restarts. So 3 more unit tests are disabled in the `HA GCS` mode. They will be addressed in a separate change.

## Related issue number
2021-11-11 14:59:57 -08:00
Yi Cheng
e54d3117a4
[gcs] Update all redis kv usage in python except function table (#20014)
## Why are these changes needed?
This is part of redis removal project. In this PR all direct usage of redis got removed except function table.
Function table will be migrated in the next PR

## Related issue number
#19443
2021-11-10 20:24:53 -08:00
Edward Oakes
81f036d078
[job submission] Move job_manager to dashboard module, common parts to common.py (#20209) 2021-11-10 14:14:55 -08:00
Edward Oakes
5475bb054c
[job submission] Redirect stdout + stderr to a single log file (#20208) 2021-11-09 22:34:12 -08:00
Edward Oakes
50f2cf8a74
[job submission] Allow passing job_id, return DOES_NOT_EXIST when applicable (#20164) 2021-11-08 23:10:27 -08:00
Jiao
9ef75b27ac
[Job Submission] Add stop API to http & sdk, with better status code + stacktrace (#20094) 2021-11-06 12:37:54 -05:00
architkulkarni
c5175073b2
[runtime env] Add garbage collection for conda envs (#20072) 2021-11-04 23:13:34 -05:00
Edward Oakes
65161fe9b4
[job submission] Move HTTP routes to /api/jobs prefix (#19995) 2021-11-04 17:45:25 -05:00
Jiao
6cfb52ff1d
[job submission] Add stop API + subprocess cleanup (#19860) 2021-11-04 13:59:47 -05:00
architkulkarni
bcb63961d9
[runtime env] Add plugin name to internal URI format and add GC for py_modules (#20009) 2021-11-04 10:16:14 -05:00
Edward Oakes
b2ddea255d
[job submission] Add job submission ID + status to /api/snapshot (#19994) 2021-11-03 09:49:28 -05:00
Jiajun Yao
6acf276959
Listen to 127.0.0.1 if node ip is 127.0.0.1 (#19918)
* Listen to 127.0.0.1 if node ip is 127.0.0.1

* Listen to 127.0.0.1 if node ip is 127.0.0.1

* Listen to 127.0.0.1 if node ip is 127.0.0.1
2021-11-03 12:17:55 +09:00
Edward Oakes
f8a6cad0b7
[job submission] SDK prototype w/ dynamic working_dir uploads (#19843) 2021-11-02 16:01:54 -05:00
chenk008
57363995f3
[runtime env] Move container related code to runtime env (#19067) 2021-10-29 16:31:11 -07:00
Jiao
bb0ebb7903
[job submission] Temporarily make pydantic imports conditional (#19827) 2021-10-29 18:09:18 -05:00
Edward Oakes
bf23a31017
[job submission] Always generate and return job_id (#19851) 2021-10-29 09:09:54 -05:00
Edward Oakes
42ac906313
[job submission] Support passing metadata to the JobConfig (#19845) 2021-10-28 16:40:03 -05:00
Guyang Song
119318932a
remove the env config 'RAY_DASHBOARD_MODULE_EVENT' (#19629) 2021-10-28 16:51:59 +09:00
Edward Oakes
b2e12dc43b
[runtime_env] Add basic support for python modules (#19651) 2021-10-27 17:56:46 -05:00
Jiao
e53fecfbd5
[jobs] Initial http jobs server on head node (#19657) 2021-10-23 12:48:16 -05:00
Oscar Knagg
5a05e89267
[Core] Add TLS/SSL support to gRPC channels (#18631) 2021-10-20 22:39:11 -07:00
SangBin Cho
3222d39fb8
[Dashboard] Dashboard memory improvement (#19385)
* many ppo profiling

* completed

* improve memory usage lint

* revert temporarily

* Addressed code review

* Fix a test
2021-10-19 19:34:42 -07:00
Matti Picus
f372bb07aa
Enable dashboard on Windows (#19319) 2021-10-14 14:42:22 -07:00
Guyang Song
ab55b808c5
[runtime env] move worker env to runtime env in Java (#19060) 2021-10-11 17:25:09 +08:00
chenk008
3780a73b45
[Core] Add worker resource info to runtime env (#18804) 2021-10-08 10:37:29 -07:00
Edward Oakes
1fa81673bd
[runtime_env] Clean up validation logic (#18984)
Splits the runtime_env parsing/validation and overriding into two separate codepaths. Adds unit testing for both.
2021-10-07 14:24:41 -05:00
Simon Mo
9b2a368c8c
[Runtime Env] Implement basic runtime env plugin mechanism (#19044) 2021-10-01 17:22:54 -07:00
Edward Oakes
8e5d48d668
[runtime_env] Remove deprecated override_environment_variables and worker_env fields (#18213) 2021-09-30 18:55:24 -05:00
Edward Oakes
73b8936aa8
[runtime_env] Unify rpc::RuntimeEnv with serialized_runtime_env field (#18641) 2021-09-28 15:13:15 -05:00
architkulkarni
fbf5f5d56b
[runtime env] [Serve] Fix error when uris field is None (#18874) 2021-09-24 14:07:17 -05:00
Qing Wang
6f1d3f94db
Publish actor state PENDING_CREATION for dashboard showing. (#18666) 2021-09-18 15:44:58 +08:00
Edward Oakes
7736cdd91d
[dashboard] Rename "new_dashboard" -> "dashboard" (#18214) 2021-09-15 11:17:15 -05:00
Tanmay Chordia
bf1176311f
[dashboard] add an endpoint to force kill an actor (#18508) 2021-09-13 20:03:15 -07:00
Edward Oakes
111a31d6a1
[runtime_env] Make Ray client server setup go through the runtime_env agent (#18478) 2021-09-13 14:16:35 -05:00
Edward Oakes
c482779da2
[runtime_env] Improve file-not-found msg in deletion (#18496) 2021-09-13 11:32:22 -05:00
Edward Oakes
2fcfea10b3
[runtime_env] Move URI deletion logic to the agent, remove util worker code (#18471) 2021-09-10 00:13:32 -07:00
Edward Oakes
f0555f88d6
[runtime_env] Move worker process startup logic to context (#18341) 2021-09-08 17:08:27 -05:00
Simon Mo
e61160d514
[Dashboard] Move gcs health check to a separate thread to avoid crashing due to excessive CPU usage. (#18236) 2021-09-03 14:23:56 -07:00
Edward Oakes
1f6705d35d
[runtime_env] Centralize runtime_env logic into ray._private.runtime_env submodule (#18310) 2021-09-03 10:19:00 -05:00
Edward Oakes
5d122cf7b7
[runtime_env] Move working dir setup to the agent (#18170) 2021-08-31 17:22:49 -05:00
Edward Oakes
17dded543c
Support passing gcs_client to internal_kv (#18235) 2021-08-31 12:46:41 -05:00
Nikita Vemuri
a9c731edd3
[serve] Remove requirement to specify namespace for serve.start(detached=True) (#17470) 2021-08-25 10:39:32 -05:00
architkulkarni
97dd13be09
[Serve] [dashboard] Fix formatting bugs in cluster snapshot (#17977)
* show "unversioned" in actor metadata

* hash deployment names

* update test

* replace "Unversioned" with "None"

* bypass convert to camelCase for deployment names

* fix convert_case default to match previous setting

* lint

* replace deployment_name_hash with underscore
2021-08-24 12:06:26 -07:00
architkulkarni
5ed3f0ce35
[Serve] [Dashboard] Add end times and DELETED state for endpoints (#17898) 2021-08-19 11:10:42 -05:00
Clark Zinzow
d958457d07
[Core] Second pass at privatizing APIs. (#17885)
* gcs_utils

* resource_spec

* profiling

* ray_perf and ray_cluster_perf

* test_utils
2021-08-18 20:56:33 -07:00
architkulkarni
fcac416933
[Serve] [Dashboard] Add start times and replica tags to cluster snapshot (#17749) 2021-08-13 09:49:12 -07:00
architkulkarni
00f6b30684
[Serve] [Dashboard] Support nondetached and multiple Serve instances in cluster snapshot (#17747) 2021-08-11 22:26:54 -05:00