Commit graph

152 commits

Author SHA1 Message Date
Guyang Song
ad56b9b432
[runtime env] redefine runtime env to protobuf (#19511) 2021-11-20 16:54:42 +08:00
Jiao
1a00964902
[job submission] Fix job sdk's lazy import of requests that led to minimal build failure (#20577) 2021-11-19 17:04:22 -06:00
mwtian
da79f24e8c
[Core][Pubsub] Refactor to prepare for migrating logging to Ray pubsub (#20560)
## Why are these changes needed?
The log publishers and subscribers in the driver, dashboard, and tests are refactored to make it easier to support Ray pubsub for logs. Actual support for Ray pubsub for logs will be added later in #20492.

This PR does not intend to introduce any behavior change.
2021-11-19 12:28:37 -08:00
Edward Oakes
d26c9e67e8
[job submission] Add a message to the JobStatus to return more detailed errors (#20491) 2021-11-18 10:15:23 -06:00
Edward Oakes
eae523159f
[job submission] Prefix job ID with raysubmit_ and pass job_name metadata (#20490) 2021-11-17 21:48:22 -06:00
Antoni Baum
20fc9f907d
[CI] Fix tune dashboard, increase timeout for test_commands (#20453) 2021-11-16 17:52:17 -08:00
Yi Cheng
a4e187c0e7
[gcs] Update function table to use internal kv (#20152)
## Why are these changes needed?
This is part of the Redis removal effort. This PR removes Redis KV usage from the function table.
The rpush-related code is not updated in this PR.
2021-11-15 23:34:41 -08:00
Edward Oakes
48bc1af2da
[job submission] Remove DOES_NOT_EXIST status (#20354) 2021-11-15 16:57:32 -08:00
Lixin Wei
b7e35acf14
[RuntimeEnv] Raise RuntimeEnvSetupError when Actor Creation Failed due to It (#19888)
* ray_pkg passed

* fix

* fix typo

* fix test

* fix test

* fix test

* fix

* draft

* compile OK

* lint

* fix

* lint

* fix ci

* Update src/ray/gcs/gcs_server/gcs_actor_manager.cc

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

* remove comment

* rename

* resolve conflict

* use unique ownership

* use DestroyActor instead of ReconstructActor

* fix segmentation fault

* fix crash in debug log

* Revert "fix crash in debug log"

This reverts commit 8f0e3d37f062b664d8d0e07c6c1a9a715b8ba1ee.

Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2021-11-15 07:43:35 -08:00
Edward Oakes
2d5d499f67
[job submission] Support specifying runtime_env to job submission CLI (#20339) 2021-11-14 13:52:47 -08:00
shrekris-anyscale
c0aeb4a236
[runtime_env] Support working_dir and py_modules from HTTPS and Google Cloud Storage (#20280) 2021-11-14 02:16:45 -08:00
Edward Oakes
6c3bad52b6
[job submission] Better validation + tests for input types, refactor API (#20332) 2021-11-13 22:54:01 -08:00
Edward Oakes
07add6f7f2
Revert "Revert "[job submission] Use ray.init format addresses for Jo… (#20328) 2021-11-13 16:24:02 -08:00
mwtian
875b0aea0a
fallback to grpc.experimental.aio when importing grpc.aio (#20287) 2021-11-13 15:59:57 +09:00
Eric Liang
567e955810
Revert "[job submission] Use ray.init format addresses for JobSubmissionClient (#20245)" (#20314)
This reverts commit adc15a0fb0.
2021-11-12 21:11:24 -08:00
Nikita Vemuri
adc15a0fb0
[job submission] Use ray.init format addresses for JobSubmissionClient (#20245) 2021-11-12 13:52:43 -08:00
Edward Oakes
5ae5c1ba28
[job submission] Basic CLI prototype (#20204) 2021-11-11 15:59:13 -08:00
mwtian
0330852baf
[Core][Pubsub] Implement Python GCS publisher and subscriber (#20111)
## Why are these changes needed?
This change adds a Python publisher and subscriber in `gcs_utils.py`, and a gRPC handler on GCS for publishing via GCS. Error info is migrated to use the GCS-based pubsub if the feature flag `RAY_gcs_grpc_based_pubsub=true` is set.

It also adds a `--gcs-address` flag to some Python processes. The flag is not set anywhere yet, but will be set after the Redis-less bootstrapping work.

Unit tests are added for the Python publisher and subscriber. Migrated error info publishers and subscribers are tested with existing unit tests, e.g. tests calling `ray._private.test_utils.get_error_message()` to ensure error info is published.

GCS-based pubsub has gaps in handling deadlines, cancelled requests, and GCS restarts, so 3 more unit tests are disabled in the `HA GCS` mode. They will be addressed in a separate change.
2021-11-11 14:59:57 -08:00
Yi Cheng
e54d3117a4
[gcs] Update all redis kv usage in python except function table (#20014)
## Why are these changes needed?
This is part of the Redis removal project. This PR removes all direct usage of Redis except in the function table.
The function table will be migrated in the next PR.

## Related issue number
#19443
2021-11-10 20:24:53 -08:00
Edward Oakes
81f036d078
[job submission] Move job_manager to dashboard module, common parts to common.py (#20209) 2021-11-10 14:14:55 -08:00
Edward Oakes
5475bb054c
[job submission] Redirect stdout + stderr to a single log file (#20208) 2021-11-09 22:34:12 -08:00
Edward Oakes
50f2cf8a74
[job submission] Allow passing job_id, return DOES_NOT_EXIST when applicable (#20164) 2021-11-08 23:10:27 -08:00
Jiao
9ef75b27ac
[Job Submission] Add stop API to http & sdk, with better status code + stacktrace (#20094) 2021-11-06 12:37:54 -05:00
architkulkarni
c5175073b2
[runtime env] Add garbage collection for conda envs (#20072) 2021-11-04 23:13:34 -05:00
Edward Oakes
65161fe9b4
[job submission] Move HTTP routes to /api/jobs prefix (#19995) 2021-11-04 17:45:25 -05:00
Jiao
6cfb52ff1d
[job submission] Add stop API + subprocess cleanup (#19860) 2021-11-04 13:59:47 -05:00
architkulkarni
bcb63961d9
[runtime env] Add plugin name to internal URI format and add GC for py_modules (#20009) 2021-11-04 10:16:14 -05:00
Edward Oakes
b2ddea255d
[job submission] Add job submission ID + status to /api/snapshot (#19994) 2021-11-03 09:49:28 -05:00
Jiajun Yao
6acf276959
Listen to 127.0.0.1 if node ip is 127.0.0.1 (#19918)
* Listen to 127.0.0.1 if node ip is 127.0.0.1
2021-11-03 12:17:55 +09:00
Edward Oakes
f8a6cad0b7
[job submission] SDK prototype w/ dynamic working_dir uploads (#19843) 2021-11-02 16:01:54 -05:00
chenk008
57363995f3
[runtime env] Move container related code to runtime env (#19067) 2021-10-29 16:31:11 -07:00
Jiao
bb0ebb7903
[job submission] Temporarily make pydantic imports conditional (#19827) 2021-10-29 18:09:18 -05:00
Edward Oakes
bf23a31017
[job submission] Always generate and return job_id (#19851) 2021-10-29 09:09:54 -05:00
Edward Oakes
42ac906313
[job submission] Support passing metadata to the JobConfig (#19845) 2021-10-28 16:40:03 -05:00
Guyang Song
119318932a
remove the env config 'RAY_DASHBOARD_MODULE_EVENT' (#19629) 2021-10-28 16:51:59 +09:00
Edward Oakes
b2e12dc43b
[runtime_env] Add basic support for python modules (#19651) 2021-10-27 17:56:46 -05:00
Jiao
e53fecfbd5
[jobs] Initial http jobs server on head node (#19657) 2021-10-23 12:48:16 -05:00
Oscar Knagg
5a05e89267
[Core] Add TLS/SSL support to gRPC channels (#18631) 2021-10-20 22:39:11 -07:00
SangBin Cho
3222d39fb8
[Dashboard] Dashboard memory improvement (#19385)
* many ppo profiling

* completed

* improve memory usage lint

* revert temporarily

* Addressed code review

* Fix a test
2021-10-19 19:34:42 -07:00
Matti Picus
f372bb07aa
Enable dashboard on Windows (#19319) 2021-10-14 14:42:22 -07:00
Guyang Song
ab55b808c5
[runtime env] move worker env to runtime env in Java (#19060) 2021-10-11 17:25:09 +08:00
chenk008
3780a73b45
[Core] Add worker resource info to runtime env (#18804) 2021-10-08 10:37:29 -07:00
Edward Oakes
1fa81673bd
[runtime_env] Clean up validation logic (#18984)
Splits the runtime_env parsing/validation and overriding into two separate codepaths. Adds unit testing for both.
2021-10-07 14:24:41 -05:00
Simon Mo
9b2a368c8c
[Runtime Env] Implement basic runtime env plugin mechanism (#19044) 2021-10-01 17:22:54 -07:00
Edward Oakes
8e5d48d668
[runtime_env] Remove deprecated override_environment_variables and worker_env fields (#18213) 2021-09-30 18:55:24 -05:00
Edward Oakes
73b8936aa8
[runtime_env] Unify rpc::RuntimeEnv with serialized_runtime_env field (#18641) 2021-09-28 15:13:15 -05:00
architkulkarni
fbf5f5d56b
[runtime env] [Serve] Fix error when uris field is None (#18874) 2021-09-24 14:07:17 -05:00
Qing Wang
6f1d3f94db
Publish actor state PENDING_CREATION for dashboard showing. (#18666) 2021-09-18 15:44:58 +08:00
Edward Oakes
7736cdd91d
[dashboard] Rename "new_dashboard" -> "dashboard" (#18214) 2021-09-15 11:17:15 -05:00
Tanmay Chordia
bf1176311f
[dashboard] add an endpoint to force kill an actor (#18508) 2021-09-13 20:03:15 -07:00