Commit graph

274 commits

Author SHA1 Message Date
shrekris-anyscale
c0aeb4a236
[runtime_env] Support working_dir and py_modules from HTTPS and Google Cloud Storage (#20280) 2021-11-14 02:16:45 -08:00
Edward Oakes
6c3bad52b6
[job submission] Better validation + tests for input types, refactor API (#20332) 2021-11-13 22:54:01 -08:00
Edward Oakes
07add6f7f2
Revert "Revert "[job submission] Use ray.init format addresses for Jo… (#20328) 2021-11-13 16:24:02 -08:00
mwtian
875b0aea0a
fallback to grpc.experimental.aio when importing grpc.aio (#20287) 2021-11-13 15:59:57 +09:00
Eric Liang
567e955810
Revert "[job submission] Use ray.init format addresses for JobSubmissionClient (#20245)" (#20314)
This reverts commit adc15a0fb0.
2021-11-12 21:11:24 -08:00
Nikita Vemuri
adc15a0fb0
[job submission] Use ray.init format addresses for JobSubmissionClient (#20245) 2021-11-12 13:52:43 -08:00
Edward Oakes
5ae5c1ba28
[job submission] Basic CLI prototype (#20204) 2021-11-11 15:59:13 -08:00
Teofilo Zosa
abf0eb53cc
Fix aiohttp 3.8.0 breaking changes (and unpin from 3.7) (#20261) 2021-11-11 15:35:20 -08:00
mwtian
0330852baf
[Core][Pubsub] Implement Python GCS publisher and subscriber (#20111)
## Why are these changes needed?
This change adds a Python publisher and subscriber in `gcs_utils.py`, and a gRPC handler on GCS for publishing via GCS. Error info is migrated to use the GCS-based pubsub if the feature flag `RAY_gcs_grpc_based_pubsub=true` is set.

Also, add a `--gcs-address` flag to some Python processes. It is not set anywhere yet, but will be set after the Redis-less bootstrapping work.

Unit tests are added for the Python publisher and subscriber. Migrated error info publishers and subscribers are tested with existing unit tests, e.g. tests calling `ray._private.test_utils.get_error_message()` to ensure error info is published.

GCS-based pubsub has gaps in handling deadlines, cancelled requests, and GCS restarts, so 3 more unit tests are disabled in the `HA GCS` mode. They will be addressed in a separate change.

## Related issue number
2021-11-11 14:59:57 -08:00
Yi Cheng
e54d3117a4
[gcs] Update all redis kv usage in python except function table (#20014)
## Why are these changes needed?
This is part of the Redis removal project. This PR removes all direct usage of Redis in Python except for the function table.
The function table will be migrated in the next PR.

## Related issue number
#19443
2021-11-10 20:24:53 -08:00
Edward Oakes
81f036d078
[job submission] Move job_manager to dashboard module, common parts to common.py (#20209) 2021-11-10 14:14:55 -08:00
Edward Oakes
5475bb054c
[job submission] Redirect stdout + stderr to a single log file (#20208) 2021-11-09 22:34:12 -08:00
Edward Oakes
50f2cf8a74
[job submission] Allow passing job_id, return DOES_NOT_EXIST when applicable (#20164) 2021-11-08 23:10:27 -08:00
Jiao
9ef75b27ac
[Job Submission] Add stop API to http & sdk, with better status code + stacktrace (#20094) 2021-11-06 12:37:54 -05:00
architkulkarni
c5175073b2
[runtime env] Add garbage collection for conda envs (#20072) 2021-11-04 23:13:34 -05:00
Edward Oakes
65161fe9b4
[job submission] Move HTTP routes to /api/jobs prefix (#19995) 2021-11-04 17:45:25 -05:00
Jiao
6cfb52ff1d
[job submission] Add stop API + subprocess cleanup (#19860) 2021-11-04 13:59:47 -05:00
Yi Cheng
7bb4c87780
[gcs] use gcs kv in internal kv (#19933)
## Why are these changes needed?
This is part of the Redis removal project. This PR focuses on using GCS KV in internal KV.

- gcs client is introduced
- internal kv is updated to use gcs rpc client based kv
- related code got updated.

Another PR will update the components that use Redis to use internal KV.

## Related issue number
https://github.com/ray-project/ray/issues/19443
2021-11-04 09:57:39 -07:00
architkulkarni
bcb63961d9
[runtime env] Add plugin name to internal URI format and add GC for py_modules (#20009) 2021-11-04 10:16:14 -05:00
Avnish Narayan
026bf01071
[RLlib] Upgrade gym version to 0.21 and deprecate pendulum-v0. (#19535)
* Fix QMix, SAC, and MADDPG too.

* Unpin gym and deprecate pendulum v0

Many tests in RLlib depended on pendulum v0,
however in gym 0.21, pendulum v0 was deprecated
in favor of pendulum v1. This may change reward
thresholds, so we may have to rerun
all of the pendulum v1 benchmarks, or use another
environment instead. The same applies to frozen
lake v0 and frozen lake v1.

Lastly, all of the RLlib tests and have
been moved to python 3.7

* Add gym installation based on python version.

Pin python<= 3.6 to gym 0.19 due to install
issues with atari roms in gym 0.20

* Reformatting

* Fixing tests

* Move atari-py install conditional to req.txt

* migrate to new ale install method

Make parametric_actions_cartpole return float32 actions/obs

Adding type conversions if obs/actions don't match space

Add utils to make elements match gym space dtypes

Co-authored-by: Jun Gong <jungong@anyscale.com>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-03 16:24:00 +01:00
Edward Oakes
b2ddea255d
[job submission] Add job submission ID + status to /api/snapshot (#19994) 2021-11-03 09:49:28 -05:00
Jiajun Yao
6acf276959
Listen to 127.0.0.1 if node ip is 127.0.0.1 (#19918)
* Listen to 127.0.0.1 if node ip is 127.0.0.1
2021-11-03 12:17:55 +09:00
Edward Oakes
f8a6cad0b7
[job submission] SDK prototype w/ dynamic working_dir uploads (#19843) 2021-11-02 16:01:54 -05:00
chenk008
57363995f3
[runtime env] Move container related code to runtime env (#19067) 2021-10-29 16:31:11 -07:00
Jiao
bb0ebb7903
[job submission] Temporarily make pydantic imports conditional (#19827) 2021-10-29 18:09:18 -05:00
Edward Oakes
bf23a31017
[job submission] Always generate and return job_id (#19851) 2021-10-29 09:09:54 -05:00
Edward Oakes
42ac906313
[job submission] Support passing metadata to the JobConfig (#19845) 2021-10-28 16:40:03 -05:00
Jiajun Yao
fe8138bfc2
Listen to 127.0.0.1 if node ip is 127.0.0.1 (#19810) 2021-10-28 08:44:23 -07:00
Guyang Song
119318932a
remove the env config 'RAY_DASHBOARD_MODULE_EVENT' (#19629) 2021-10-28 16:51:59 +09:00
Edward Oakes
b2e12dc43b
[runtime_env] Add basic support for python modules (#19651) 2021-10-27 17:56:46 -05:00
Jiao
e53fecfbd5
[jobs] Initial http jobs server on head node (#19657) 2021-10-23 12:48:16 -05:00
SangBin Cho
cea7fda41a
Revert "Revert "[Dashboard] Disable unnecessary event messages. (#19490)" (#19574)" (#19577)
This reverts commit 699c5aeac6.
2021-10-21 15:36:22 -07:00
Oscar Knagg
5a05e89267
[Core] Add TLS/SSL support to gRPC channels (#18631) 2021-10-20 22:39:11 -07:00
Eric Liang
699c5aeac6
Revert "[Dashboard] Disable unnecessary event messages. (#19490)" (#19574)
This reverts commit 7fb681a35d.
2021-10-20 20:17:57 -07:00
Philipp Moritz
45f1ff0fa9
[Windows] Update react-scripts dependency for dashboard (#19489) 2021-10-20 17:57:30 -07:00
SangBin Cho
7fb681a35d
[Dashboard] Disable unnecessary event messages. (#19490)
* Disable unnecessary event messages.

* use warning

* Fix tests
2021-10-20 17:40:25 -07:00
SangBin Cho
3222d39fb8
[Dashboard] Dashboard memory improvement (#19385)
* many ppo profiling

* completed

* improve memory usage lint

* revert temporarily

* Addressed code review

* Fix a test
2021-10-19 19:34:42 -07:00
Simon Mo
a081579f68
[Dashboard] Fix gRPC GCS healthcheck thread (#19360) 2021-10-18 13:18:06 -07:00
Matti Picus
f372bb07aa
Enable dashboard on Windows (#19319) 2021-10-14 14:42:22 -07:00
Carlo Grisetti
2d0355548e
[Dashboard] Try to work around aiohttp 4.0.0 breaking changes (#19120) 2021-10-11 16:25:52 -07:00
Guyang Song
ab55b808c5
[runtime env] move worker env to runtime env in Java (#19060) 2021-10-11 17:25:09 +08:00
Carlo Grisetti
d6dbc6dc97
Fix warning message spacing (#19164) 2021-10-08 11:46:02 -07:00
chenk008
3780a73b45
[Core] Add worker resource info to runtime env (#18804) 2021-10-08 10:37:29 -07:00
Edward Oakes
1fa81673bd
[runtime_env] Clean up validation logic (#18984)
Splits the runtime_env parsing/validation and overriding into two separate codepaths. Adds unit testing for both.
2021-10-07 14:24:41 -05:00
SangBin Cho
7fcf1bf57e
[Dashboard] Refine the dashboard restart logic. (#18973)
* in progress

* Refine the dashboard agent retry logic

* refine

* done

* lint
2021-10-04 05:01:51 -07:00
Simon Mo
9b2a368c8c
[Runtime Env] Implement basic runtime env plugin mechanism (#19044) 2021-10-01 17:22:54 -07:00
Edward Oakes
8e5d48d668
[runtime_env] Remove deprecated override_environment_variables and worker_env fields (#18213) 2021-09-30 18:55:24 -05:00
Chu Xiangyang
505aa89d12
[Dashboard] Add start/end time for job (#18901) 2021-09-28 20:57:13 -07:00
Edward Oakes
73b8936aa8
[runtime_env] Unify rpc::RuntimeEnv with serialized_runtime_env field (#18641) 2021-09-28 15:13:15 -05:00
Eric Liang
11a2dfcaab
Improve unschedulable task warning messages by integrating with the autoscaler (#18724) 2021-09-24 12:19:58 -07:00