Commit graph

59 commits

Author SHA1 Message Date
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black (#21975)
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
SangBin Cho
e62c0052a0
[Dashboard] Agent in minimal ray installation (#21817)
This is the second part of https://docs.google.com/document/d/12qP3x5uaqZSKS-A_kK0ylPOp0E02_l-deAbmm8YtdFw/edit#. After this PR, dashboard agents will fully work with minimal ray installation.

Note that this PR requires to introduce "aioredis", "frozenlist", and "aiosignal" to the minimal installation. These dependencies are very small (or will be removed soon), and including them to minimal makes thing very easy. Please see the below for the reasoning.
2022-01-26 04:03:54 -08:00
SangBin Cho
1ae14ec513
[Dashboard] Make dashboard / agent work in minimal ray installation 1/3. (#21774)
This is the doc that explains how to achieve this: https://docs.google.com/document/d/12qP3x5uaqZSKS-A_kK0ylPOp0E02_l-deAbmm8YtdFw/edit?usp=sharing

The fully working e2e prototype is here (it passes all tests): cdad913883

This PR is pure refactoring. Basically it moves some of util functions that require optional_deps to `optional_utils` so that optional deps' util functions are not used in the minimal installation. Look below to see the steps. 

<img width="693" alt="Screen Shot 2022-01-21 at 4 38 44 AM" src="https://user-images.githubusercontent.com/18510752/150528494-c3cdedf4-3a66-4557-b540-61436b1dbab6.png">
2022-01-23 21:11:32 -08:00
Yi Cheng
6dccfbffa9
Revert "Revert "[gcs] turn on grpc pubsub by default"" (#21585)
Reverts ray-project/ray#21584 and turn the flag off
2022-01-13 16:12:03 -08:00
Yi Cheng
bc696212d2
Revert "[gcs] turn on grpc pubsub by default" (#21584)
test-reconnect seems flaky.
Reverts ray-project/ray#21513
2022-01-13 12:34:02 -08:00
Yi Cheng
6194783312
[gcs] turn on grpc pubsub by default (#21513)
Turn on grpc pubsub by default.  This PR also fixed several tests which are failed before.

Co-authored-by: Mingwei Tian <mwtian@anyscale.com>
2022-01-12 22:13:03 -08:00
Yi Cheng
09421a4ca6
[2/gcs] Bootstrap dashboard for gcs ha (#21179)
This is part of gcs ha project. This PR try to bootstrap dashboard with gcs address instead of redis.

Co-authored-by: mwtian <81660174+mwtian@users.noreply.github.com>
2021-12-21 16:58:03 -08:00
Yi Cheng
87fa56def4
[gcs] Make gcs client in python able to auto reconnect (#20299)
## Why are these changes needed?
Since we are using gcs client as kv backend, we need to make it auto-reconnect in case of a failure. This PR adds this feature.

This PR adds auto_reconnect decorator to gcs-utils and in case of a failure it'll try to reconnect to gcs until it succeeds.

This feature right now support redis which should be deleted later once we finished bootstrap since kv will always go to gcs.

## Related issue number
2021-11-14 11:27:49 -08:00
mwtian
875b0aea0a
fallback to grpc.experimental.aio when importing grpc.aio (#20287) 2021-11-13 15:59:57 +09:00
mwtian
0330852baf
[Core][Pubsub] Implement Python GCS publisher and subscriber (#20111)
## Why are these changes needed?
This change adds Python publisher and subscriber in `gcs_utils.py`, and GRPC handler on GCS for publishing iva GCS. Error info is migrated to use the GCS-based pubsub, if feature flag `RAY_gcs_grpc_based_pubsub=true`.

Also, add a `--gcs-address` flag to some Python processes. It is not set anywhere yet, but will be set aftering Redis-less bootstrapping work.

Unit tests are added for the Python publisher and subscriber. Migrated error info publishers and subscribers are tested with existing unit tests, e.g. tests calling `ray._private.test_utils.get_error_message()` to ensure error info is published.

GCS based pubsub has gaps in handling deadline, cancelled requests and GCS restarts. So 3 more unit tests are disabled in the `HA GCS` mode. They will be addressed in a separate change.

## Related issue number
2021-11-11 14:59:57 -08:00
Yi Cheng
e54d3117a4
[gcs] Update all redis kv usage in python except function table (#20014)
## Why are these changes needed?
This is part of redis removal project. In this PR all direct usage of redis got removed except function table.
Function table will be migrated in the next PR

## Related issue number
#19443
2021-11-10 20:24:53 -08:00
Yi Cheng
7bb4c87780
[gcs] use gcs kv in internal kv (#19933)
## Why are these changes needed?
It's part of redis removal project. This PR focus on using gcs kv in internal kv.

- gcs client is introduced
- internal kv is updated to use gcs rpc client based kv
- related code got updated.

The other PR will update components using redis to use internal kv.

## Related issue number
https://github.com/ray-project/ray/issues/19443
2021-11-04 09:57:39 -07:00
Jiajun Yao
6acf276959
Listen to 127.0.0.1 if node ip is 127.0.0.1 (#19918)
* Listen to 127.0.0.1 if node ip is 127.0.0.1

* Listen to 127.0.0.1 if node ip is 127.0.0.1

* Listen to 127.0.0.1 if node ip is 127.0.0.1
2021-11-03 12:17:55 +09:00
Jiajun Yao
fe8138bfc2
Listen to 127.0.0.1 if node ip is 127.0.0.1 (#19810) 2021-10-28 08:44:23 -07:00
SangBin Cho
cea7fda41a
Revert "Revert "[Dashboard] Disable unnecessary event messages. (#19490)" (#19574)" (#19577)
This reverts commit 699c5aeac6.
2021-10-21 15:36:22 -07:00
Oscar Knagg
5a05e89267
[Core] Add TLS/SSL support to gRPC channels (#18631) 2021-10-20 22:39:11 -07:00
Eric Liang
699c5aeac6
Revert "[Dashboard] Disable unnecessary event messages. (#19490)" (#19574)
This reverts commit 7fb681a35d.
2021-10-20 20:17:57 -07:00
SangBin Cho
7fb681a35d
[Dashboard] Disable unnecessary event messages. (#19490)
* Disable unnecessary event messages.

* use warning

* Fix tests
2021-10-20 17:40:25 -07:00
Matti Picus
f372bb07aa
Enable dashboard on Windows (#19319) 2021-10-14 14:42:22 -07:00
Carlo Grisetti
2d0355548e
[Dashboard] Try to work around aiohttp 4.0.0 breaking changes (#19120) 2021-10-11 16:25:52 -07:00
Carlo Grisetti
d6dbc6dc97
Fix warning message spacing (#19164) 2021-10-08 11:46:02 -07:00
SangBin Cho
7fcf1bf57e
[Dashboard] Refine the dashboard restart logic. (#18973)
* in progress

* Refine the dashboard agent retry logic

* refine

* done

* lint
2021-10-04 05:01:51 -07:00
Edward Oakes
7736cdd91d
[dashboard] Rename "new_dashboard" -> "dashboard" (#18214) 2021-09-15 11:17:15 -05:00
Edward Oakes
1f6705d35d
[runtime_env] Centralize runtime_env logic into ray._private.runtime_env submodule (#18310) 2021-09-03 10:19:00 -05:00
Edward Oakes
5d122cf7b7
[runtime_env] Move working dir setup to the agent (#18170) 2021-08-31 17:22:49 -05:00
Edward Oakes
b969aa3c80
[dashboard] Don't start dashboard agent when missing dependencies (#17966) 2021-08-21 01:04:21 -07:00
Amog Kamsetty
add6ceb3ec
[Dependencies] Fix missing dependency UX (#17420)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-08-05 20:18:42 -07:00
Ian Rodney
7b1c207be3
[Dashboard] Allow agent to bind to Wildcard address. (#17393) 2021-08-03 02:03:19 -07:00
Ian Rodney
b26ba7ba9e
[Dashboard] Allow Agent HTTP listening port to be specified. (#17392) 2021-08-02 02:09:50 -07:00
Simon Mo
4a4210a083
Support streaming output of runtime env setup to logger/driver (#17306) 2021-07-27 16:39:15 -07:00
Richard Liaw
597dc08dfe
Revert "Revert "[core] remove opencensus/prometheus_exporter dependencies"" (#17254)
* Revert "Revert "[core] remove opencensus/prometheus_exporter dependencies" (#17251)"

This reverts commit 7b44dd8ecb.

* Lint

* Fix more imports

Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-07-26 21:09:25 -07:00
Kai Fricke
e881c6cff8
[core] remove aiohttp dependencies (#17181) 2021-07-21 07:18:19 +01:00
Amog Kamsetty
cb74053ee5
Retry remove gpustat dependency (#17115)
* remove gpustat

* move psutil imports
2021-07-19 11:14:10 -07:00
SongGuyang
e74d9d3ded
[runtime env] Download runtime env(conda) in agent instead of setup_worker (#16525) 2021-06-25 19:39:05 +08:00
SongGuyang
874e947d6f
[runtime env] support create or delete runtime envs in agent (#15904) 2021-06-09 20:22:25 +08:00
Clark Zinzow
5a788474aa
[Core] First pass at privatizing non-public Python APIs. (#14607)
* async_compat

* utils

* cluster_utils

* compat

* function_manager

* import_thread

* memory_monitor

* monitor, log_monitor, ray_process_reaper

* metrics_agent

* parameter

* prometheus_exporter

* ray_logging

* signature
2021-03-10 22:47:28 -08:00
fyrestone
5e76a51d56
[Dashboard] Select port in dashboard (#13763)
* Dashboard select port; Fix dashboard may hangs when exit

* Add test case

* Fix

* Fix test_stats_collector.py::test_get_all_node_details

* Refine dashboard error messages

* Refine code

* Refine code

* Show last 10 lines of dashboard log if start dashboard failed

* Fix ValueError: too many values to unpack (expected 2) when getsockname

* Fix test_multi_node_3.py::test_calling_start_ray_head may fail

* Fix Windows CI

* Disable dashboard in C++ test

* Refine code

* Fix issue 7084

Co-authored-by: 刘宝 <po.lb@antfin.com>
2021-02-23 16:27:48 -08:00
Alex Wu
02938f3a21
[hotfix] Disable dashboard agent windows (#14062) 2021-02-11 17:54:55 -08:00
Clark Zinzow
2d34e95c93
Don't gather check_parent_task on Windows, since it's undefined. (#13700) 2021-01-27 09:19:58 -08:00
Amog Kamsetty
d96a9fa192
Revert "Revert "[dashboard] Fix RAY_RAYLET_PID KeyError on Windows (#12948)" (#13572)" (#13685)
This reverts commit c4a710369b.
2021-01-25 10:35:25 -08:00
Amog Kamsetty
c4a710369b
Revert "[dashboard] Fix RAY_RAYLET_PID KeyError on Windows (#12948)" (#13572)
This reverts commit ef6d859e9b.
2021-01-22 14:10:24 -06:00
Xianyang Liu
4ecd29ea2b
[dashboard] Fixes dashboard issues when environments have set http_proxy (#12598)
* fixes ray start with http_proxy

* format

* fixes

* fixes

* increase timeout

* address comments
2021-01-21 20:10:01 -08:00
Edward Oakes
ef6d859e9b
[dashboard] Fix RAY_RAYLET_PID KeyError on Windows (#12948) 2020-12-31 10:54:40 -06:00
Edward Oakes
261b2f9053
Check for raylet PID as ppid in dashboard agent fate-sharing (#12867) 2020-12-15 12:13:11 -06:00
Stephanie Wang
a776209aec
Revert "Fix dashboard agent check ppid is raylet pid (#12256)" (#12729)
This reverts commit 3ce9286977.
2020-12-09 17:20:38 -05:00
fyrestone
3ce9286977
Fix dashboard agent check ppid is raylet pid (#12256)
* Dashboard agent check ppid is raylet pid

* Improve implementation

* Refine code

* Make the RAY_NODE_PID environment required for dashboard agent

Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-12-09 09:12:34 -05:00
SangBin Cho
162f361dab
[Logging] Fix log monitor issue (#12588)
* Try fixing issues.

* Verficiation.
2020-12-07 22:01:18 -08:00
SangBin Cho
8223a33bff
[Logging] Log rotation on all components (#12101)
* In Progress.

* Done.

* Fix the issue.

* Add wait for condition because logs are not written right away now.

* debug string.

* lint.

* Fix flaky test.

* Fix issues.

* Fix test.

* lint.
2020-11-30 19:03:55 -08:00
fyrestone
0c6bb745cd
Fix dashboard agent use incorrect ip (#12038) 2020-11-16 14:02:20 -06:00
Max Fitton
caf3b04b27
[Dashboard] Turn on new dashboard by default pt 2 (#11510) 2020-10-23 15:52:14 -05:00