ray/dashboard/modules
Yi Cheng dac7bf17d9
[serve] Make serve agent not blocking when GCS is down. (#27526)
This PR fixes several issues that block the Serve agent when GCS is down. The Serve agent must always stay alive so that external requests can still reach it and check its status.

- The internal KV used in the dashboard/agent blocks the agent. We use the async one instead.
- The Serve controller uses ray.nodes, which is a blocking call that blocks forever. Change it to use the GCS client with a timeout.
- The agent uses the Serve controller client, which is a blocking call with max retries = -1. This blocks until the controller is back.
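
The general pattern behind the three fixes above (keep the agent's event loop free and bound every wait with a timeout) can be sketched in plain asyncio. This is only an illustration, not the code in this PR: `blocking_lookup` and `lookup_with_timeout` are hypothetical names, and the sleep stands in for a blocking call such as a KV read against an unavailable GCS.

```python
import asyncio
import time


def blocking_lookup(delay_s: float) -> str:
    """Hypothetical stand-in for a blocking call (e.g. a KV read)."""
    time.sleep(delay_s)
    return "value"


async def lookup_with_timeout(delay_s: float, timeout_s: float):
    """Run the blocking call in a worker thread and stop waiting after timeout_s.

    Returns the result, or None if the call did not finish in time.
    """
    loop = asyncio.get_running_loop()
    # Off-load the blocking call so the event loop keeps serving requests.
    future = loop.run_in_executor(None, blocking_lookup, delay_s)
    try:
        return await asyncio.wait_for(future, timeout=timeout_s)
    except asyncio.TimeoutError:
        # The worker thread may still be running; we only stop waiting for it.
        return None
```

With this shape, a caller can report "status unknown" when the result is None instead of hanging until the backend returns.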

To enable Serve HA, we also need to set:

- RAY_gcs_server_request_timeout_seconds=5
- RAY_SERVE_KV_TIMEOUT_S=5

which we should set in KubeRay.
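
As a rough illustration of how a process might consume such settings, the sketch below reads a timeout from an environment variable and falls back to a default when the variable is unset or invalid. The helper `read_timeout_s` is hypothetical and is not how Ray itself parses these variables; only the two variable names come from the message above.

```python
import os


def read_timeout_s(name: str, default: float) -> float:
    """Hypothetical helper: read a positive timeout (seconds) from the environment."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        # Unparseable value: fall back rather than crash at startup.
        return default
    # A non-positive timeout is treated as invalid in this sketch.
    return value if value > 0 else default


# Example: mirror the settings listed above for the current process only.
os.environ["RAY_gcs_server_request_timeout_seconds"] = "5"
os.environ["RAY_SERVE_KV_TIMEOUT_S"] = "5"
kv_timeout_s = read_timeout_s("RAY_SERVE_KV_TIMEOUT_S", default=1.0)
```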
2022-08-08 16:29:42 -07:00
..
actor [Dashboard] Stop caching logs in memory. Use state observability api to fetch on demand. (#26818) 2022-07-26 03:10:57 -07:00
event [serve] Make serve agent not blocking when GCS is down. (#27526) 2022-08-08 16:29:42 -07:00
healthz [serve] Make serve agent not blocking when GCS is down. (#27526) 2022-08-08 16:29:42 -07:00
job Convert job_manager to be async (#27123) 2022-08-05 19:33:49 -07:00
log Revert Revert "[Observability] Fix --follow lost connection when it is used for > 30 seconds" #26162 (#26163) 2022-06-28 16:07:32 -07:00
node [serve] Make serve agent not blocking when GCS is down. (#27526) 2022-08-08 16:29:42 -07:00
reporter [serve] Make serve agent not blocking when GCS is down. (#27526) 2022-08-08 16:29:42 -07:00
runtime_env [docs] Fix the remaining style violations in docstrings and add lint rule (#27033) 2022-07-27 22:24:20 -07:00
serve [serve] Make serve agent not blocking when GCS is down. (#27526) 2022-08-08 16:29:42 -07:00
snapshot [serve] Make serve agent not blocking when GCS is down. (#27526) 2022-08-08 16:29:42 -07:00
state Convert job_manager to be async (#27123) 2022-08-05 19:33:49 -07:00
test [api] Annotate as public / move ray-core APIs to _private and add enforcement rule (#25695) 2022-06-21 15:13:29 -07:00
tests [serve] Reject Ray client addresses when submitting via Dashboard (#23339) 2022-03-21 11:17:51 -05:00
tune [tune] fix set_tune_experiment (#26298) 2022-07-05 15:04:51 -07:00
usage_stats [Usage Stats] Record usage stats when dashboard disabled (#26042) 2022-07-28 23:01:49 -07:00
__init__.py [Dashboard] New dashboard skeleton (#9099) 2020-07-27 11:34:47 +08:00
dashboard_sdk.py [core] ray.init defaults to an existing Ray instance if there is one (#26678) 2022-07-23 11:27:22 -07:00
version.py bump jobs version after making a backwards-incompatible change (#27281) 2022-07-30 00:11:29 -07:00