ray/dashboard/modules
Archit Kulkarni 058c239cf1
[runtime env] Test common failure scenarios (#25977)
Tests the following failure scenarios:
- Fail to upload data in `ray.init()` (`working_dir`, `py_modules`)
- Eager install fails in `ray.init()` for some other reason (bad `pip` package)
- Fail to download data from GCS (`working_dir`)

Improves the following error message cases:
- Return `RuntimeEnvSetupError` on failure to upload `working_dir` or `py_modules`
- Return `RuntimeEnvSetupError` on failure to download files from GCS during runtime env setup

Not covered in this PR:
- RPC to agent fails (This is extremely rare because the Raylet and agent are on the same node.)
- Agent is not started or is dead (We don't need to worry about this because the Raylet fate-shares with the agent.)

The approach is to use environment variables to induce failures in various places. The alternative would be to refactor the packaging code to use dependency injection for the Internal KV client so that we can pass in a fake. I'm not sure how much of an improvement this would be. I think we'd still have to set an environment variable to pass in the fake client, because these are essentially e2e tests of `ray.init()` and we don't have an API to pass it in.
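
As a rough illustration of the environment-variable approach, here is a minimal sketch. It is not the code from this PR: the variable name `EXAMPLE_FAIL_UPLOAD_FOR_TESTING`, the `SetupError` class, and both functions are hypothetical, and a plain dict stands in for the Internal KV store.

```python
import os

# Hypothetical variable name, chosen for this sketch only.
FAIL_UPLOAD_ENV_VAR = "EXAMPLE_FAIL_UPLOAD_FOR_TESTING"


class SetupError(RuntimeError):
    """Hypothetical stand-in for a user-facing setup error."""


def store_package(kv_store: dict, uri: str, data: bytes) -> None:
    """Write a package into a dict that stands in for the KV store."""
    # Fault-injection hook: a test sets the variable to force this path.
    if os.environ.get(FAIL_UPLOAD_ENV_VAR) == "1":
        raise IOError(f"Simulated failure uploading {uri}")
    kv_store[uri] = data


def upload_working_dir(kv_store: dict, uri: str, data: bytes) -> None:
    """Upload a package, converting low-level failures into one error type."""
    try:
        store_package(kv_store, uri, data)
    except Exception as e:
        raise SetupError(f"Failed to upload package {uri}") from e
```

A test would set the variable before calling the entry point and assert that `SetupError` is raised. Because the hook reads the process environment, it can be triggered without any extra argument on the public API.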
2022-08-15 11:35:56 -05:00
..
actor [Dashboard] Stop caching logs in memory. Use state observability api to fetch on demand. (#26818) 2022-07-26 03:10:57 -07:00
event [serve] Make serve agent not blocking when GCS is down. (#27526) 2022-08-08 16:29:42 -07:00
healthz [serve] Make serve agent not blocking when GCS is down. (#27526) 2022-08-08 16:29:42 -07:00
job [runtime env] Test common failure scenarios (#25977) 2022-08-15 11:35:56 -05:00
log [Dashboard] Fix edge cases for log file names in the dashboard log viewer (#27772) 2022-08-12 09:39:54 -07:00
node [serve] Make serve agent not blocking when GCS is down. (#27526) 2022-08-08 16:29:42 -07:00
reporter [serve] Make serve agent not blocking when GCS is down. (#27526) 2022-08-08 16:29:42 -07:00
runtime_env [docs] Fix the remaining style violations in docstrings and add lint rule (#27033) 2022-07-27 22:24:20 -07:00
serve [serve] Make serve agent not blocking when GCS is down. (#27526) 2022-08-08 16:29:42 -07:00
snapshot [serve] Make serve agent not blocking when GCS is down. (#27526) 2022-08-08 16:29:42 -07:00
state Convert job_manager to be async (#27123) 2022-08-05 19:33:49 -07:00
test [api] Annotate as public / move ray-core APIs to _private and add enforcement rule (#25695) 2022-06-21 15:13:29 -07:00
tests [serve] Reject Ray client addresses when submitting via Dashboard (#23339) 2022-03-21 11:17:51 -05:00
tune [tune] fix set_tune_experiment (#26298) 2022-07-05 15:04:51 -07:00
usage_stats [Usage Stats] Record usage stats when dashboard disabled (#26042) 2022-07-28 23:01:49 -07:00
__init__.py [Dashboard] New dashboard skeleton (#9099) 2020-07-27 11:34:47 +08:00
dashboard_sdk.py [core] ray.init defaults to an existing Ray instance if there is one (#26678) 2022-07-23 11:27:22 -07:00
version.py bump jobs version after making a backwards-incompatible change (#27281) 2022-07-30 00:11:29 -07:00