hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Alex Wu	7a45f60dbc	[autoscaler] Fix ray.autoscaler.sdk import issue (#21795 ) This PR moves the sdk to its own folder, then includes everything in `import ray.autoscaler.sdk` in ray's import path. Note that naively doing this created circular dependencies, because the ray core now uses constants that were defined in the autoscaler for internal kv operations (and the autoscaler similarly calls into the ray core). The solution was to move those internal kv keys into ray core constants so the imports flow (more) one way. Co-authored-by: Alex Wu <alex@anyscale.com>	2022-01-25 14:43:24 -08:00
SangBin Cho	2010f13175	Fix dashboard test bug (#21742 ) Currently `wait_until_succeeded_without_exception` is used in the dashboard, and it returns True/False. Unfortunately, a lot of code doesn't assert on its return value (which means things are not actually tested).	2022-01-24 11:38:51 -06:00
SangBin Cho	1ae14ec513	[Dashboard] Make dashboard / agent work in minimal ray installation 1/3. (#21774 ) This is the doc that explains how to achieve this: https://docs.google.com/document/d/12qP3x5uaqZSKS-A_kK0ylPOp0E02_l-deAbmm8YtdFw/edit?usp=sharing The fully working e2e prototype is here (it passes all tests): `cdad913883` This PR is pure refactoring. Basically it moves some of the util functions that require optional_deps to `optional_utils` so that optional deps' util functions are not used in the minimal installation. Look below to see the steps. [Screenshot: "Screen Shot 2022-01-21 at 4 38 44 AM"]	2022-01-23 21:11:32 -08:00
Yi Cheng	6dccfbffa9	Revert "Revert "[gcs] turn on grpc pubsub by default"" (#21585 ) Reverts ray-project/ray#21584 and turn the flag off	2022-01-13 16:12:03 -08:00
Yi Cheng	bc696212d2	Revert "[gcs] turn on grpc pubsub by default" (#21584 ) test-reconnect seems flaky. Reverts ray-project/ray#21513	2022-01-13 12:34:02 -08:00
Yi Cheng	6194783312	[gcs] turn on grpc pubsub by default (#21513 ) Turn on grpc pubsub by default. This PR also fixes several tests that failed before. Co-authored-by: Mingwei Tian <mwtian@anyscale.com>	2022-01-12 22:13:03 -08:00
Yi Cheng	09421a4ca6	[2/gcs] Bootstrap dashboard for gcs ha (#21179 ) This is part of the gcs ha project. This PR tries to bootstrap the dashboard with the gcs address instead of redis. Co-authored-by: mwtian <81660174+mwtian@users.noreply.github.com>	2021-12-21 16:58:03 -08:00
Antoni Baum	20fc9f907d	[CI] Fix tune dashboard, increase timeout for `test_commands` (#20453 )	2021-11-16 17:52:17 -08:00
Simon Mo	32a4f48aa2	[CI] Don't test tune dashboard (#20452 )	2021-11-16 15:07:56 -08:00
Yi Cheng	e54d3117a4	[gcs] Update all redis kv usage in python except function table (#20014 ) ## Why are these changes needed? This is part of the redis removal project. In this PR all direct usage of redis is removed except for the function table. The function table will be migrated in the next PR. ## Related issue number #19443	2021-11-10 20:24:53 -08:00
Guyang Song	119318932a	remove the env config 'RAY_DASHBOARD_MODULE_EVENT' (#19629 )	2021-10-28 16:51:59 +09:00
Matti Picus	f372bb07aa	Enable dashboard on Windows (#19319 )	2021-10-14 14:42:22 -07:00
SangBin Cho	7fcf1bf57e	[Dashboard] Refine the dashboard restart logic. (#18973 ) * in progress * Refine the dashboard agent retry logic * refine * done * lint	2021-10-04 05:01:51 -07:00
Eric Liang	11a2dfcaab	Improve unschedulable task warning messages by integrating with the autoscaler (#18724 )	2021-09-24 12:19:58 -07:00
Edward Oakes	7736cdd91d	[dashboard] Rename "new_dashboard" -> "dashboard" (#18214 )	2021-09-15 11:17:15 -05:00
Simon Mo	e61160d514	[Dashboard] Move gcs health check to a separate thread to avoid crashing due to excessive CPU usage. (#18236 )	2021-09-03 14:23:56 -07:00
Clark Zinzow	d958457d07	[Core] Second pass at privatizing APIs. (#17885 ) * gcs_utils * resource_spec * profiling * ray_perf and ray_cluster_perf * test_utils	2021-08-18 20:56:33 -07:00
fyrestone	57b9b1bb0f	[Dashboard] Use a dedicated RPC to check the GCS is alive (#16330 ) * Dashboard check gcs is alive * Fix dashboard hangs at exit * ray health-check call GCS CheckAlive * Minor fixes Co-authored-by: 刘宝 <po.lb@antfin.com>	2021-07-27 14:05:44 +08:00
Amog Kamsetty	8dfd471823	Revert "Revert "[Dashboard][event] Basic event module (#16985 )" (#17068 )" (#17107 ) This reverts commit `c17e171f92`. Co-authored-by: 刘宝 <po.lb@antfin.com>	2021-07-18 12:59:04 +08:00
Amog Kamsetty	c17e171f92	Revert "[Dashboard][event] Basic event module (#16985 )" (#17068 ) This reverts commit `f1faa79a04`.	2021-07-13 23:18:43 -07:00
fyrestone	f1faa79a04	[Dashboard][event] Basic event module (#16985 ) * Basic event module * Fix comments * Set the SCAN_EVENT_DIR_INTERVAL_SECONDS defaults to 2 * Fix lint * Fix lint * Clean code * Try to fix flaky * Fix test * Disable event module by default * Make monitor events task cancellable * Fix error Co-authored-by: 刘宝 <po.lb@antfin.com>	2021-07-13 19:08:39 -07:00
Amog Kamsetty	a14342ce6f	Revert "[Dashboard][event] Basic event module (#16698 )" (#17004 ) This reverts commit `66ea099897`.	2021-07-12 11:22:46 -07:00
fyrestone	66ea099897	[Dashboard][event] Basic event module (#16698 ) * Basic event module * Fix comments * Set the SCAN_EVENT_DIR_INTERVAL_SECONDS defaults to 2 * Fix lint * Fix lint * Clean code * Try to fix flaky * Fix test * Disable event module by default Co-authored-by: 刘宝 <po.lb@antfin.com>	2021-07-09 10:25:30 -07:00
fyrestone	4ca316a0f4	Move test_snapshot from test_dashboard.py to modules/snapshot/tests/test_snapshot.py (#16306 ) Co-authored-by: 刘宝 <po.lb@antfin.com>	2021-06-08 10:26:03 -07:00
fyrestone	dfadf33a94	[Dashboard] Reorganize dashboard modules - node (#16217 )	2021-06-07 19:50:46 -07:00
Alex Wu	cd2fc7792f	[dashboard] Snapshot of cluster state (#15868 )	2021-05-20 08:10:32 -07:00
Amog Kamsetty	ebc44c3d76	[CI] Upgrade flake8 to 3.9.1 (#15527 ) * formatting * format util * format release * format rllib/agents * format rllib/env * format rllib/execution * format rllib/evaluation * format rllib/examples * format rllib/policy * format rllib utils and tests * format streaming * more formatting * update requirements files * fix rllib type checking * updates * update * fix circular import * Update python/ray/tests/test_runtime_env.py * noqa	2021-05-03 14:23:28 -07:00
fyrestone	43de7f48a7	Fix reported dashboard ip when using 0.0.0.0 (#15506 )	2021-04-27 23:48:22 +08:00
Kathryn Zhou	456d9aab47	Add Cypress test for Ray Dashboard (#14253 )	2021-02-24 20:41:52 -08:00
fyrestone	5e76a51d56	[Dashboard] Select port in dashboard (#13763 ) * Dashboard select port; Fix dashboard that may hang on exit * Add test case * Fix * Fix test_stats_collector.py::test_get_all_node_details * Refine dashboard error messages * Refine code * Refine code * Show last 10 lines of dashboard log if starting the dashboard failed * Fix ValueError: too many values to unpack (expected 2) when getsockname * Fix test_multi_node_3.py::test_calling_start_ray_head may fail * Fix Windows CI * Disable dashboard in C++ test * Refine code * Fix issue 7084 Co-authored-by: 刘宝 <po.lb@antfin.com>	2021-02-23 16:27:48 -08:00
Xianyang Liu	4ecd29ea2b	[dashboard] Fixes dashboard issues when environments have set http_proxy (#12598 ) * fixes ray start with http_proxy * format * fixes * fixes * increase timeout * address comments	2021-01-21 20:10:01 -08:00
Simon Mo	dac8b3d58a	[CI] Enable Dashboard tests for master (#13425 )	2021-01-15 09:43:34 -08:00
Alex Wu	8df94e33e0	[Autoscaler] New output log format (#12772 )	2020-12-23 12:02:55 -08:00
Eric Liang	03a5b90ed6	Revert "Revert "Increase the number of unique bits for actors to avoi… (#12990 )	2020-12-21 15:16:42 -08:00
Eric Liang	5d987f5988	Revert "Increase the number of unique bits for actors to avoid handle collisions (#12894 )" (#12988 ) This reverts commit `3e492a79ec`.	2020-12-18 23:51:44 -08:00
Eric Liang	3e492a79ec	Increase the number of unique bits for actors to avoid handle collisions (#12894 )	2020-12-18 15:59:03 -08:00
Edward Oakes	261b2f9053	Check for raylet PID as ppid in dashboard agent fate-sharing (#12867 )	2020-12-15 12:13:11 -06:00
Max Fitton	d0813c1c58	[Dashboard] Add dashboard multi-node churn test (#11768 )	2020-12-14 17:03:33 -06:00
Stephanie Wang	a776209aec	Revert "Fix dashboard agent check ppid is raylet pid (#12256 )" (#12729 ) This reverts commit `3ce9286977`.	2020-12-09 17:20:38 -05:00
fyrestone	3ce9286977	Fix dashboard agent check ppid is raylet pid (#12256 ) * Dashboard agent check ppid is raylet pid * Improve implementation * Refine code * Make the RAY_NODE_PID environment required for dashboard agent Co-authored-by: 刘宝 <po.lb@antfin.com>	2020-12-09 09:12:34 -05:00
Max Fitton	2708b3abbc	[Dashboard][Bug] Fix duplicate node total rows in dashboard (#12410 ) * Fix duplicate node total rows in dashboard by changing the react key of the NodeTotalRow component from the node IP to the node ID (node IP can be duplicated in the case of docker). * simplify a piece of test code and fix a flaky time out * lint	2020-11-30 18:43:09 -08:00
SangBin Cho	753cda2f28	[Dashboard] Delete old dashboard (#12144 ) * Delete old dashboard from repo. * Delete old dashboard from repo. 2	2020-11-25 11:31:02 -08:00
fyrestone	05ad4c7499	[Dashboard] Optimize dashboard datacenter (#11391 ) * Optimize dashboard datacenter * Fix tests * Fix tests * Fix * Fix CI * python/build-wheel-macos.sh Co-authored-by: 刘宝 <po.lb@antfin.com> Co-authored-by: Max Fitton <maxfitton@anyscale.com>	2020-10-27 23:49:31 -07:00
Max Fitton	caf3b04b27	[Dashboard] Turn on new dashboard by default pt 2 (#11510 )	2020-10-23 15:52:14 -05:00
Edward Oakes	798bd6a359	[dashboard] Add /api/cluster_status endpoint (#11456 )	2020-10-19 11:00:47 -05:00
fyrestone	defd41aad7	[Dashboard] http route handler cache (#10921 ) * Add aiohttp_cache to dashboard * Add comments; Refine code * Keep NODE_STATS_UPDATE_INTERVAL_SECONDS 1 second; Change AIOHTTP_CACHE_TTL_SECONDS to 2 seconds * Update merge Co-authored-by: 刘宝 <po.lb@antfin.com>	2020-10-09 22:27:05 -07:00
Max Fitton	9a6d01ebf9	[Dashboard] Add utility functions for actor and memory APIs (#11011 ) * Add actor and memory utility functions needed by upcoming logical view and memory view APIs * Add a method to allow printing Dict custom class and add support for hot-reloading local dev environment. * Address PR comments * Add unit tests from test metrics to branch for new memory_utils module * Add note about sorting / grouping * lint Co-authored-by: Max Fitton <max@semprehealth.com>	2020-10-01 23:48:03 -07:00
fyrestone	50784e2496	[Dashboard] Dashboard node grouping (#10528 ) * Add RAY_NODE_ID environment var to agent * Node related data uses node id as key * ray.init() returns node id; Pass test_reporter.py * Fix lint & CI * Fix comments * Minor fixes * Fix CI * Add const to ClientID in AgentManager::Options * Use fstring * Add comments * Fix lint * Add test_multi_nodes_info Co-authored-by: 刘宝 <po.lb@antfin.com>	2020-09-16 10:17:29 -07:00
Edward Oakes	523705ac0f	Fix new dashboard test process check (#10584 )	2020-09-04 22:04:44 -05:00
fyrestone	e9b046306a	[Dashboard] Dashboard basic modules (#10303 ) * Improve reporter module * Add test_node_physical_stats to test_reporter.py * Add test_class_method_route_table to test_dashboard.py * Add stats_collector module for dashboard * Subscribe actor table data * Add log module for dashboard * Only enable test module in some test cases * CI run all dashboard tests * Reduce test timeout to 10s * Use fstring * Remove unused code * Remove blank line * Fix dashboard tests * Fix asyncio.create_task not available in py36; Fix lint * Add format_web_url to ray.test_utils * Update dashboard/modules/reporter/reporter_head.py Co-authored-by: Max Fitton <mfitton@berkeley.edu> * Add DictChangeItem type for Dict change * Refine logger.exception * Refine GET /api/launch_profiling * Remove disable_test_module fixture * Fix test_basic may fail Co-authored-by: 刘宝 <po.lb@antfin.com> Co-authored-by: Max Fitton <mfitton@berkeley.edu>	2020-08-29 23:09:34 -07:00

54 commits