hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
SangBin Cho	a04ab9b472	[Core] Fix ray memory bug (#14452 ) * ray memory bug * Fix ray memory issue. * done.	2021-03-03 09:20:00 -08:00
SangBin Cho	09fd38ede1	[Multi node shuffle] More efficient ray memory --stats-only (#14423 ) * Done. * Fix all the issues.	2021-03-01 23:14:06 -08:00
Eric Liang	9db000ff2c	Auto report object store memory usage; remove some deprecated code (#14260 )	2021-03-01 13:19:44 -08:00
Micah Yong	db0c16824c	[Dashboard][CLI] Ray memory parity with dashboard 2 (#13444 ) * Minor improvements in Ray Core Walkthrough as seen in https://github.com/ray-project/ray/issues/12472 * Define node_stats() to return NodeStats object from cluster * Add --group-by and --sort-by capabilities to ray memory script * Resolve merge conflict * Add helper functions for group by and sorting type in memory_utils.py * Reformat * Format * Compartmentalize memory script into get_memory_summary and get_store_stats_summary * Modify unit tests in test_mem_stat * Lint and format * Test cases for group_by sort_by * Lint and format * Fix actor handle failing test case * Update test_memstat.py * Resolve merge conflicts * Adjust ray memory output based on terminal size * Formatting and linting * Use constant for callsite length * Switch from OS to shutil for querying terminal size (official python support) * Linting and formatting * Lint and format * Resolve lint issue in walkthrough.rst * Revert to python 3.6 * Delete visitor.py It was accidentally included in most recent commit * Delete .eggs It was accidentally included in most recent commit * Resolve test_object_spilling.py test case * Add stats only argument * revert changes on this file * Remove package-lock.json * Add back npm installation * Sync package-lock.json * Linting and formatting * Sync with package-lock * Sync with package-lock pt 2 * Update documentation in https://docs.ray.io/en/master/memory-management.html * Add include_memory_info as argument for node_stats * Switch object ref and call site positions * Linting and formatting * Change from MiB to B * Change from stats-only to store-true * Add memory test case * Add memory test case * Lint and format * Correct test in memstat * Change line wrap and stats only to flags * Clarify --stats-only and --no-format in ray memory * --stats-only description modified Co-authored-by: Micah Yong <micahyong@Micahs-MacBook-Pro.local>	2021-03-01 09:27:22 -08:00
Kathryn Zhou	456d9aab47	Add Cypress test for Ray Dashboard (#14253 )	2021-02-24 20:41:52 -08:00
niole	488f63efe3	[Dashboard] Make requests sent by the dashboard reverse proxy compatible (#14012 )	2021-02-24 18:31:59 -08:00
fyrestone	5e76a51d56	[Dashboard] Select port in dashboard (#13763 ) * Dashboard select port; Fix dashboard may hangs when exit * Add test case * Fix * Fix test_stats_collector.py::test_get_all_node_details * Refine dashboard error messages * Refine code * Refine code * Show last 10 lines of dashboard log if start dashboard failed * Fix ValueError: too many values to unpack (expected 2) when getsockname * Fix test_multi_node_3.py::test_calling_start_ray_head may fail * Fix Windows CI * Disable dashboard in C++ test * Refine code * Fix issue 7084 Co-authored-by: 刘宝 <po.lb@antfin.com>	2021-02-23 16:27:48 -08:00
Kathryn Zhou	d6521be7ef	Export GPU metrics, CPU count, and additional Memory metrics to Prometheus (#14170 )	2021-02-22 10:04:18 -08:00
Kathryn Zhou	f6b5e838fe	Add disk and network metrics to Prometheus and fix dashboard (#14144 )	2021-02-17 10:27:14 -08:00
Simon Mo	33316d4f8f	Revert "Export additional metrics to Prometheus (#14061 )" (#14134 ) This reverts commit `82539f2da4`.	2021-02-16 12:49:12 -08:00
Kathryn Zhou	82539f2da4	Export additional metrics to Prometheus (#14061 )	2021-02-14 23:16:26 -08:00
Alex Wu	02938f3a21	[hotfix] Disable dashboard agent windows (#14062 )	2021-02-11 17:54:55 -08:00
Dominic Ming	4b60c388ef	[Dashboard] fix new dashboard entrance and some table problem (#13790 )	2021-01-30 10:42:16 +08:00
Dominic Ming	752da83bb7	[Dashboard] Add the new dashboard code and prompt users to try it (#11667 )	2021-01-29 15:22:26 +08:00
Tao Wang	56ee6ef55f	[GCS]only update states related fields when publish actor table data (#13448 )	2021-01-28 11:12:57 +08:00
Clark Zinzow	2d34e95c93	Don't gather check_parent_task on Windows, since it's undefined. (#13700 )	2021-01-27 09:19:58 -08:00
Amog Kamsetty	d96a9fa192	Revert "Revert "[dashboard] Fix RAY_RAYLET_PID KeyError on Windows (#12948 )" (#13572 )" (#13685 ) This reverts commit `c4a710369b`.	2021-01-25 10:35:25 -08:00
Ameer Haj Ali	b7dd7ddb52	deprecate useless fields in the cluster yaml. (#13637 ) * prepare for head node * move command runner interface outside _private * remove space * Eric * flake * min_workers in multi node type * fixing edge cases * eric not idle * fix target_workers to consider min_workers of node types * idle timeout * minor * minor fix * test * lint * eric v2 * eric 3 * min_workers constraint before bin packing * Update resource_demand_scheduler.py * Revert "Update resource_demand_scheduler.py" This reverts commit 818a63a2c86d8437b3ef21c5035d701c1d1127b5. * reducing diff * make get_nodes_to_launch return a dict * merge * weird merge fix * auto fill instance types for AWS * Alex/Eric * Update doc/source/cluster/autoscaling.rst * merge autofill and input from user * logger.exception * make the yaml use the default autofill * docs Eric * remove test_autoscaler_yaml from windows tests * lets try changing the test a bit * return test * lets see * edward * Limit max launch concurrency * commenting frac TODO * move to resource demand scheduler * use STATUS UP TO DATE * Eric * make logger of gc freed refs debug instead of info * add cluster name to docker mount prefix directory * grrR * fix tests * moving docker directory to sdk * move the import to prevent circular dependency * smallf fix * ian * fix max launch concurrency bug to assume failing nodes as pending and consider only load_metric's connected nodes as running * small fix * deflake test_joblib * lint * placement groups bypass * remove space * Eric * first ocmmit * lint * exmaple * documentation * hmm * file path fix * fix test * some format issue in docs * modified docs * joblib strikes again on windows * add ability to not start autoscaler/monitor * a * remove worker_default * Remove default pod type from operator * Remove worker_default_node_type from rewrite_legacy_yaml_to_availble_node_types * deprecate useless fields Co-authored-by: Ameer Haj Ali <ameerhajali@ameers-mbp.lan> Co-authored-by: Alex Wu <alex@anyscale.io> Co-authored-by: Alex Wu <itswu.alex@gmail.com> Co-authored-by: Eric Liang <ekhliang@gmail.com> Co-authored-by: Ameer Haj Ali <ameerhajali@Ameers-MacBook-Pro.local> Co-authored-by: root <root@ip-172-31-56-188.us-west-2.compute.internal> Co-authored-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>	2021-01-23 12:06:51 -08:00
Amog Kamsetty	c4a710369b	Revert "[dashboard] Fix RAY_RAYLET_PID KeyError on Windows (#12948 )" (#13572 ) This reverts commit `ef6d859e9b`.	2021-01-22 14:10:24 -06:00
Tao Wang	aa5d7a5e6c	[Dashboard]Don't set node actors when node_id of actor is Nil (#13573 ) * Don't set node actors when node_id of actor is Nil * add test per comment	2021-01-21 20:18:34 -08:00
Xianyang Liu	4ecd29ea2b	[dashboard] Fixes dashboard issues when environments have set http_proxy (#12598 ) * fixes ray start with http_proxy * format * fixes * fixes * increase timeout * address comments	2021-01-21 20:10:01 -08:00
Simon Mo	dac8b3d58a	[CI] Enable Dashboard tests for master (#13425 )	2021-01-15 09:43:34 -08:00
Simon Mo	321bbe1ffb	[Dashboard] Fix GPU resource rendering issue (#13388 )	2021-01-14 12:23:21 -08:00
fyrestone	4853aa96cb	[Dashboard] Fix missing actor pid (#13229 )	2021-01-13 16:45:12 +08:00
fyrestone	a6d135a072	[Dashboard] Add GET /log_proxy API (#13165 )	2021-01-08 11:45:07 +08:00
SangBin Cho	32dc5676b4	[Metrics] Record per node and raylet cpu / mem usage (#12982 ) * Record per node and raylet cpu / mem usage * Add comments. * Addressed code review.	2021-01-05 21:57:21 -08:00
Edward Oakes	ef6d859e9b	[dashboard] Fix RAY_RAYLET_PID KeyError on Windows (#12948 )	2020-12-31 10:54:40 -06:00
fyrestone	6a54897577	Job module without submission (#13081 ) Co-authored-by: 刘宝 <po.lb@antfin.com>	2020-12-31 11:12:17 +08:00
Max Fitton	25f7bdc0d8	[Bugfix][Dashboard] Fix undefined logCount, errorCount UI crash (#13113 )	2020-12-30 14:19:56 -06:00
Alex Wu	8df94e33e0	[Autoscaler] New output log format (#12772 )	2020-12-23 12:02:55 -08:00
fyrestone	62a5832007	[Dashboard] Add GET /logical/actors API (#12913 )	2020-12-23 11:14:23 +08:00
Eric Liang	03a5b90ed6	Revert "Revert "Increase the number of unique bits for actors to avoi… (#12990 )	2020-12-21 15:16:42 -08:00
Eric Liang	64c97d25d3	Enable by default new scheduler (#12735 )	2020-12-19 13:22:24 -08:00
Eric Liang	5d987f5988	Revert "Increase the number of unique bits for actors to avoid handle collisions (#12894 )" (#12988 ) This reverts commit `3e492a79ec`.	2020-12-18 23:51:44 -08:00
Eric Liang	3e492a79ec	Increase the number of unique bits for actors to avoid handle collisions (#12894 )	2020-12-18 15:59:03 -08:00
Edward Oakes	261b2f9053	Check for raylet PID as ppid in dashboard agent fate-sharing (#12867 )	2020-12-15 12:13:11 -06:00
Max Fitton	d0813c1c58	[Dashboard] Add dashboard multi-node churn test (#11768 )	2020-12-14 17:03:33 -06:00
Max Fitton	ac24d1db30	[Dashboard][Bugfix] Fix GPU List Bug (#12666 ) * Fix bug where None was passed as the empty value for ActorInfo.gpu_stats instead of an empty list * lint * dashboard/modules/logical_view * fix test * trigger build	2020-12-12 23:34:24 -08:00
Stephanie Wang	a776209aec	Revert "Fix dashboard agent check ppid is raylet pid (#12256 )" (#12729 ) This reverts commit `3ce9286977`.	2020-12-09 17:20:38 -05:00
fyrestone	3ce9286977	Fix dashboard agent check ppid is raylet pid (#12256 ) * Dashboard agent check ppid is raylet pid * Improve implementation * Refine code * Make the RAY_NODE_PID environment required for dashboard agent Co-authored-by: 刘宝 <po.lb@antfin.com>	2020-12-09 09:12:34 -05:00
Sumanth Ratna	b7404e7955	[dashboard] Resolve npm vulnerabilities (#12620 ) * npm audit fix * npm dedupe	2020-12-08 10:26:49 -08:00
SangBin Cho	162f361dab	[Logging] Fix log monitor issue (#12588 ) * Try fixing issues. * Verficiation.	2020-12-07 22:01:18 -08:00
Max Fitton	cc2f43c826	[Dashboard][Bugfix] Fix bug in display of worker logs and errors in Dashboard (#12660 ) * Fix bug with worker logs/errors not displaying in the dashboard * Add error endpoint test. * lint	2020-12-07 21:41:13 -08:00
Max Fitton	34b9c7449b	[Dashboard] Fix object store memory display. (#12664 )	2020-12-07 21:40:49 -08:00
Max Fitton	a5c846c83b	[Dashboard][Bugfix] Filter dead nodes from Machine View (fixes duplicate node issue) (#12579 )	2020-12-02 14:08:14 -08:00
SangBin Cho	8223a33bff	[Logging] Log rotation on all components (#12101 ) * In Progress. * Done. * Fix the issue. * Add wait for condition because logs are not written right away now. * debug string. * lint. * Fix flaky test. * Fix issues. * Fix test. * lint.	2020-11-30 19:03:55 -08:00
Max Fitton	2708b3abbc	[Dashboard][Bug] Fix duplicate node total rows in dashboard (#12410 ) * Fix duplicate node total rows in dashboard by changing the react key of the NodeTotalRow component from the node IP to the node ID (node IP can be duplicated in the case of docker). * simplify a piece of test code and fix a flaky time out * lint	2020-11-30 18:43:09 -08:00
SangBin Cho	753cda2f28	[Dashboard] Delete old dashboard (#12144 ) * Delete old dashboard from repo. * Delete old dashboard from repo. 2	2020-11-25 11:31:02 -08:00
Max Fitton	2e95552f0c	[Dashboard] Defensive change to make sure we do not iterate over "None" in the case that workers is not present in node physical stats for a given node (#12358 )	2020-11-25 11:06:45 -08:00
SangBin Cho	5fb410cfbf	[Dashboard] New dashboard view data doesn't exist. (#12129 ) * Fix. * Fix the issue.	2020-11-19 11:04:59 -08:00

84 commits