Commit graph

57 commits

Author SHA1 Message Date
fyrestone
6a54897577
Job module without submission (#13081)
Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-12-31 11:12:17 +08:00
Max Fitton
25f7bdc0d8
[Bugfix][Dashboard] Fix undefined logCount, errorCount UI crash (#13113)
2020-12-30 14:19:56 -06:00
Alex Wu
8df94e33e0
[Autoscaler] New output log format (#12772)
2020-12-23 12:02:55 -08:00
fyrestone
62a5832007
[Dashboard] Add GET /logical/actors API (#12913)
2020-12-23 11:14:23 +08:00
Eric Liang
03a5b90ed6
Revert "Revert "Increase the number of unique bits for actors to avoi… (#12990)
2020-12-21 15:16:42 -08:00
Eric Liang
64c97d25d3
Enable by default new scheduler (#12735)
2020-12-19 13:22:24 -08:00
Eric Liang
5d987f5988
Revert "Increase the number of unique bits for actors to avoid handle collisions (#12894)" (#12988)
This reverts commit 3e492a79ec.
2020-12-18 23:51:44 -08:00
Eric Liang
3e492a79ec
Increase the number of unique bits for actors to avoid handle collisions (#12894)
2020-12-18 15:59:03 -08:00
Edward Oakes
261b2f9053
Check for raylet PID as ppid in dashboard agent fate-sharing (#12867)
2020-12-15 12:13:11 -06:00
Max Fitton
d0813c1c58
[Dashboard] Add dashboard multi-node churn test (#11768)
2020-12-14 17:03:33 -06:00
Max Fitton
ac24d1db30
[Dashboard][Bugfix] Fix GPU List Bug (#12666)
* Fix bug where None was passed as the empty value for ActorInfo.gpu_stats instead of an empty list

* lint

* dashboard/modules/logical_view

* fix test

* trigger build
2020-12-12 23:34:24 -08:00
Stephanie Wang
a776209aec
Revert "Fix dashboard agent check ppid is raylet pid (#12256)" (#12729)
This reverts commit 3ce9286977.
2020-12-09 17:20:38 -05:00
fyrestone
3ce9286977
Fix dashboard agent check ppid is raylet pid (#12256)
* Dashboard agent check ppid is raylet pid

* Improve implementation

* Refine code

* Make the RAY_NODE_PID environment required for dashboard agent

Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-12-09 09:12:34 -05:00
Sumanth Ratna
b7404e7955
[dashboard] Resolve npm vulnerabilities (#12620)
* npm audit fix

* npm dedupe
2020-12-08 10:26:49 -08:00
SangBin Cho
162f361dab
[Logging] Fix log monitor issue (#12588)
* Try fixing issues.

* Verification.
2020-12-07 22:01:18 -08:00
Max Fitton
cc2f43c826
[Dashboard][Bugfix] Fix bug in display of worker logs and errors in Dashboard (#12660)
* Fix bug with worker logs/errors not displaying in the dashboard

* Add error endpoint test.

* lint
2020-12-07 21:41:13 -08:00
Max Fitton
34b9c7449b
[Dashboard] Fix object store memory display. (#12664)
2020-12-07 21:40:49 -08:00
Max Fitton
a5c846c83b
[Dashboard][Bugfix] Filter dead nodes from Machine View (fixes duplicate node issue) (#12579)
2020-12-02 14:08:14 -08:00
SangBin Cho
8223a33bff
[Logging] Log rotation on all components (#12101)
* In Progress.

* Done.

* Fix the issue.

* Add wait for condition because logs are not written right away now.

* debug string.

* lint.

* Fix flaky test.

* Fix issues.

* Fix test.

* lint.
2020-11-30 19:03:55 -08:00
Max Fitton
2708b3abbc
[Dashboard][Bug] Fix duplicate node total rows in dashboard (#12410)
* Fix duplicate node total rows in dashboard by changing the react key of the NodeTotalRow component from the node IP to the node ID (node IP can be duplicated in the case of docker).

* simplify a piece of test code and fix a flaky time out

* lint
2020-11-30 18:43:09 -08:00
SangBin Cho
753cda2f28
[Dashboard] Delete old dashboard (#12144)
* Delete old dashboard from repo.

* Delete old dashboard from repo. 2
2020-11-25 11:31:02 -08:00
Max Fitton
2e95552f0c
[Dashboard] Defensive change to make sure we do not iterate over "None" in the case that workers is not present in node physical stats for a given node (#12358)
2020-11-25 11:06:45 -08:00
SangBin Cho
5fb410cfbf
[Dashboard] New dashboard view data doesn't exist. (#12129)
* Fix.

* Fix the issue.
2020-11-19 11:04:59 -08:00
SangBin Cho
7d67af6c2a
[Metrics] Add stats to measure process startup time + scheduling stats. (#12100)
* Add new stats.

* Fix issues.
2020-11-19 11:04:26 -08:00
fyrestone
0c6bb745cd
Fix dashboard agent use incorrect ip (#12038)
2020-11-16 14:02:20 -06:00
Max Fitton
f545418c3f
[Dashboard] Fix dashboard regression caused by logCount and errCount being removed from worker payload (#11954)
2020-11-11 14:55:54 -08:00
Eric Liang
9b8218aabd
[docs] Move all /latest links to /master (#11897)
* use master link

* remae

* revert non-ray

* more

* more
2020-11-10 10:53:28 -08:00
Max Fitton
368b14a0da
Stop dashboard from erroring when an actor does not have a corresponding core worker (#11870)
2020-11-09 11:36:34 -06:00
Max Fitton
d352feadf0
[Dashboard] Memory Page Loading Wheel (#11651)
* Switch memory view loading message over to a loading wheel to make UX less confusing.

* lint

Co-authored-by: Max Fitton <max@semprehealth.com>
2020-11-03 09:37:30 -08:00
Max Fitton
3202ff74c2
[Dashboard] Don't show GPU columns if no GPU in cluster (#11704)
2020-11-02 18:07:27 -06:00
Max Fitton
b4df42b027
[Dashboard] Make Infeasible Actor UX Less Scary (#11654)
* Update infeasible actor UI so that it only shows infeasible for an ActorClassGroup if at least one actor in the class is infeasible

* lint
2020-10-29 23:12:43 -07:00
Max Fitton
d6628cdbfb
[Dashboard] Fix null gpu utilization (#11650)
* update dashboard to work if GPU utilization field is missing from GPU payload

* lint

* lint
2020-10-29 23:11:50 -07:00
fyrestone
05ad4c7499
[Dashboard] Optimize dashboard datacenter (#11391)
* Optimize dashboard datacenter

* Fix tests

* Fix tests

* Fix

* Fix CI

* python/build-wheel-macos.sh

Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: Max Fitton <maxfitton@anyscale.com>
2020-10-27 23:49:31 -07:00
Max Fitton
caf3b04b27
[Dashboard] Turn on new dashboard by default pt 2 (#11510)
2020-10-23 15:52:14 -05:00
Max Fitton
cdca5af53b
Revert "[Dashboard] Turn on New Dashboard by Default (#11321)" (#11502)
This reverts commit f500292d41.
2020-10-20 10:53:10 -05:00
Max Fitton
0a9cc9cce5
Revert "remove .fake build files (#11478)" (#11488)
This reverts commit 3ed3dea004.
2020-10-19 18:48:32 -07:00
Max Fitton
3ed3dea004
remove .fake build files (#11478)
Co-authored-by: Max Fitton <max@semprehealth.com>
2020-10-19 15:36:47 -07:00
Max Fitton
f500292d41
[Dashboard] Turn on New Dashboard by Default (#11321)
2020-10-19 12:31:11 -05:00
Edward Oakes
798bd6a359
[dashboard] Add /api/cluster_status endpoint (#11456)
2020-10-19 11:00:47 -05:00
Max Fitton
cd9dcfca0d
[Dashboard] CPU/GPU usage details in actor pane (#11269)
2020-10-13 20:23:23 -05:00
fyrestone
defd41aad7
[Dashboard] http route handler cache (#10921)
* Add aiohttp_cache to dashboard

* Add comments; Refine code

* Keep NODE_STATS_UPDATE_INTERVAL_SECONDS 1 second; Change AIOHTTP_CACHE_TTL_SECONDS to 2 seconds

* Update merge

Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-10-09 22:27:05 -07:00
Max Fitton
ff6d412ad9
[Dashboard] Add API support for the logical view and machine view in new backend (#11012)
* Add API support for the logical view and machine view, which lean on datacenter in common.

* Update dashboard/datacenter.py

Co-authored-by: fyrestone <fyrestone@outlook.com>

* Update dashboard/modules/logical_view/logical_view_head.py

Co-authored-by: fyrestone <fyrestone@outlook.com>

* Address PR comments

* lint

* Add dashboard tests to CI build

* Fix integration issues

* lint

Co-authored-by: Max Fitton <max@semprehealth.com>
Co-authored-by: fyrestone <fyrestone@outlook.com>
2020-10-02 17:58:44 -07:00
Max Fitton
5a42ed1848
[Dashboard] Add support for new backend to existing front-end (#11013)
* Trying to commit on top of old code again

* address comment

Co-authored-by: Max Fitton <max@semprehealth.com>
2020-10-02 12:46:47 -07:00
Max Fitton
6ed8459f25
[Dashboard] Add tune API to support tune tab in new backend (#11009)
* Add tune API to support tune tab in new backend

* Make requested changes

Co-authored-by: Max Fitton <max@semprehealth.com>
2020-10-02 11:48:48 -07:00
Max Fitton
9a6d01ebf9
[Dashboard] Add utility functions for actor and memory APIs (#11011)
* Add actor and memory utility functions needed by upcoming logical view and memory view APIs

* Add a method to allow printing Dict custom class and add support for hot-reloading local dev environment.

* Address PR comments

* Add unit tests from test metrics to branch for new memory_utils module

* Add note about sorting / grouping

* lint

Co-authored-by: Max Fitton <max@semprehealth.com>
2020-10-01 23:48:03 -07:00
Max Fitton
03c11d3b85
[Dashboard] Add Ray Config API to Reporter Head (#11010)
* Add Ray Config API to the backend for fetching config

* Address PR comments.

* Make reporter cache whole payload

* lint

Co-authored-by: Max Fitton <max@semprehealth.com>
2020-09-29 17:57:49 -07:00
Max Fitton
825737adc5
[Dashboard] Add old dashboard front end to new dir so we can get a diff going. (#11113)
Co-authored-by: Max Fitton <max@semprehealth.com>
2020-09-29 13:46:42 -07:00
Eric Liang
609c1b8acd
Start moving ray internal files to _private module (#10994)
2020-09-24 22:46:35 -07:00
fyrestone
50784e2496
[Dashboard] Dashboard node grouping (#10528)
* Add RAY_NODE_ID environment var to agent

* Node-related data uses node id as key

* ray.init() return node id; Pass test_reporter.py

* Fix lint & CI

* Fix comments

* Minor fixes

* Fix CI

* Add const to ClientID in AgentManager::Options

* Use fstring

* Add comments

* Fix lint

* Add test_multi_nodes_info

Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-09-16 10:17:29 -07:00
Edward Oakes
523705ac0f
Fix new dashboard test process check (#10584)
2020-09-04 22:04:44 -05:00