Current logic looks broken, as reported in #22954 (comment)
I fixed the logic as best as I can, and tested it on Anyscale platform with GPU. No process info was reported from gpustat. But the logic works under this case.
Currently, GCS KV client only has blocking API. Calling them from dashboard event loop can block other operations for many seconds, leading to failures such as taking too long (> 2min) to submit a job and making nightly tests fail (#21699). This PR offloads the blocking work to a separate thread. Implementing async GCS KV API will be done in the future.
* Fix bug where None was passed as the empty value for ActorInfo.gpu_stats instead of an empty list
* lint
* dashboard/modules/logical_view
* fix test
* trigger build
* Add API support for the logical view and machine view, which lean on datacenter in common.
* Update dashboard/datacenter.py
Co-authored-by: fyrestone <fyrestone@outlook.com>
* Update dashboard/modules/logical_view/logical_view_head.py
Co-authored-by: fyrestone <fyrestone@outlook.com>
* Address PR comments
* lint
* Add dashboard tests to CI build
* Fix integration issues
* lint
Co-authored-by: Max Fitton <max@semprehealth.com>
Co-authored-by: fyrestone <fyrestone@outlook.com>
* Add RAY_NODE_ID environment var to agent
* Node ralated data use node id as key
* ray.init() return node id; Pass test_reporter.py
* Fix lint & CI
* Fix comments
* Minor fixes
* Fix CI
* Add const to ClientID in AgentManager::Options
* Use fstring
* Add comments
* Fix lint
* Add test_multi_nodes_info
Co-authored-by: 刘宝 <po.lb@antfin.com>
* Improve reporter module
* Add test_node_physical_stats to test_reporter.py
* Add test_class_method_route_table to test_dashboard.py
* Add stats_collector module for dashboard
* Subscribe actor table data
* Add log module for dashboard
* Only enable test module in some test cases
* CI run all dashboard tests
* Reduce test timeout to 10s
* Use fstring
* Remove unused code
* Remove blank line
* Fix dashboard tests
* Fix asyncio.create_task not available in py36; Fix lint
* Add format_web_url to ray.test_utils
* Update dashboard/modules/reporter/reporter_head.py
Co-authored-by: Max Fitton <mfitton@berkeley.edu>
* Add DictChangeItem type for Dict change
* Refine logger.exception
* Refine GET /api/launch_profiling
* Remove disable_test_module fixture
* Fix test_basic may fail
Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: Max Fitton <mfitton@berkeley.edu>