Commit graph

23 commits

Author SHA1 Message Date
Simon Mo
33316d4f8f
Revert "Export additional metrics to Prometheus (#14061)" (#14134)
This reverts commit 82539f2da4.
2021-02-16 12:49:12 -08:00
Kathryn Zhou
82539f2da4
Export additional metrics to Prometheus (#14061) 2021-02-14 23:16:26 -08:00
Ameer Haj Ali
b7dd7ddb52
deprecate useless fields in the cluster yaml. (#13637)
* prepare for head node

* move command runner interface outside _private

* remove space

* Eric

* flake

* min_workers in multi node type

* fixing edge cases

* eric not idle

* fix target_workers to consider min_workers of node types

* idle timeout

* minor

* minor fix

* test

* lint

* eric v2

* eric 3

* min_workers constraint before bin packing

* Update resource_demand_scheduler.py

* Revert "Update resource_demand_scheduler.py"

This reverts commit 818a63a2c86d8437b3ef21c5035d701c1d1127b5.

* reducing diff

* make get_nodes_to_launch return a dict

* merge

* weird merge fix

* auto fill instance types for AWS

* Alex/Eric

* Update doc/source/cluster/autoscaling.rst

* merge autofill and input from user

* logger.exception

* make the yaml use the default autofill

* docs Eric

* remove test_autoscaler_yaml from windows tests

* lets try changing the test a bit

* return test

* lets see

* edward

* Limit max launch concurrency

* commenting frac TODO

* move to resource demand scheduler

* use STATUS UP TO DATE

* Eric

* make logger of gc freed refs debug instead of info

* add cluster name to docker mount prefix directory

* grrR

* fix tests

* moving docker directory to sdk

* move the import to prevent circular dependency

* smallf fix

* ian

* fix max launch concurrency bug to assume failing nodes as pending and consider only load_metric's connected nodes as running

* small fix

* deflake test_joblib

* lint

* placement groups bypass

* remove space

* Eric

* first ocmmit

* lint

* exmaple

* documentation

* hmm

* file path fix

* fix test

* some format issue in docs

* modified docs

* joblib strikes again on windows

* add ability to not start autoscaler/monitor

* a

* remove worker_default

* Remove default pod type from operator

* Remove worker_default_node_type from rewrite_legacy_yaml_to_availble_node_types

* deprecate useless fields

Co-authored-by: Ameer Haj Ali <ameerhajali@ameers-mbp.lan>
Co-authored-by: Alex Wu <alex@anyscale.io>
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Ameer Haj Ali <ameerhajali@Ameers-MacBook-Pro.local>
Co-authored-by: root <root@ip-172-31-56-188.us-west-2.compute.internal>
Co-authored-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2021-01-23 12:06:51 -08:00
Xianyang Liu
4ecd29ea2b
[dashboard] Fixes dashboard issues when environments have set http_proxy (#12598)
* fixes ray start with http_proxy

* format

* fixes

* fixes

* increase timeout

* address comments
2021-01-21 20:10:01 -08:00
SangBin Cho
32dc5676b4
[Metrics] Record per node and raylet cpu / mem usage (#12982)
* Record per node and raylet cpu / mem usage

* Add comments.

* Addressed code review.
2021-01-05 21:57:21 -08:00
Alex Wu
8df94e33e0
[Autoscaler] New output log format (#12772) 2020-12-23 12:02:55 -08:00
SangBin Cho
753cda2f28
[Dashboard] Delete old dashboard (#12144)
* Delete old dashboard from repo.

* Delete old dashboard from repo. 2
2020-11-25 11:31:02 -08:00
Max Fitton
caf3b04b27
[Dashboard] Turn on new dashboard by default pt 2 (#11510) 2020-10-23 15:52:14 -05:00
Max Fitton
cdca5af53b
Revert "[Dashboard] Turn on New Dashboard by Default (#11321)" (#11502)
This reverts commit f500292d41.
2020-10-20 10:53:10 -05:00
Max Fitton
f500292d41
[Dashboard] Turn on New Dashboard by Default (#11321) 2020-10-19 12:31:11 -05:00
Edward Oakes
798bd6a359
[dashboard] Add /api/cluster_status endpoint (#11456) 2020-10-19 11:00:47 -05:00
Max Fitton
cd9dcfca0d
[Dashboard] CPU/GPU usage details in actor pane (#11269) 2020-10-13 20:23:23 -05:00
fyrestone
defd41aad7
[Dashboard] http route handler cache (#10921)
* Add aiohttp_cache to dashboard

* Add comments; Refine code

* Keep NODE_STATS_UPDATE_INTERVAL_SECONDS 1 second; Change AIOHTTP_CACHE_TTL_SECONDS to 2 seconds

* Update merge

Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-10-09 22:27:05 -07:00
Max Fitton
ff6d412ad9
[Dashboard] Add API support for the logical view and machine view in new backend (#11012)
* Add API support for the logical view and machine view, which lean on datacenter in common.

* Update dashboard/datacenter.py

Co-authored-by: fyrestone <fyrestone@outlook.com>

* Update dashboard/modules/logical_view/logical_view_head.py

Co-authored-by: fyrestone <fyrestone@outlook.com>

* Address PR comments

* lint

* Add dashboard tests to CI build

* Fix integration issues

* lint

Co-authored-by: Max Fitton <max@semprehealth.com>
Co-authored-by: fyrestone <fyrestone@outlook.com>
2020-10-02 17:58:44 -07:00
Max Fitton
03c11d3b85
[Dashboard] Add Ray Config API to Reporter Head (#11010)
* Add Ray Config API to the backend for fetching config

* Address PR comments.

* Make reporter cache whole payload

* lint

Co-authored-by: Max Fitton <max@semprehealth.com>
2020-09-29 17:57:49 -07:00
Eric Liang
609c1b8acd
Start moving ray internal files to _private module (#10994) 2020-09-24 22:46:35 -07:00
fyrestone
50784e2496
[Dashboard] Dashboard node grouping (#10528)
* Add RAY_NODE_ID environment var to agent

* Node ralated data use node id as key

* ray.init() return node id; Pass test_reporter.py

* Fix lint & CI

* Fix comments

* Minor fixes

* Fix CI

* Add const to ClientID in AgentManager::Options

* Use fstring

* Add comments

* Fix lint

* Add test_multi_nodes_info

Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-09-16 10:17:29 -07:00
fyrestone
e9b046306a
[Dashboard] Dashboard basic modules (#10303)
* Improve reporter module

* Add test_node_physical_stats to test_reporter.py

* Add test_class_method_route_table to test_dashboard.py

* Add stats_collector module for dashboard

* Subscribe actor table data

* Add log module for dashboard

* Only enable test module in some test cases

* CI run all dashboard tests

* Reduce test timeout to 10s

* Use fstring

* Remove unused code

* Remove blank line

* Fix dashboard tests

* Fix asyncio.create_task not available in py36; Fix lint

* Add format_web_url to ray.test_utils

* Update dashboard/modules/reporter/reporter_head.py

Co-authored-by: Max Fitton <mfitton@berkeley.edu>

* Add DictChangeItem type for Dict change

* Refine logger.exception

* Refine GET /api/launch_profiling

* Remove disable_test_module fixture

* Fix test_basic may fail

Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: Max Fitton <mfitton@berkeley.edu>
2020-08-29 23:09:34 -07:00
Ian Rodney
d6f2b0d933
[docker] Run profiling without sudo (#10388)
* fix profiling for docker

* small fixes

* use name

* do not import pwd on windows
2020-08-28 21:25:10 -07:00
fyrestone
05c103af94
[Dashboard] Start the new dashboard (#10131)
* Use new dashboard if environment var RAY_USE_NEW_DASHBOARD exists; new dashboard startup

* Make fake client/build/static directory for dashboard

* Add test_dashboard.py for new dashboard

* Travis CI enable new dashboard test

* Update new dashboard

* Agent manager service

* Add agent manager

* Register agent to agent manager

* Add a new line to the end of agent_manager.cc

* Fix merge; Fix lint

* Update dashboard/agent.py

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

* Update dashboard/head.py

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

* Fix bug

* Add tests for dashboard

* Fix

* Remove const from Process::Kill() & Fix bugs

* Revert error check of execute_after

* Raise exception from DashboardAgent.run

* Add more tests.

* Fix compile on Linux

* Use dict comprehension instead of dict(generator)

* Fix lint

* Fix windows compile

* Fix lint

* Test Windows CI

* Revert "Test Windows CI"

This reverts commit 945e01051ec95cff5fcc1c0bc37045b46e7ad9a6.

* Fix ParseWindowsCommandLine bug

* Update src/ray/util/util.cc

Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>

Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>
2020-08-24 13:24:23 -07:00
Robert Nishihara
36e626e95d
Revert "[Dashboard] Start the new dashboard (#9860)" (#10116)
This reverts commit 739933e5b8.
2020-08-14 14:06:57 -07:00
fyrestone
739933e5b8
[Dashboard] Start the new dashboard (#9860) 2020-08-13 11:01:46 +08:00
fyrestone
4d08ddbf24
[Dashboard] New dashboard skeleton (#9099) 2020-07-27 11:34:47 +08:00