Commit graph

188 commits

Author SHA1 Message Date
Tao Wang
56ee6ef55f
[GCS]only update states related fields when publish actor table data (#13448) 2021-01-28 11:12:57 +08:00
Ameer Haj Ali
b7dd7ddb52
deprecate useless fields in the cluster yaml. (#13637)
* prepare for head node

* move command runner interface outside _private

* remove space

* Eric

* flake

* min_workers in multi node type

* fixing edge cases

* eric not idle

* fix target_workers to consider min_workers of node types

* idle timeout

* minor

* minor fix

* test

* lint

* eric v2

* eric 3

* min_workers constraint before bin packing

* Update resource_demand_scheduler.py

* Revert "Update resource_demand_scheduler.py"

This reverts commit 818a63a2c86d8437b3ef21c5035d701c1d1127b5.

* reducing diff

* make get_nodes_to_launch return a dict

* merge

* weird merge fix

* auto fill instance types for AWS

* Alex/Eric

* Update doc/source/cluster/autoscaling.rst

* merge autofill and input from user

* logger.exception

* make the yaml use the default autofill

* docs Eric

* remove test_autoscaler_yaml from windows tests

* lets try changing the test a bit

* return test

* lets see

* edward

* Limit max launch concurrency

* commenting frac TODO

* move to resource demand scheduler

* use STATUS UP TO DATE

* Eric

* make logger of gc freed refs debug instead of info

* add cluster name to docker mount prefix directory

* grrR

* fix tests

* moving docker directory to sdk

* move the import to prevent circular dependency

* smallf fix

* ian

* fix max launch concurrency bug to assume failing nodes as pending and consider only load_metric's connected nodes as running

* small fix

* deflake test_joblib

* lint

* placement groups bypass

* remove space

* Eric

* first ocmmit

* lint

* exmaple

* documentation

* hmm

* file path fix

* fix test

* some format issue in docs

* modified docs

* joblib strikes again on windows

* add ability to not start autoscaler/monitor

* a

* remove worker_default

* Remove default pod type from operator

* Remove worker_default_node_type from rewrite_legacy_yaml_to_availble_node_types

* deprecate useless fields

Co-authored-by: Ameer Haj Ali <ameerhajali@ameers-mbp.lan>
Co-authored-by: Alex Wu <alex@anyscale.io>
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Ameer Haj Ali <ameerhajali@Ameers-MacBook-Pro.local>
Co-authored-by: root <root@ip-172-31-56-188.us-west-2.compute.internal>
Co-authored-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2021-01-23 12:06:51 -08:00
Tao Wang
aa5d7a5e6c
[Dashboard]Don't set node actors when node_id of actor is Nil (#13573)
* Don't set node actors when node_id of actor is Nil

* add test per comment
2021-01-21 20:18:34 -08:00
Xianyang Liu
4ecd29ea2b
[dashboard] Fixes dashboard issues when environments have set http_proxy (#12598)
* fixes ray start with http_proxy

* format

* fixes

* fixes

* increase timeout

* address comments
2021-01-21 20:10:01 -08:00
Simon Mo
dac8b3d58a
[CI] Enable Dashboard tests for master (#13425) 2021-01-15 09:43:34 -08:00
fyrestone
4853aa96cb
[Dashboard] Fix missing actor pid (#13229) 2021-01-13 16:45:12 +08:00
fyrestone
a6d135a072
[Dashboard] Add GET /log_proxy API (#13165) 2021-01-08 11:45:07 +08:00
SangBin Cho
32dc5676b4
[Metrics] Record per node and raylet cpu / mem usage (#12982)
* Record per node and raylet cpu / mem usage

* Add comments.

* Addressed code review.
2021-01-05 21:57:21 -08:00
fyrestone
6a54897577
Job module without submission (#13081)
Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-12-31 11:12:17 +08:00
Alex Wu
8df94e33e0
[Autoscaler] New output log format (#12772) 2020-12-23 12:02:55 -08:00
fyrestone
62a5832007
[Dashboard] Add GET /logical/actors API (#12913) 2020-12-23 11:14:23 +08:00
Eric Liang
03a5b90ed6
Revert "Revert "Increase the number of unique bits for actors to avoi… (#12990) 2020-12-21 15:16:42 -08:00
Eric Liang
64c97d25d3
Enable by default new scheduler (#12735) 2020-12-19 13:22:24 -08:00
Eric Liang
5d987f5988
Revert "Increase the number of unique bits for actors to avoid handle collisions (#12894)" (#12988)
This reverts commit 3e492a79ec.
2020-12-18 23:51:44 -08:00
Eric Liang
3e492a79ec
Increase the number of unique bits for actors to avoid handle collisions (#12894) 2020-12-18 15:59:03 -08:00
Max Fitton
d0813c1c58
[Dashboard] Add dashboard multi-node churn test (#11768) 2020-12-14 17:03:33 -06:00
Max Fitton
ac24d1db30
[Dashboard][Bugfix] Fix GPU List Bug (#12666)
* Fix bug where None was passed as the empty value for ActorInfo.gpu_stats instead of an empty list

* lint

* dashboard/modules/logical_view

* fix test

* trigger build
2020-12-12 23:34:24 -08:00
Max Fitton
cc2f43c826
[Dashboard][Bugfix] Fix bug in display of worker logs and errors in Dashboard (#12660)
* Fix bug with worker logs/errors not displaying in the dashboard

* Add error endpoint test.

* lint
2020-12-07 21:41:13 -08:00
SangBin Cho
753cda2f28
[Dashboard] Delete old dashboard (#12144)
* Delete old dashboard from repo.

* Delete old dashboard from repo. 2
2020-11-25 11:31:02 -08:00
Max Fitton
f545418c3f
[Dashboard] Fix dashboard regression caused by logCount and errCount being removed from worker payload (#11954) 2020-11-11 14:55:54 -08:00
fyrestone
05ad4c7499
[Dashboard] Optimize dashboard datacenter (#11391)
* Optimize dashboard datacenter

* Fix tests

* Fix tests

* Fix

* Fix CI

* python/build-wheel-macos.sh

Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: Max Fitton <maxfitton@anyscale.com>
2020-10-27 23:49:31 -07:00
Max Fitton
caf3b04b27
[Dashboard] Turn on new dashboard by default pt 2 (#11510) 2020-10-23 15:52:14 -05:00
Max Fitton
cdca5af53b
Revert "[Dashboard] Turn on New Dashboard by Default (#11321)" (#11502)
This reverts commit f500292d41.
2020-10-20 10:53:10 -05:00
Max Fitton
f500292d41
[Dashboard] Turn on New Dashboard by Default (#11321) 2020-10-19 12:31:11 -05:00
Edward Oakes
798bd6a359
[dashboard] Add /api/cluster_status endpoint (#11456) 2020-10-19 11:00:47 -05:00
Max Fitton
cd9dcfca0d
[Dashboard] CPU/GPU usage details in actor pane (#11269) 2020-10-13 20:23:23 -05:00
fyrestone
defd41aad7
[Dashboard] http route handler cache (#10921)
* Add aiohttp_cache to dashboard

* Add comments; Refine code

* Keep NODE_STATS_UPDATE_INTERVAL_SECONDS 1 second; Change AIOHTTP_CACHE_TTL_SECONDS to 2 seconds

* Update merge

Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-10-09 22:27:05 -07:00
Max Fitton
ff6d412ad9
[Dashboard] Add API support for the logical view and machine view in new backend (#11012)
* Add API support for the logical view and machine view, which lean on datacenter in common.

* Update dashboard/datacenter.py

Co-authored-by: fyrestone <fyrestone@outlook.com>

* Update dashboard/modules/logical_view/logical_view_head.py

Co-authored-by: fyrestone <fyrestone@outlook.com>

* Address PR comments

* lint

* Add dashboard tests to CI build

* Fix integration issues

* lint

Co-authored-by: Max Fitton <max@semprehealth.com>
Co-authored-by: fyrestone <fyrestone@outlook.com>
2020-10-02 17:58:44 -07:00
Max Fitton
6ed8459f25
[Dashboard] Add tune API to support tune tab in new backend (#11009)
* Add tune API to support tune tab in new backend

* Make requested changes

Co-authored-by: Max Fitton <max@semprehealth.com>
2020-10-02 11:48:48 -07:00
Max Fitton
03c11d3b85
[Dashboard] Add Ray Config API to Reporter Head (#11010)
* Add Ray Config API to the backend for fetching config

* Address PR comments.

* Make reporter cache whole payload

* lint

Co-authored-by: Max Fitton <max@semprehealth.com>
2020-09-29 17:57:49 -07:00
Eric Liang
609c1b8acd
Start moving ray internal files to _private module (#10994) 2020-09-24 22:46:35 -07:00
fyrestone
50784e2496
[Dashboard] Dashboard node grouping (#10528)
* Add RAY_NODE_ID environment var to agent

* Node ralated data use node id as key

* ray.init() return node id; Pass test_reporter.py

* Fix lint & CI

* Fix comments

* Minor fixes

* Fix CI

* Add const to ClientID in AgentManager::Options

* Use fstring

* Add comments

* Fix lint

* Add test_multi_nodes_info

Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-09-16 10:17:29 -07:00
fyrestone
e9b046306a
[Dashboard] Dashboard basic modules (#10303)
* Improve reporter module

* Add test_node_physical_stats to test_reporter.py

* Add test_class_method_route_table to test_dashboard.py

* Add stats_collector module for dashboard

* Subscribe actor table data

* Add log module for dashboard

* Only enable test module in some test cases

* CI run all dashboard tests

* Reduce test timeout to 10s

* Use fstring

* Remove unused code

* Remove blank line

* Fix dashboard tests

* Fix asyncio.create_task not available in py36; Fix lint

* Add format_web_url to ray.test_utils

* Update dashboard/modules/reporter/reporter_head.py

Co-authored-by: Max Fitton <mfitton@berkeley.edu>

* Add DictChangeItem type for Dict change

* Refine logger.exception

* Refine GET /api/launch_profiling

* Remove disable_test_module fixture

* Fix test_basic may fail

Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: Max Fitton <mfitton@berkeley.edu>
2020-08-29 23:09:34 -07:00
Ian Rodney
d6f2b0d933
[docker] Run profiling without sudo (#10388)
* fix profiling for docker

* small fixes

* use name

* do not import pwd on windows
2020-08-28 21:25:10 -07:00
fyrestone
05c103af94
[Dashboard] Start the new dashboard (#10131)
* Use new dashboard if environment var RAY_USE_NEW_DASHBOARD exists; new dashboard startup

* Make fake client/build/static directory for dashboard

* Add test_dashboard.py for new dashboard

* Travis CI enable new dashboard test

* Update new dashboard

* Agent manager service

* Add agent manager

* Register agent to agent manager

* Add a new line to the end of agent_manager.cc

* Fix merge; Fix lint

* Update dashboard/agent.py

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

* Update dashboard/head.py

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

* Fix bug

* Add tests for dashboard

* Fix

* Remove const from Process::Kill() & Fix bugs

* Revert error check of execute_after

* Raise exception from DashboardAgent.run

* Add more tests.

* Fix compile on Linux

* Use dict comprehension instead of dict(generator)

* Fix lint

* Fix windows compile

* Fix lint

* Test Windows CI

* Revert "Test Windows CI"

This reverts commit 945e01051ec95cff5fcc1c0bc37045b46e7ad9a6.

* Fix ParseWindowsCommandLine bug

* Update src/ray/util/util.cc

Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>

Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>
2020-08-24 13:24:23 -07:00
Robert Nishihara
36e626e95d
Revert "[Dashboard] Start the new dashboard (#9860)" (#10116)
This reverts commit 739933e5b8.
2020-08-14 14:06:57 -07:00
fyrestone
739933e5b8
[Dashboard] Start the new dashboard (#9860) 2020-08-13 11:01:46 +08:00
fyrestone
4d08ddbf24
[Dashboard] New dashboard skeleton (#9099) 2020-07-27 11:34:47 +08:00