2034 commits

Author SHA1 Message Date
SangBin Cho
b7c56b8a71
[Core] Improve the server startup error message. (#14267)
* Improve the error message further.

* fix comment.

* Fix comment 2.

* improve messages to be even more high level.

* Address code review.
2021-02-23 16:26:06 -08:00
DK.Pino
911b028c54
[Placement Group] Make the creation of placement group sync (#13858)
* make pg creation sync

* return successful immediately when pg registration

* hold on

* fix ut

* make collection for callback

* make pg registration vector

* fix new cpp ut

* fix named py ut

* fix python ut bug

* fix python ut

* fix lint

* modify comment

* fix comment

* fix comment

* add new ut and fix old lint issue

* fix comment

* update comment

* fix conflict
2021-02-23 16:11:43 -08:00
Clark Zinzow
d344e77109
Revert "Revert "Inline small objects in GetObjectStatus response. (#13309)" (#13615)" (#13618)
This reverts commit 20acc3b05e.
2021-02-23 12:06:37 -08:00
Simon Mo
dfd5eb4b0d
[Core] fix gcs use-after-free from ASAN (#14199) 2021-02-23 10:37:31 -08:00
ZhuSenlin
8be107196d
fix retry leasing worker (#14272) 2021-02-23 19:38:40 +08:00
Clark Zinzow
5ce9b93f47
[Core] Ownership-based Object Directory - Enabled by default (#14254) 2021-02-22 22:09:41 -08:00
Alex Wu
79653049d2
[core] Start less worker processes (#14202) 2021-02-22 22:01:38 -08:00
ZhuSenlin
8e0b2d07f4
[Core] synchronize job config to worker when it registers to raylet (#13402) 2021-02-23 11:48:54 +08:00
DK.Pino
7647d60fa9
[Placement Group] Support named placement group java api & Refactor construct method (#13821) 2021-02-22 20:12:09 +08:00
Kai Yang
e75b143faf
[Core] Some small fixes and improvements (#14210) 2021-02-22 12:02:30 +08:00
Kai Yang
d8c32be449
[Core] Simplify system config passing from Raylet to workers (#13860) 2021-02-20 20:20:13 +08:00
Stephanie Wang
a4d7792c0e
[core] Fix bugs in admission control again (#14222)
* Track which pull bundle requests are ready to run

* Regression test

* Reset retry timer on pull activation, don't count created objects towards memory usage, abort objects on pull deactivation

* Revert "Track which pull bundle requests are ready to run"

This reverts commit b5d0714783fa2fc842bdd4e2d2802228e25f03c2.

* Check object active before receiving chunk

* lint

* debug, unit test, fix race condition

* lint

* update

* lint

* fix

* fix build

* fix test

* remove print

* Fix bug in bytes accounting

* Split
2021-02-19 18:07:57 -08:00
Eric Liang
58f8c4b23a
Handle unhandled exception handler == nullptr in Java (#14221) 2021-02-19 16:54:41 -08:00
SangBin Cho
296792f963
Revert "[core] Fix bugs in admission control (#14157)" (#14217)
This reverts commit 94a819d00e.
2021-02-19 11:58:17 -08:00
Eric Liang
cc156f7b3c
Fix deadlock in unhandled exception handler and re-merge (#3) (#14192) 2021-02-19 11:52:09 -08:00
Kai Yang
ec344b87c7
[Core] Fix grpc server is started check (#14183) 2021-02-19 16:48:28 +08:00
Stephanie Wang
94a819d00e
[core] Fix bugs in admission control (#14157)
* Track which pull bundle requests are ready to run

* Regression test

* Reset retry timer on pull activation, don't count created objects towards memory usage, abort objects on pull deactivation

* Revert "Track which pull bundle requests are ready to run"

This reverts commit b5d0714783fa2fc842bdd4e2d2802228e25f03c2.

* Check object active before receiving chunk

* lint

* debug, unit test, fix race condition

* lint

* update

* lint

* fix

* fix build

* fix test

* remove print

* Fix bug in bytes accounting
2021-02-18 20:39:00 -08:00
Clark Zinzow
c092a5d184
Cancel object location long-poll on object free. (#14165) 2021-02-18 14:09:43 -08:00
Stephanie Wang
dfb86e0a8f
[core] Push object chunks with multiple threads (#14191)
* Push object chunks with multiple threads

* fix build
2021-02-18 14:09:23 -08:00
SangBin Cho
66f93a3d63
Revert "Fix OSX error and re-merge unhandled exceptions handling (#14138)" (#14180)
This reverts commit ee584e8328.
2021-02-18 10:35:38 -08:00
SangBin Cho
9451b4ea86
[Object Spilling] Fix the race condition. (#14149)
* Fix the race condition.

* done.

* Fix the lint issue.

* fix issues. addressed comments.
2021-02-17 14:35:22 -08:00
Eric Liang
ee584e8328
Fix OSX error and re-merge unhandled exceptions handling (#14138) 2021-02-17 13:35:07 -08:00
SangBin Cho
3a6a977803
Revert "[Ownership based object directory] Turn on by default. (#13964)" (#14148)
This reverts commit 04d2df40cd.
2021-02-16 22:42:58 -08:00
architkulkarni
d9124e9329
Revert "[Core]Fix ray.kill doesn't cancel pending actor bug (#14025)" (#14146)
This reverts commit 1754359281.
2021-02-16 17:22:25 -08:00
SangBin Cho
04d2df40cd
[Ownership based object directory] Turn on by default. (#13964) 2021-02-16 17:16:44 -08:00
SangBin Cho
1b1420e069
[Scheduler] Fix spillback is done deterministically. (#14096)
* update.

* Fix comments.

* Addressed code review.

* fix a test.

* Addressed last code review.

* d.

* done.
2021-02-16 16:46:16 -08:00
SangBin Cho
5135661bdf
[Metrics] Add spilling stats (#14103)
* Add stats for object spilling.

* Formatting.

* addressed code review.
2021-02-16 15:26:04 -08:00
architkulkarni
3ce03a52bc
Revert "Revert "Revert "Unhandled exception handler based on local ref counti… (#14113)" (#14136)
This reverts commit e457872fe1.
2021-02-16 11:47:09 -08:00
Barak Michener
c43a64230e
[ray_client]: Fix mutual recursion (#14122) 2021-02-16 10:37:58 -08:00
SangBin Cho
4ad79ca963
[Object Spilling] Remove LRU eviction (#13977)
* done.

* formatting.

* done.

* done.
2021-02-15 14:24:53 -08:00
Eric Liang
e457872fe1
Revert "Revert "Unhandled exception handler based on local ref counti… (#14113)
* Revert "Revert "Unhandled exception handler based on local ref counting (#14049)" (#14099)"

This reverts commit b45ae76765.

* remove test

* fix

* fix
2021-02-15 14:11:11 -08:00
SangBin Cho
b45ae76765
Revert "Unhandled exception handler based on local ref counting (#14049)" (#14099)
This reverts commit 9dc671ae02.
2021-02-14 22:08:32 -08:00
Alex Wu
5636af8084
[hotfix] Fix mac build (#14075)
* .

* done?

* .

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-02-14 14:26:51 -08:00
Eric Liang
9dc671ae02
Unhandled exception handler based on local ref counting (#14049) 2021-02-12 22:58:38 -08:00
Clark Zinzow
c7ff69f4bf
[OBOD] Add support for ownership-based object directory object recovery. (#14066) 2021-02-12 11:58:31 -08:00
Clark Zinzow
cd7e567a57
[Core] Ownership-based Object Directory - Added support for object spilling in the ownership-based object directory. (#13948)
* Add support for object spilling in the ownership-based object directory.

* Move owner address hashmap into pinned_objects_ and objects_pending_spill_.

* Update local object manager tests.

* Feedback and misc. fixes.

* Move spilled unpin callback lambda to std::binded private method.

* Skip test_delete_objects_multi_node test on MacOS for now.
2021-02-11 10:36:22 -08:00
Ameer Haj Ali
d87a82e891
Revert "Revert "[Autoscaler] Monitor refactor for backward compatability. (#13970)" (#14046)" (#14050)
* prepare for head node

* move command runner interface outside _private

* remove space

* Eric

* flake

* min_workers in multi node type

* fixing edge cases

* eric not idle

* fix target_workers to consider min_workers of node types

* idle timeout

* minor

* minor fix

* test

* lint

* eric v2

* eric 3

* min_workers constraint before bin packing

* Update resource_demand_scheduler.py

* Revert "Update resource_demand_scheduler.py"

This reverts commit 818a63a2c86d8437b3ef21c5035d701c1d1127b5.

* reducing diff

* make get_nodes_to_launch return a dict

* merge

* weird merge fix

* auto fill instance types for AWS

* Alex/Eric

* Update doc/source/cluster/autoscaling.rst

* merge autofill and input from user

* logger.exception

* make the yaml use the default autofill

* docs Eric

* remove test_autoscaler_yaml from windows tests

* lets try changing the test a bit

* return test

* lets see

* edward

* Limit max launch concurrency

* commenting frac TODO

* move to resource demand scheduler

* use STATUS UP TO DATE

* Eric

* make logger of gc freed refs debug instead of info

* add cluster name to docker mount prefix directory

* grrR

* fix tests

* moving docker directory to sdk

* move the import to prevent circular dependency

* small fix

* ian

* fix max launch concurrency bug to assume failing nodes as pending and consider only load_metric's connected nodes as running

* small fix

* Revert "Revert "[Autoscaler] Monitor refactor for backward compatability. (#13970)" (#14046)"

This reverts commit 6f9d39fb3e.

* fake news

Co-authored-by: Ameer Haj Ali <ameerhajali@ameers-mbp.lan>
Co-authored-by: Alex Wu <alex@anyscale.io>
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Ameer Haj Ali <ameerhajali@Ameers-MacBook-Pro.local>
2021-02-10 17:59:08 -08:00
Stephanie Wang
fc89984162
Subtract from num bytes in use (#13944) 2021-02-10 12:22:08 -08:00
architkulkarni
6f9d39fb3e
Revert "[Autoscaler] Monitor refactor for backward compatability. (#13970)" (#14046)
This reverts commit 7a6f8054d1.
2021-02-10 12:16:52 -08:00
fangfengbin
1754359281
[Core]Fix ray.kill doesn't cancel pending actor bug (#14025) 2021-02-10 15:30:21 +08:00
Ameer Haj Ali
7a6f8054d1
[Autoscaler] Monitor refactor for backward compatability. (#13970) 2021-02-09 21:41:50 -08:00
Kai Yang
e0b81796c5
Revert "Revert "[Java] fix test hang occasionally when running FailureTest (#13934)" (#13992)" (#14008) 2021-02-09 12:43:26 -08:00
Simon Mo
f51c26bae6
Revert "[Core]Fix ray.kill doesn't cancel pending actor bug (#13254)" (#14013)
This reverts commit 2092b097ea.
2021-02-09 11:36:38 -08:00
fangfengbin
2092b097ea
[Core]Fix ray.kill doesn't cancel pending actor bug (#13254) 2021-02-09 10:59:14 +08:00
Simon Mo
ec94214957
Revert "[Java] fix test hang occasionally when running FailureTest (#13934)" (#13992)
This reverts commit bcf9457abb.
2021-02-08 11:30:30 -08:00
Kai Yang
bcf9457abb
[Java] fix test hang occasionally when running FailureTest (#13934) 2021-02-08 18:21:50 +08:00
Kai Yang
4b4941435d
[Java] fix actor restart failure when multi-worker is turned on (#13793) 2021-02-07 21:12:54 +08:00
Simon Mo
ea4154df80
[Hotfix] Master compilation error on MacOS. (#13946) 2021-02-05 16:07:45 -08:00
fyrestone
eee624cf5f
Revert "Fix passing env on windows (#13253)" (#13828) 2021-02-05 13:03:16 +08:00
fangfengbin
8a5999c12a
[GCS]Fix bug that gcs client does not set last_resource_usage_ (#13856) 2021-02-05 11:51:25 +08:00