1910 commits

Author SHA1 Message Date
SangBin Cho
190ab40645
[Core] Display ip address when node dies (#14489)
* done.

* Addressed code review.
2021-03-04 10:27:00 -08:00
Kai Yang
1d7bd990b6
[Java] Update System.gc() log to debug level (#14490) 2021-03-04 18:54:10 +08:00
Kai Yang
5d79821e69
[Core] Initialize system config in CoreWorkerProcess constructor (#14439) 2021-03-04 16:34:54 +08:00
Eric Liang
99a63b3dd1
Remove old scheduler and friends (#14184) 2021-03-03 18:29:15 -08:00
ZhuSenlin
dcff25aed6
remove invalid code inside NodeManager::NodeAdded (#14273)
Co-authored-by: senlin.zsl <senlin.zsl@antgroup.com>
2021-03-03 09:20:21 -08:00
Kai Yang
c53c909130
[Java] Quit worker process after RunTaskExecutionLoop to avoid orphan Java worker processes (#14442) 2021-03-03 16:47:17 +08:00
fangfengbin
1054613da1
[Core]Fix ray.kill doesn't cancel pending actor bug (#14154)
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2021-03-03 16:12:32 +08:00
Stephanie Wang
5c6c9d5b91
[core] Spill tasks from waiting queue (#14288)
* Spill back waiting tasks

* test

* test

* todo

* Avoid iterating over args

* update

* lint

* Fix test

* test

* Test force spillback

* Unit test resource scheduler

* test

* travis?

* rename

* debug

* revert flaky test

* lint

* fix test

* fix
2021-03-02 22:30:02 -08:00
SangBin Cho
bacbdd297b
[Core] Do not unregister workers that own objects by worker capping mechanism. (#14408)
* Almost done.

* Initial implementation done.

* Fix issue.

* Addressed the initial code review.

* improve comments.

* Addressed code review.

* Adding unit tests.

* Complete unit tests.

* Resolve all issues.

* Fix issues.
2021-03-02 12:24:22 -08:00
Yi Cheng
d921dca075
[core] Fixing bug when dispatching tasks to deleted placement group (#14300) 2021-03-02 10:24:53 -08:00
Stephanie Wang
a24ac13671
[core] Randomize actor ID to avoid collisions (#14358)
* Randomize actor ID

* Mix index and current time, add python test

* test

* nanos
2021-03-02 10:00:28 -08:00
Tao Wang
2de01ee3b1
[GCS]Cherry pick heartbeat function into another thread (#14301) 2021-03-02 17:49:02 +08:00
SangBin Cho
09fd38ede1
[Multi node shuffle] More efficient ray memory --stats-only (#14423)
* Done.

* Fix all the issues.
2021-03-01 23:14:06 -08:00
SangBin Cho
0ec8efbb47
[Core] Minor fixes (#14411)
* Fix issue.

* Lint.

* Addressed code review.
2021-03-01 18:37:05 -08:00
Eric Liang
9db000ff2c
Auto report object store memory usage; remove some deprecated code (#14260) 2021-03-01 13:19:44 -08:00
Qing Wang
f7f64e90ed
[Minor] Remove unused field. (#14382)
Co-authored-by: Qing Wang <jovany.wq@antgroup.com>
2021-03-01 19:35:28 +08:00
Kai Yang
e0e8918d60
[Core] Raylet to pick the node manager port (#14349) 2021-02-27 20:27:09 +08:00
SangBin Cho
2b5b0dd3fc
[Core] Fix the issue with duplicated args (#14329) 2021-02-26 12:42:58 -08:00
Clark Zinzow
6b37720c6a
[Core] Locality-aware leasing: Milestone 4 - Borrowed refs. (#14296)
* Adds locality-aware leasing for borrowed refs.

* Added tests.
2021-02-26 10:36:12 -08:00
Richard Liaw
3e9ff91218
Revert the reverted heartbeat factor PR (check windows build) (#14341) 2021-02-25 20:52:12 -08:00
Eric Liang
adbdacae58
add more io workers (#14330) 2021-02-24 22:00:31 -08:00
Clark Zinzow
c1a1be1da6
[Core] Locality-aware leasing: Milestone 2 - Owned refs, cached locations (#14282)
* Adds locality-aware leasing for cached owned refs.

* Add tests for locality-aware leasing on cached owned refs.
2021-02-24 21:24:10 -08:00
Richard Liaw
80657e5dfe
Revert "[Core]Pull off timers out of heartbeat in raylet (#13963)" (#14319) 2021-02-24 19:44:31 -08:00
ZhuSenlin
be28e8fae4
use iterator instead of operator[] to avoid garbage (#14275) 2021-02-25 11:37:36 +08:00
fangfengbin
482a00278b
[GCS]Fix flaky testcase: ServiceBasedGcsClientTest (#14248) 2021-02-24 20:35:30 +08:00
Tao Wang
6af0291347
[Core]Pull off timers out of heartbeat in raylet (#13963) 2021-02-24 11:59:13 +08:00
SangBin Cho
b7c56b8a71
[Core] Improve the server startup error message. (#14267)
* Improve the error message further.

* fix comment.

* Fix comment 2.

* improve messages to be even more high level.

* Address code review.
2021-02-23 16:26:06 -08:00
DK.Pino
911b028c54
[Placement Group] Make the creation of placement group sync (#13858)
* make pg creation sync

* return successful immediately when pg registration

* hold on

* fix ut

* make collection for callback

* make pg registration vector

* fix new cpp ut

* fix named py ut

* fix python ut bug

* fix python ut

* fix lint

* modify comment

* fix comment

* fix comment

* add new ut and fix old lint issue

* fix comment

* update comment

* fix conflict
2021-02-23 16:11:43 -08:00
Clark Zinzow
d344e77109
Revert "Revert "Inline small objects in GetObjectStatus response. (#13309)" (#13615)" (#13618)
This reverts commit 20acc3b05e.
2021-02-23 12:06:37 -08:00
Simon Mo
dfd5eb4b0d
[Core] fix gcs use-after-free from ASAN (#14199) 2021-02-23 10:37:31 -08:00
ZhuSenlin
8be107196d
fix retry leasing worker (#14272) 2021-02-23 19:38:40 +08:00
Clark Zinzow
5ce9b93f47
[Core] Ownership-based Object Directory - Enabled by default (#14254) 2021-02-22 22:09:41 -08:00
Alex Wu
79653049d2
[core] Start less worker processes (#14202) 2021-02-22 22:01:38 -08:00
ZhuSenlin
8e0b2d07f4
[Core] synchronize job config to worker when it registers to raylet (#13402) 2021-02-23 11:48:54 +08:00
DK.Pino
7647d60fa9
[Placement Group] Support named placement group java api & Refactor construct method (#13821) 2021-02-22 20:12:09 +08:00
Kai Yang
e75b143faf
[Core] Some small fixes and improvements (#14210) 2021-02-22 12:02:30 +08:00
Kai Yang
d8c32be449
[Core] Simplify system config passing from Raylet to workers (#13860) 2021-02-20 20:20:13 +08:00
Stephanie Wang
a4d7792c0e
[core] Fix bugs in admission control again (#14222)
* Track which pull bundle requests are ready to run

* Regression test

* Reset retry timer on pull activation, don't count created objects towards memory usage, abort objects on pull deactivation

* Revert "Track which pull bundle requests are ready to run"

This reverts commit b5d0714783fa2fc842bdd4e2d2802228e25f03c2.

* Check object active before receiving chunk

* lint

* debug, unit test, fix race condition

* lint

* update

* lint

* fix

* fix build

* fix test

* remove print

* Fix bug in bytes accounting

* Split
2021-02-19 18:07:57 -08:00
Eric Liang
58f8c4b23a
Handle unhandled exception handler == nullptr in Java (#14221) 2021-02-19 16:54:41 -08:00
SangBin Cho
296792f963
Revert "[core] Fix bugs in admission control (#14157)" (#14217)
This reverts commit 94a819d00e.
2021-02-19 11:58:17 -08:00
Eric Liang
cc156f7b3c
Fix deadlock in unhandled exception handler and re-merge (#3) (#14192) 2021-02-19 11:52:09 -08:00
Kai Yang
ec344b87c7
[Core] Fix grpc server is started check (#14183) 2021-02-19 16:48:28 +08:00
Stephanie Wang
94a819d00e
[core] Fix bugs in admission control (#14157)
* Track which pull bundle requests are ready to run

* Regression test

* Reset retry timer on pull activation, don't count created objects towards memory usage, abort objects on pull deactivation

* Revert "Track which pull bundle requests are ready to run"

This reverts commit b5d0714783fa2fc842bdd4e2d2802228e25f03c2.

* Check object active before receiving chunk

* lint

* debug, unit test, fix race condition

* lint

* update

* lint

* fix

* fix build

* fix test

* remove print

* Fix bug in bytes accounting
2021-02-18 20:39:00 -08:00
Clark Zinzow
c092a5d184
Cancel object location long-poll on object free. (#14165) 2021-02-18 14:09:43 -08:00
Stephanie Wang
dfb86e0a8f
[core] Push object chunks with multiple threads (#14191)
* Push object chunks with multiple threads

* fix build
2021-02-18 14:09:23 -08:00
SangBin Cho
66f93a3d63
Revert "Fix OSX error and re-merge unhandled exceptions handling (#14138)" (#14180)
This reverts commit ee584e8328.
2021-02-18 10:35:38 -08:00
SangBin Cho
9451b4ea86
[Object Spilling] Fix the race condition. (#14149)
* Fix the race condition.

* done.

* Fix the lint issue.

* fix issues. addressed comments.
2021-02-17 14:35:22 -08:00
Eric Liang
ee584e8328
Fix OSX error and re-merge unhandled exceptions handling (#14138) 2021-02-17 13:35:07 -08:00
SangBin Cho
3a6a977803
Revert "[Ownership based object directory] Turn on by default. (#13964)" (#14148)
This reverts commit 04d2df40cd.
2021-02-16 22:42:58 -08:00
architkulkarni
d9124e9329
Revert "[Core]Fix ray.kill doesn't cancel pending actor bug (#14025)" (#14146)
This reverts commit 1754359281.
2021-02-16 17:22:25 -08:00