SangBin Cho
190ab40645
[Core] Display ip address when node dies ( #14489 )
...
* done.
* Addressed code review.
2021-03-04 10:27:00 -08:00
Kai Yang
1d7bd990b6
[Java] Update System.gc() log to debug level ( #14490 )
2021-03-04 18:54:10 +08:00
Kai Yang
5d79821e69
[Core] Initialize system config in CoreWorkerProcess constructor ( #14439 )
2021-03-04 16:34:54 +08:00
Eric Liang
99a63b3dd1
Remove old scheduler and friends ( #14184 )
2021-03-03 18:29:15 -08:00
ZhuSenlin
dcff25aed6
remove invalid code inside NodeManager::NodeAdded ( #14273 )
...
Co-authored-by: senlin.zsl <senlin.zsl@antgroup.com>
2021-03-03 09:20:21 -08:00
Kai Yang
c53c909130
[Java] Quit worker process after RunTaskExecutionLoop to avoid orphan Java worker processes ( #14442 )
2021-03-03 16:47:17 +08:00
fangfengbin
1054613da1
[Core]Fix ray.kill doesn't cancel pending actor bug ( #14154 )
...
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2021-03-03 16:12:32 +08:00
Stephanie Wang
5c6c9d5b91
[core] Spill tasks from waiting queue ( #14288 )
...
* Spill back waiting tasks
* test
* test
* todo
* Avoid iterating over args
* update
* lint
* Fix test
* test
* Test force spillback
* Unit test resource scheduler
* test
* travis?
* rename
* debug
* revert flaky test
* lint
* fix test
* fix
2021-03-02 22:30:02 -08:00
SangBin Cho
bacbdd297b
[Core] Do not unregister workers that own objects by worker capping mechanism. ( #14408 )
...
* Almost done.
* Initial implementation done.
* Fix issue.
* Addressed the initial code review.
* improve comments.
* Addressed code review.
* Adding unit tests.
* Complete unit tests.
* Resolve all issues.
* Fix issues.
2021-03-02 12:24:22 -08:00
Yi Cheng
d921dca075
[core] Fixing bug when dispatching tasks to deleted placement group ( #14300 )
2021-03-02 10:24:53 -08:00
Stephanie Wang
a24ac13671
[core] Randomize actor ID to avoid collisions ( #14358 )
...
* Randomize actor ID
* Mix index and current time, add python test
* test
* nanos
2021-03-02 10:00:28 -08:00
Tao Wang
2de01ee3b1
[GCS]Cherry pick heartbeat function into another thread ( #14301 )
2021-03-02 17:49:02 +08:00
SangBin Cho
09fd38ede1
[Multi node shuffle] More efficient ray memory --stats-only ( #14423 )
...
* Done.
* Fix all the issues.
2021-03-01 23:14:06 -08:00
SangBin Cho
0ec8efbb47
[Core] Minor fixes ( #14411 )
...
* Fix issue.
* Lint.
* Addressed code review.
2021-03-01 18:37:05 -08:00
Eric Liang
9db000ff2c
Auto report object store memory usage; remove some deprecated code ( #14260 )
2021-03-01 13:19:44 -08:00
Qing Wang
f7f64e90ed
[Minor] Remove unused field. ( #14382 )
...
Co-authored-by: Qing Wang <jovany.wq@antgroup.com>
2021-03-01 19:35:28 +08:00
Kai Yang
e0e8918d60
[Core] Raylet to pick the node manager port ( #14349 )
2021-02-27 20:27:09 +08:00
SangBin Cho
2b5b0dd3fc
[Core] Fix the issue with duplicated args ( #14329 )
2021-02-26 12:42:58 -08:00
Clark Zinzow
6b37720c6a
[Core] Locality-aware leasing: Milestone 4 - Borrowed refs. ( #14296 )
...
* Adds locality-aware leasing for borrowed refs.
* Added tests.
2021-02-26 10:36:12 -08:00
Richard Liaw
3e9ff91218
Revert the reverted heartbeat factor PR (check windows build) ( #14341 )
2021-02-25 20:52:12 -08:00
Eric Liang
adbdacae58
add more io workers ( #14330 )
2021-02-24 22:00:31 -08:00
Clark Zinzow
c1a1be1da6
[Core] Locality-aware leasing: Milestone 2 - Owned refs, cached locations ( #14282 )
...
* Adds locality-aware leasing for cached owned refs.
* Add tests for locality-aware leasing on cached owned refs.
2021-02-24 21:24:10 -08:00
Richard Liaw
80657e5dfe
Revert "[Core]Pull off timers out of heartbeat in raylet ( #13963 )" ( #14319 )
2021-02-24 19:44:31 -08:00
ZhuSenlin
be28e8fae4
use iterator instead of operator[] to avoid garbage ( #14275 )
2021-02-25 11:37:36 +08:00
fangfengbin
482a00278b
[GCS]Fix flaky testcase: ServiceBasedGcsClientTest ( #14248 )
2021-02-24 20:35:30 +08:00
Tao Wang
6af0291347
[Core]Pull off timers out of heartbeat in raylet ( #13963 )
2021-02-24 11:59:13 +08:00
SangBin Cho
b7c56b8a71
[Core] Improve the server startup error message. ( #14267 )
...
* Improve the error message further.
* fix comment.
* Fix comment 2.
* improve messages to be even more high level.
* Address code review.
2021-02-23 16:26:06 -08:00
DK.Pino
911b028c54
[Placement Group] Make the creation of placement group sync ( #13858 )
...
* make pg creation sync
* return successful immediately when pg registration
* hold on
* fix ut
* make collection for callback
* make pg registration vector
* fix new cpp ut
* fix named py ut
* fix python ut bug
* fix python ut
* fix lint
* modify comment
* fix comment
* fix comment
* add new ut and fix old lint issue
* fix comment
* update comment
* fix conflict
2021-02-23 16:11:43 -08:00
Clark Zinzow
d344e77109
Revert "Revert "Inline small objects in GetObjectStatus response. ( #13309 )" ( #13615 )" ( #13618 )
...
This reverts commit 20acc3b05e.
2021-02-23 12:06:37 -08:00
Simon Mo
dfd5eb4b0d
[Core] fix gcs use-after-free from ASAN ( #14199 )
2021-02-23 10:37:31 -08:00
ZhuSenlin
8be107196d
fix retry leasing worker ( #14272 )
2021-02-23 19:38:40 +08:00
Clark Zinzow
5ce9b93f47
[Core] Ownership-based Object Directory - Enabled by default ( #14254 )
2021-02-22 22:09:41 -08:00
Alex Wu
79653049d2
[core] Start less worker processes ( #14202 )
2021-02-22 22:01:38 -08:00
ZhuSenlin
8e0b2d07f4
[Core] synchronize job config to worker when it registers to raylet ( #13402 )
2021-02-23 11:48:54 +08:00
DK.Pino
7647d60fa9
[Placement Group] Support named placement group java api & Refactor construct method ( #13821 )
2021-02-22 20:12:09 +08:00
Kai Yang
e75b143faf
[Core] Some small fixes and improvements ( #14210 )
2021-02-22 12:02:30 +08:00
Kai Yang
d8c32be449
[Core] Simplify system config passing from Raylet to workers ( #13860 )
2021-02-20 20:20:13 +08:00
Stephanie Wang
a4d7792c0e
[core] Fix bugs in admission control again ( #14222 )
...
* Track which pull bundle requests are ready to run
* Regression test
* Reset retry timer on pull activation, don't count created objects towards memory usage, abort objects on pull deactivation
* Revert "Track which pull bundle requests are ready to run"
This reverts commit b5d0714783fa2fc842bdd4e2d2802228e25f03c2.
* Check object active before receiving chunk
* lint
* debug, unit test, fix race condition
* lint
* update
* lint
* fix
* fix build
* fix test
* remove print
* Fix bug in bytes accounting
* Split
2021-02-19 18:07:57 -08:00
Eric Liang
58f8c4b23a
Handle unhandled exception handler == nullptr in Java ( #14221 )
2021-02-19 16:54:41 -08:00
SangBin Cho
296792f963
Revert "[core] Fix bugs in admission control ( #14157 )" ( #14217 )
...
This reverts commit 94a819d00e.
2021-02-19 11:58:17 -08:00
Eric Liang
cc156f7b3c
Fix deadlock in unhandled exception handler and re-merge ( #3 ) ( #14192 )
2021-02-19 11:52:09 -08:00
Kai Yang
ec344b87c7
[Core] Fix grpc server is started check ( #14183 )
2021-02-19 16:48:28 +08:00
Stephanie Wang
94a819d00e
[core] Fix bugs in admission control ( #14157 )
...
* Track which pull bundle requests are ready to run
* Regression test
* Reset retry timer on pull activation, don't count created objects towards memory usage, abort objects on pull deactivation
* Revert "Track which pull bundle requests are ready to run"
This reverts commit b5d0714783fa2fc842bdd4e2d2802228e25f03c2.
* Check object active before receiving chunk
* lint
* debug, unit test, fix race condition
* lint
* update
* lint
* fix
* fix build
* fix test
* remove print
* Fix bug in bytes accounting
2021-02-18 20:39:00 -08:00
Clark Zinzow
c092a5d184
Cancel object location long-poll on object free. ( #14165 )
2021-02-18 14:09:43 -08:00
Stephanie Wang
dfb86e0a8f
[core] Push object chunks with multiple threads ( #14191 )
...
* Push object chunks with multiple threads
* fix build
2021-02-18 14:09:23 -08:00
SangBin Cho
66f93a3d63
Revert "Fix OSX error and re-merge unhandled exceptions handling ( #14138 )" ( #14180 )
...
This reverts commit ee584e8328.
2021-02-18 10:35:38 -08:00
SangBin Cho
9451b4ea86
[Object Spilling] Fix the race condition. ( #14149 )
...
* Fix the race condition.
* done.
* Fix the lint issue.
* fix issues. addressed comments.
2021-02-17 14:35:22 -08:00
Eric Liang
ee584e8328
Fix OSX error and re-merge unhandled exceptions handling ( #14138 )
2021-02-17 13:35:07 -08:00
SangBin Cho
3a6a977803
Revert "[Ownership based object directory] Turn on by default. ( #13964 )" ( #14148 )
...
This reverts commit 04d2df40cd.
2021-02-16 22:42:58 -08:00
architkulkarni
d9124e9329
Revert "[Core]Fix ray.kill doesn't cancel pending actor bug ( #14025 )" ( #14146 )
...
This reverts commit 1754359281.
2021-02-16 17:22:25 -08:00