Ian Rodney
bd641a5e71
Revert "[Core] Added event loop metrics for posts. ( #14546 )" ( #14692 )
2021-03-16 10:38:45 -07:00
Tao Wang
897b84b300
[large scale]Add option for disable/enable context connection and disable asynchro… ( #14596 )
2021-03-16 15:09:13 +08:00
Tao Wang
c572563e1e
[large scale]Add enable sharding option and disable sharding for gcs client ( #14600 )
2021-03-15 19:35:00 +08:00
Siyuan (Ryans) Zhuang
b92531918e
Make use of C++14 'make_unique' ( #14663 )
2021-03-15 03:00:52 -07:00
Tao Wang
3402b1752f
[GCS]Report job error to gcs instead of direct publishing ( #14617 )
* [GCS]Report job error to gcs instead of direct publishing
* fix compile
2021-03-12 14:54:08 -08:00
Eric Liang
2ba49c2701
Distinguish between grpc client and server events in asio metrics ( #14637 )
2021-03-12 11:13:59 -08:00
Clark Zinzow
7b3102dd32
Add resource report lag warning. ( #14611 )
2021-03-11 17:29:45 -08:00
Yi Cheng
ad8e35b919
[ray] Update cpp to std14 ( #14441 )
2021-03-10 14:05:52 -08:00
Clark Zinzow
566dcea56a
[Core] Added event loop metrics for posts. ( #14546 )
* Added event loop metrics for posts.
* io_context_proxy --> instrumented_io_context
* Fix feature flag, chrono-->absl, trim the stats, inline functions, reformat stats string.
* Make stats struct mutex plain lock instead of reader-writer lock.
* Mutex reader locking, std::array double braces initialization.
* Fix Bazel BUILD formatting.
2021-03-10 11:52:45 -08:00
Stephanie Wang
0f3530da3b
[core] Only consider actual workers when killing idle workers ( #14578 )
2021-03-10 09:30:19 -08:00
Alex Wu
e1fbb8489e
[core] Suppress infeasible warning ( #14068 )
2021-03-09 16:37:56 -08:00
Yi Cheng
ed8935406b
[core] Minimal support for runtime env ( #14270 )
2021-03-09 11:53:58 -08:00
Alex Wu
ba6cebe30f
Raylet request resource report endpoint ( #14291 )
* .
* done?
* raylet side done?
* .
* .
* .
* client
* .
* fix tests
* make ci happy
* lint
* cleanup
* clang sucks
Co-authored-by: Alex Wu <alex@anyscale.com>
2021-03-09 09:50:50 -08:00
Eric Liang
3fab5e2ada
Switch memory units to bytes ( #14433 )
2021-03-06 19:32:35 -08:00
Alex Wu
2395e25fc0
[hotfix][core] Load balancing spillback feature flag ( #14457 )
2021-03-05 16:45:33 -08:00
DK.Pino
26907b7708
Support placement group for normal task in Java API ( #14342 )
* support pg for normal task
* fix lint
* fix comment
* fix comment
* update comment
* fix java typo
2021-03-05 10:21:37 +08:00
SangBin Cho
190ab40645
[Core] Display ip address when node dies ( #14489 )
* done.
* Addressed code review.
2021-03-04 10:27:00 -08:00
Kai Yang
1d7bd990b6
[Java] Update System.gc() log to debug level ( #14490 )
2021-03-04 18:54:10 +08:00
Kai Yang
5d79821e69
[Core] Initialize system config in CoreWorkerProcess constructor ( #14439 )
2021-03-04 16:34:54 +08:00
Eric Liang
99a63b3dd1
Remove old scheduler and friends ( #14184 )
2021-03-03 18:29:15 -08:00
ZhuSenlin
dcff25aed6
remove invalid code inside NodeManager::NodeAdded ( #14273 )
Co-authored-by: senlin.zsl <senlin.zsl@antgroup.com>
2021-03-03 09:20:21 -08:00
Kai Yang
c53c909130
[Java] Quit worker process after RunTaskExecutionLoop to avoid orphan Java worker processes ( #14442 )
2021-03-03 16:47:17 +08:00
fangfengbin
1054613da1
[Core]Fix ray.kill doesn't cancel pending actor bug ( #14154 )
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2021-03-03 16:12:32 +08:00
Stephanie Wang
5c6c9d5b91
[core] Spill tasks from waiting queue ( #14288 )
* Spill back waiting tasks
* test
* test
* todo
* Avoid iterating over args
* update
* lint
* Fix test
* test
* Test force spillback
* Unit test resource scheduler
* test
* travis?
* rename
* debug
* revert flaky test
* lint
* fix test
* fix
2021-03-02 22:30:02 -08:00
SangBin Cho
bacbdd297b
[Core] Do not unregister workers that own objects by worker capping mechanism. ( #14408 )
* Almost done.
* Initial implementation done.
* Fix issue.
* Addressed the initial code review.
* improve comments.
* Addressed code review.
* Adding unit tests.
* Complete unit tests.
* Resolve all issues.
* Fix issues.
2021-03-02 12:24:22 -08:00
Yi Cheng
d921dca075
[core] Fixing bug when dispatching tasks to deleted placement group ( #14300 )
2021-03-02 10:24:53 -08:00
Stephanie Wang
a24ac13671
[core] Randomize actor ID to avoid collisions ( #14358 )
* Randomize actor ID
* Mix index and current time, add python test
* test
* nanos
2021-03-02 10:00:28 -08:00
Tao Wang
2de01ee3b1
[GCS]Cherry pick heartbeat function into another thread ( #14301 )
2021-03-02 17:49:02 +08:00
SangBin Cho
09fd38ede1
[Multi node shuffle] More efficient ray memory --stats-only ( #14423 )
* Done.
* Fix all the issues.
2021-03-01 23:14:06 -08:00
SangBin Cho
0ec8efbb47
[Core] Minor fixes ( #14411 )
* Fix issue.
* Lint.
* Addressed code review.
2021-03-01 18:37:05 -08:00
Eric Liang
9db000ff2c
Auto report object store memory usage; remove some deprecated code ( #14260 )
2021-03-01 13:19:44 -08:00
Qing Wang
f7f64e90ed
[Minor] Remove unused field. ( #14382 )
Co-authored-by: Qing Wang <jovany.wq@antgroup.com>
2021-03-01 19:35:28 +08:00
Kai Yang
e0e8918d60
[Core] Raylet to pick the node manager port ( #14349 )
2021-02-27 20:27:09 +08:00
SangBin Cho
2b5b0dd3fc
[Core] Fix the issue with duplicated args ( #14329 )
2021-02-26 12:42:58 -08:00
Clark Zinzow
6b37720c6a
[Core] Locality-aware leasing: Milestone 4 - Borrowed refs. ( #14296 )
* Adds locality-aware leasing for borrowed refs.
* Added tests.
2021-02-26 10:36:12 -08:00
Richard Liaw
3e9ff91218
Revert the reverted heartbeat factor PR (check windows build) ( #14341 )
2021-02-25 20:52:12 -08:00
Eric Liang
adbdacae58
add more io workers ( #14330 )
2021-02-24 22:00:31 -08:00
Clark Zinzow
c1a1be1da6
[Core] Locality-aware leasing: Milestone 2 - Owned refs, cached locations ( #14282 )
* Adds locality-aware leasing for cached owned refs.
* Add tests for locality-aware leasing on cached owned refs.
2021-02-24 21:24:10 -08:00
Richard Liaw
80657e5dfe
Revert "[Core]Pull off timers out of heartbeat in raylet ( #13963 )" ( #14319 )
2021-02-24 19:44:31 -08:00
ZhuSenlin
be28e8fae4
use iterator instead of operator[] to avoid garbage ( #14275 )
2021-02-25 11:37:36 +08:00
fangfengbin
482a00278b
[GCS]Fix flaky testcase: ServiceBasedGcsClientTest ( #14248 )
2021-02-24 20:35:30 +08:00
Tao Wang
6af0291347
[Core]Pull off timers out of heartbeat in raylet ( #13963 )
2021-02-24 11:59:13 +08:00
SangBin Cho
b7c56b8a71
[Core] Improve the server startup error message. ( #14267 )
* Improve the error message further.
* fix comment.
* Fix comment 2.
* improve messages to be even more high level.
* Address code review.
2021-02-23 16:26:06 -08:00
DK.Pino
911b028c54
[Placement Group] Make the creation of placement group sync ( #13858 )
* make pg creation sync
* return success immediately upon pg registration
* hold on
* fix ut
* make collection for callback
* make pg registration vector
* fix new cpp ut
* fix named py ut
* fix python ut bug
* fix python ut
* fix lint
* modify comment
* fix comment
* fix comment
* add new ut and fix old lint issue
* fix comment
* update comment
* fix conflict
2021-02-23 16:11:43 -08:00
Clark Zinzow
d344e77109
Revert "Revert "Inline small objects in GetObjectStatus response. ( #13309 )" ( #13615 )" ( #13618 )
This reverts commit 20acc3b05e.
2021-02-23 12:06:37 -08:00
Simon Mo
dfd5eb4b0d
[Core] fix gcs use-after-free from ASAN ( #14199 )
2021-02-23 10:37:31 -08:00
ZhuSenlin
8be107196d
fix retry leasing worker ( #14272 )
2021-02-23 19:38:40 +08:00
Clark Zinzow
5ce9b93f47
[Core] Ownership-based Object Directory - Enabled by default ( #14254 )
2021-02-22 22:09:41 -08:00
Alex Wu
79653049d2
[core] Start less worker processes ( #14202 )
2021-02-22 22:01:38 -08:00
ZhuSenlin
8e0b2d07f4
[Core] synchronize job config to worker when it registers to raylet ( #13402 )
2021-02-23 11:48:54 +08:00