Commit graph

1930 commits

Author SHA1 Message Date
Tao Wang
44a7ce3d35
[large scale]Disable async/subscribe context in global state accessor (#14705) 2021-03-18 11:07:33 +08:00
Tao Wang
ea7c9171e9
[large scale]Disable async context in raylets' gcs client (#14704) 2021-03-18 10:50:09 +08:00
Clark Zinzow
6a28cf4add
[Core] Event loop instrumentation concurrency fixes. (#14719)
* Moved global stats member to a shared pointer explicitly captured by-value by handler lambdas, fixed handler stats copy outside of lock, ported to generalized lambda capture.

* Reenabled event loop instrumentation by default.

* Remove explicit inline specifier from non-member functions, move into anonymous namespace.

* Revert "Reenabled event loop instrumentation by default."

This reverts commit 949215269f79a1ab5ddc1ce0285c3ff4477ee6e0.
2021-03-17 16:49:25 -07:00
Lixin Wei
72d87093b9
[Core] Make Actor DEAD and Save Exceptions in GCS When Error Happens in Constructor (#14211) 2021-03-17 12:50:28 -07:00
Ian Rodney
bd641a5e71
Revert "[Core] Added event loop metrics for posts. (#14546)" (#14692) 2021-03-16 10:38:45 -07:00
Tao Wang
897b84b300
[large scale]Add option for disable/enable context connection and disable asynchro… (#14596) 2021-03-16 15:09:13 +08:00
Tao Wang
c572563e1e
[large scale]Add enable sharding option and disable sharding for gcs client (#14600) 2021-03-15 19:35:00 +08:00
Siyuan (Ryans) Zhuang
b92531918e
Make use of C++14 'make_unique' (#14663) 2021-03-15 03:00:52 -07:00
Tao Wang
3402b1752f
[GCS]Report job error to gcs instead of direct publishing (#14617)
* [GCS]Report job error to gcs instead of direct publishing

* fix compile
2021-03-12 14:54:08 -08:00
Eric Liang
2ba49c2701
Distinguish between grpc client and server events in asio metrics (#14637) 2021-03-12 11:13:59 -08:00
Clark Zinzow
7b3102dd32
Add resource report lag warning. (#14611) 2021-03-11 17:29:45 -08:00
Yi Cheng
ad8e35b919
[ray] Update cpp to std14 (#14441) 2021-03-10 14:05:52 -08:00
Clark Zinzow
566dcea56a
[Core] Added event loop metrics for posts. (#14546)
* Added event loop metrics for posts.

* io_context_proxy --> instrumented_io_context

* Fix feature flag, chrono-->absl, trim the stats, inline functions, reformat stats string.

* Make stats struct mutex plain lock instead of reader-writer lock.

* Mutex reader locking, std::array double braces initialization.

* Fix Bazel BUILD formatting.
2021-03-10 11:52:45 -08:00
Stephanie Wang
0f3530da3b
[core] Only consider actual workers when killing idle workers (#14578) 2021-03-10 09:30:19 -08:00
Alex Wu
e1fbb8489e
[core] Suppress infeasible warning (#14068) 2021-03-09 16:37:56 -08:00
Yi Cheng
ed8935406b
[core] Minimal support for runtime env (#14270) 2021-03-09 11:53:58 -08:00
Alex Wu
ba6cebe30f
Raylet request resource report endpoint (#14291)
* .

* done?

* raylet side done?

* .

* .

* .

* client

* .

* fix tests

* make ci happy

* lint

* cleanup

* clang sucks

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-03-09 09:50:50 -08:00
Eric Liang
3fab5e2ada
Switch memory units to bytes (#14433) 2021-03-06 19:32:35 -08:00
Alex Wu
2395e25fc0
[hotfix][core] Load balancing spillback feature flag (#14457) 2021-03-05 16:45:33 -08:00
DK.Pino
26907b7708
Support placement group for normal task in Java API (#14342)
* support pg for normal task

* fix lint

* fix comment

* fix comment

* update comment

* fix java typo
2021-03-05 10:21:37 +08:00
SangBin Cho
190ab40645
[Core] Display ip address when node dies (#14489)
* done.

* Addressed code review.
2021-03-04 10:27:00 -08:00
Kai Yang
1d7bd990b6
[Java] Update System.gc() log to debug level (#14490) 2021-03-04 18:54:10 +08:00
Kai Yang
5d79821e69
[Core] Initialize system config in CoreWorkerProcess constructor (#14439) 2021-03-04 16:34:54 +08:00
Eric Liang
99a63b3dd1
Remove old scheduler and friends (#14184) 2021-03-03 18:29:15 -08:00
ZhuSenlin
dcff25aed6
remove invalid code inside NodeManager::NodeAdded (#14273)
Co-authored-by: senlin.zsl <senlin.zsl@antgroup.com>
2021-03-03 09:20:21 -08:00
Kai Yang
c53c909130
[Java] Quit worker process after RunTaskExecutionLoop to avoid orphan Java worker processes (#14442) 2021-03-03 16:47:17 +08:00
fangfengbin
1054613da1
[Core]Fix ray.kill doesn't cancel pending actor bug (#14154)
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2021-03-03 16:12:32 +08:00
Stephanie Wang
5c6c9d5b91
[core] Spill tasks from waiting queue (#14288)
* Spill back waiting tasks

* test

* test

* todo

* Avoid iterating over args

* update

* lint

* Fix test

* test

* Test force spillback

* Unit test resource scheduler

* test

* travis?

* rename

* debug

* revert flaky test

* lint

* fix test

* fix
2021-03-02 22:30:02 -08:00
SangBin Cho
bacbdd297b
[Core] Do not unregister workers that own objects by worker capping mechanism. (#14408)
* Almost done.

* Initial implementation done.

* Fix issue.

* Addressed the initial code review.

* improve comments.

* Addressed code review.

* Adding unit tests.

* Complete unit tests.

* Resolve all issues.

* Fix issues.
2021-03-02 12:24:22 -08:00
Yi Cheng
d921dca075
[core] Fixing bug when dispatching tasks to deleted placement group (#14300) 2021-03-02 10:24:53 -08:00
Stephanie Wang
a24ac13671
[core] Randomize actor ID to avoid collisions (#14358)
* Randomize actor ID

* Mix index and current time, add python test

* test

* nanos
2021-03-02 10:00:28 -08:00
Tao Wang
2de01ee3b1
[GCS]Cherry pick heartbeat function into another thread (#14301) 2021-03-02 17:49:02 +08:00
SangBin Cho
09fd38ede1
[Multi node shuffle] More efficient ray memory --stats-only (#14423)
* Done.

* Fix all the issues.
2021-03-01 23:14:06 -08:00
SangBin Cho
0ec8efbb47
[Core] Minor fixes (#14411)
* Fix issue.

* Lint.

* Addressed code review.
2021-03-01 18:37:05 -08:00
Eric Liang
9db000ff2c
Auto report object store memory usage; remove some deprecated code (#14260) 2021-03-01 13:19:44 -08:00
Qing Wang
f7f64e90ed
[Minor] Remove unused field. (#14382)
Co-authored-by: Qing Wang <jovany.wq@antgroup.com>
2021-03-01 19:35:28 +08:00
Kai Yang
e0e8918d60
[Core] Raylet to pick the node manager port (#14349) 2021-02-27 20:27:09 +08:00
SangBin Cho
2b5b0dd3fc
[Core] Fix the issue with duplicated args (#14329) 2021-02-26 12:42:58 -08:00
Clark Zinzow
6b37720c6a
[Core] Locality-aware leasing: Milestone 4 - Borrowed refs. (#14296)
* Adds locality-aware leasing for borrowed refs.

* Added tests.
2021-02-26 10:36:12 -08:00
Richard Liaw
3e9ff91218
Revert the reverted heartbeat factor PR (check windows build) (#14341) 2021-02-25 20:52:12 -08:00
Eric Liang
adbdacae58
add more io workers (#14330) 2021-02-24 22:00:31 -08:00
Clark Zinzow
c1a1be1da6
[Core] Locality-aware leasing: Milestone 2 - Owned refs, cached locations (#14282)
* Adds locality-aware leasing for cached owned refs.

* Add tests for locality-aware leasing on cached owned refs.
2021-02-24 21:24:10 -08:00
Richard Liaw
80657e5dfe
Revert "[Core]Pull off timers out of heartbeat in raylet (#13963)" (#14319) 2021-02-24 19:44:31 -08:00
ZhuSenlin
be28e8fae4
use iterator to instead of operator[] to avoid garbage (#14275) 2021-02-25 11:37:36 +08:00
fangfengbin
482a00278b
[GCS]Fix flaky testcase: ServiceBasedGcsClientTest (#14248) 2021-02-24 20:35:30 +08:00
Tao Wang
6af0291347
[Core]Pull off timers out of heartbeat in raylet (#13963) 2021-02-24 11:59:13 +08:00
SangBin Cho
b7c56b8a71
[Core] Improve the server startup error message. (#14267)
* Improve the error message further.

* fix comment.

* Fix comment 2.

* improve messages to be even more high level.

* Address code review.
2021-02-23 16:26:06 -08:00
DK.Pino
911b028c54
[Placement Group] Make the creation of placement group sync (#13858)
* make pg creation sync

* return successful immediately when pg registration

* hold on

* fix ut

* make collection for callback

* make pg registration vector

* fix new cpp ut

* fix named py ut

* fix python ut bug

* fix python ut

* fix lint

* modify comment

* fix comment

* fix comment

* add new ut and fix old lint issue

* fix comment

* update comment

* fix conflict
2021-02-23 16:11:43 -08:00
Clark Zinzow
d344e77109
Revert "Revert "Inline small objects in GetObjectStatus response. (#13309)" (#13615)" (#13618)
This reverts commit 20acc3b05e.
2021-02-23 12:06:37 -08:00
Simon Mo
dfd5eb4b0d
[Core] fix gcs use-after-free from ASAN (#14199) 2021-02-23 10:37:31 -08:00