1930 commits

Author SHA1 Message Date
Hao Chen
10ff2f3b4a
Fix duplicate destruction of CoreWorkerProcess instance (#15245) 2021-04-12 21:01:21 +08:00
chenk008
6709560ef6
fix setproctitle break /proc/PID/environ (#15056)
* fix setproctitle break /proc/PID/environ

* bugfix

* add ut

* fix ut

* fix ut

* fix ut

* improve comment

* improve comment

* fix ut lint

* fix ut lint

* revert init.py

Co-authored-by: wuhua.ck <wuhua.ck@alibaba-inc.com>
2021-04-09 15:45:19 -07:00
Stephanie Wang
94e592004e
Prioritize worker requests for objects over queued task arguments (#15157) 2021-04-08 14:51:21 -07:00
SangBin Cho
a9ac4ad890
Revert "[GCS]Increase heartbeat interval to reduce pressure on gcs server (#14203)" (#15194)
This reverts commit ef195e5108.
2021-04-08 09:29:13 -07:00
SangBin Cho
bd58a9a9ff
[Build] Fix symbol problems (#15187) 2021-04-08 09:11:15 -07:00
Alex Wu
e5feaee95a
[core worker] Disable async connections (#15161) 2021-04-07 22:32:04 -07:00
SangBin Cho
61d120557d
[Pubsub] Generalize pubsub, Move pubsub code to pubsub_lib module (#15164)
* cherry-pick-1

* cherry-pick-2

* cherry-pick-part-3

* Should work.

* Lint fix.

* Fix lint 2.
2021-04-07 20:40:39 -07:00
Tao Wang
ef195e5108
[GCS]Increase heartbeat interval to reduce pressure on gcs server (#14203) 2021-04-08 11:14:43 +08:00
SangBin Cho
e0872083b8
[Pubsub] Generalize pubsub impl part 1 (#15116)
* Finished the implementation. Cpp tests are left.

* Fix cpp tests.

* Addressed code review.

* Addressed code review.

* Change the destruction order.

* Addressed code review part 2.
2021-04-06 20:59:32 -07:00
Alex Wu
10fdb9e9ac
[metrics] Scheduler metrics (#14716) 2021-04-06 11:27:54 -07:00
Siyuan (Ryans) Zhuang
64cc092959
[Core] Cleanup C++ code (#15109)
* cleanup c++ code

* more cleanup

* lint

* lint
2021-04-06 03:29:03 -07:00
Yi Cheng
afc92130fa
Unpack runtime env to runtime_resource (#15111) 2021-04-05 17:35:31 -07:00
Siyuan (Ryans) Zhuang
6f56d7e360
Fix compilation warnings (#15104) 2021-04-05 16:14:32 -07:00
Yi Cheng
5806e726f4
[core] Internal kv support in gcs (#14656)
* server side ready

* client side

* py

* fix

* up

* format

* add files

* add pyx

* up

* up

* up

* add keys

* format

* update

* format

* add unittests

* add files

* up

* up

* fix

* up

* fix thread issue

* format

* fix

* Fix

* format

* fix

* more

* fix conflict

* fix

* fix order

* format

* compiling fix

* lint

* fix

* fix some

* some fix

* fix comment

* fix name

* format

* fix compatible issue

* fix name

* fix lint

* disconnect safe

* up

* format

* fix

Co-authored-by: Yi Cheng <singye888@gmail.com>
2021-04-05 10:26:46 -07:00
Yi Cheng
672dad8056
Fix gcs test failure (#15098) 2021-04-04 14:53:04 +08:00
Siyuan (Ryans) Zhuang
7fd86f7e15
[Core] Use static callback instead of dynamic notification listener (#15059)
* static callback & remove outdated protocol

* address comments

* fix

* make fields constant

* fix windows compilation error
2021-04-02 22:33:41 -07:00
SangBin Cho
cef6286f63
[Pubsub] Batch messages (#15084)
* batch pubsub 1

* Logic done. Tests left.

* done.
2021-04-02 16:42:18 -07:00
SangBin Cho
015369db34
[Core] Fix plasma store segfault (#15071)
* Use shared pointer instead of a raw pointer

* Lint.

* Addressed code review.

* Addressed code review.
2021-04-02 14:54:20 -07:00
Yi Cheng
ecb94b3fe9
Add test case to check job conf compatible issue (#15082) 2021-04-02 12:03:21 -07:00
SangBin Cho
3578d4e9d8
[Object Spilling] Limit number of objects to fuse (#15034)
* ready to go.

* Done.

* done.

* Done.

* Addressed code review.

* Fix a build issue.
2021-04-02 10:49:15 -07:00
SangBin Cho
3965310f93
[Core] Fix the check failure from object manager (#15070) 2021-04-01 21:21:42 -07:00
Alex Wu
f52c855704
[core] Fix placement group GPU assignment bug (#15049) 2021-04-01 17:46:09 -07:00
Yi Cheng
d4c20c970b
[core] Fix UTIL worker issue (#14925)
* Fix

* format

* more

* format

* fix

* fix

* fix comment

* fix test failure
2021-04-01 17:36:45 -07:00
Siyuan (Ryans) Zhuang
6ad379864e
[doc] Fix inconsistent doc about ObjectID bytes (#15072) 2021-04-01 17:14:30 -07:00
Alex Wu
4fba05ae4d
[core] Hybrid scheduling policy. (#14790) 2021-04-01 16:59:59 -07:00
fangfengbin
18728b2b7e
Fix c++ gcs test bug (#15063)
* fix ut bug

* fix bug

Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2021-04-01 09:19:24 -07:00
SangBin Cho
005cff0092
Revert "Revert "[Core] Implement long polling-based pubsub to reduce … (#14909) 2021-04-01 09:03:15 -07:00
Hao Chen
3e1a0439b7
Fix concurrent actor starting too many threads. (#14927) 2021-04-01 19:58:18 +08:00
Stephanie Wang
a86a7a6a98
[core] Cap total memory used by executing tasks' arguments (#15027)
* Task dependency map

* Pinned args threshold

* Unit test and fix

* no leaks

* update

* update

* remove assertion
2021-03-31 15:38:40 -07:00
SangBin Cho
79a6aa97b7
[Core] Optimize get core worker Stats (#15008)
* in progress.

* Optimize get core worker stats.

* Fix a segfault.

* Addressed code review.

* Update comments.

* Addressed code review.
2021-03-31 12:21:53 -07:00
Yi Cheng
4480132229
[core] Integration runtime_env with ray client (#14881)
* server side ready

* client side

* py

* fix

* up

* format

* add files

* add pyx

* up

* up

* up

* add keys

* format

* update

* format

* add unittests

* add files

* up

* up

* fix

* up

* fix thread issue

* format

* fix

* update proto

* Fix

* format

* fix

* more

* fix conflict

* fix

* fix order

* format

* add

* up

* compiling fix

* lint

* fix

* format

* fix some

* some fix

* fix comment

* test cases

* add test

* comments

* fix name

* format

* fix

* revert gcs-kv

* fix comments

* fix failure

* fix test

* format

* fix timeout

* fix

* fix

* fix

* format

* format

* fix flaky test

Co-authored-by: Yi Cheng <singye888@gmail.com>
2021-03-31 11:39:34 -07:00
Kai Yang
6278df8604
[Java] refine generation of jvm options (#14931) 2021-03-31 21:04:52 +08:00
Siyuan (Ryans) Zhuang
3aa39142db
[Core] Remove code paths that run plasma store as a process (#14924)
* enable plasma store as thread by default

remove unused code path that runs plasma store as a process
2021-03-30 16:19:03 -07:00
SangBin Cho
4edcaa8870
[Stats] Basic implementation for the periodic asio stats printing support. (#14982)
* Basic implementation for the periodic asio stats printing support.

* hacky way to count grpc stats.

* lint

* Fix an issue.

* Revert the request/reply.
2021-03-29 21:51:16 -07:00
Alex Wu
1f4d4dfeb0
Gcs pull resource reports (#14336) 2021-03-29 11:36:30 -07:00
Siyuan (Ryans) Zhuang
87c79553e9
[Core] Remove code paths that contain plasma store executable (#14950)
* remove plasma store executable & never used tests

* set default behavior

* fix tests
2021-03-28 21:22:14 -07:00
qicosmos
de7ee75d27
[C++ worker] Ray normal task for RAY_REMOTE (#14599) 2021-03-27 09:56:40 +08:00
SangBin Cho
839cd1e0a2
[Core] Remove unnecessary redis connection (#14511)
* remove unnecessary stuff.

* test in progress.

* Fix tests.

* lint

* fix.

* Remove tests that were not working properly before.
2021-03-26 10:29:12 -07:00
Eric Liang
2157021fd3
Refactor object restoration path (#14821) 2021-03-25 22:46:50 -07:00
Yi Cheng
f427801c10
Revert "[core] Fix worker type in python (#14823)" (#14910)
This reverts commit 9ccf291f4d.
2021-03-24 13:27:56 -07:00
SangBin Cho
ec3cfef883
Revert "[Core] Implement long polling-based pubsub to reduce number of WaitForObjectEviction requests in flight. (#14638)" (#14905)
This reverts commit 35ec91c4e0.
2021-03-24 11:22:48 -07:00
Clark Zinzow
ed46d8bf45
[Core] Added ownership-based object directory metrics, fixed raylet metric bug. (#14855)
* Added ownership-based object directory metrics.

* Updated OBOD metric descriptions.

* Dump OBOD metrics in debug string.

* Added e2e tests for metrics.
2021-03-24 10:53:22 -07:00
SangBin Cho
35ec91c4e0
[Core] Implement long polling-based pubsub to reduce number of WaitForObjectEviction requests in flight. (#14638)
* in progress.

* In progress.

* lint.

* Updated code

* lint.

* In progress of writing tests.

* Finished implementation. Need cleanup & refactoring.

* fixing tests...

* Finish the impl.

* Fix typo.

* impl done. Only cleanup left.

* done.

* Finished clean up.

* Fix issues.

* Add a stronger consistency check.

* Addressed code review.

* lint.

* done.

* Addressed more.

* addressed all reviews.

* Addressed code review.

* lint.

* Added unit tests to assert no leak.
2021-03-23 23:47:08 -07:00
Stephanie Wang
201ebc3f92
Revert "[core] Set a configurable max memory for fetched objects (#14817)" (#14887)
This reverts commit 8769953474.
2021-03-23 21:58:11 -07:00
fyrestone
52cfa1cdd7
Fix load code from local (#12102) 2021-03-24 11:49:58 +08:00
Stephanie Wang
8769953474
[core] Set a configurable max memory for fetched objects (#14817)
* Set threshold, tests

* comment

* move max to pull manager

* unit test

* fix plasma

* comment
2021-03-23 13:55:02 -07:00
Yi Cheng
9ccf291f4d
[core] Fix worker type in python (#14823)
* Fix

* format

* more

* format
2021-03-23 00:58:57 -07:00
SangBin Cho
87877cdfbf
[Test] Fix flaky object spilling test (#14722)
* start

* done.

* d

* d

* Push the fix.

* done.

* Enable test.
2021-03-22 12:51:47 -07:00
Yi Cheng
881a46e1d6
[core] RuntimeEnv GC in local node (#14594) 2021-03-18 14:55:11 -07:00
Tao Wang
5305dbb639
[large scale]Always disable sync/subscribe context in sharding context (#14706) 2021-03-18 19:31:36 +08:00