Commit graph

1876 commits

Author SHA1 Message Date
Siyuan (Ryans) Zhuang
ed77c8b16c
[Core] Use global variable to eliminate force thread termination in plasma (#8912)
* use global variable to eliminate force thread termination
2020-06-12 14:20:53 -07:00
Siyuan (Ryans) Zhuang
4b31b383f3
[Core] Run Plasma Store as a Raylet thread (with a feature flag) (#8897)
* integrate plasma store as a thread (C++)

* integrate plasma store as a thread (Python)

* fix config issues

* remove plasma component fail tests

* without forcefully kill the plasma store thread
2020-06-11 22:54:08 -07:00
mehrdadn
cae475c46a
Fix Windows build (#8905)
Co-authored-by: Mehrdad <noreply@github.com>
2020-06-11 14:54:37 -07:00
Stephanie Wang
05010caed2
[core] Fix race condition for object reconstruction (#8791)
* Fix

* doc

* Unit test

* Update src/ray/core_worker/task_manager.h

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/core_worker/task_manager.h

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/core_worker/task_manager.h

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>

* lint

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-06-10 19:49:12 -07:00
Ian Rodney
2cf3d8c92c
[core] Check that port is unused before assigning to worker (#8773)
2020-06-10 18:35:38 -05:00
fangfengbin
a5bebd4408
Fix create actor rpc reconnect bug (#8855)
2020-06-10 10:53:53 +08:00
Siyuan (Ryans) Zhuang
3d473600a8
[Core] Use Ray ObjectID in Plasma (#8852)
* Use Ray ObjectIDs instead

* remove unused code
2020-06-09 10:10:49 -07:00
chaokunyang
31a4d07bc4
[Java] Rename java ObjectRef/ActorHandle (#8799)
2020-06-09 11:40:43 +08:00
Siyuan (Ryans) Zhuang
c1e6813cea
[core] Move plasma store under object_manager (#8832)
* move plasma under object directory

* update include paths

* cleanup

* disable lint of third-party libraries

* lint
2020-06-08 18:21:41 -07:00
SangBin Cho
3388864768
[Core] Clean up detached actors (#8759)
2020-06-08 11:22:01 -05:00
fangfengbin
68718b33b4
GCS Server add SIGTERM signal handler (#8795)
2020-06-08 17:26:36 +08:00
mehrdadn
3ee2e9f7e5
Make #include consistent (#8666)
Co-authored-by: Mehrdad <noreply@github.com>
2020-06-07 15:43:24 +02:00
mehrdadn
f68183d778
Error-checking for a couple of corruption issues (#8059)
* Extra error handling
* Handle connection closed in Redis monitor
Co-authored-by: Mehrdad <noreply@github.com>
2020-06-07 15:43:00 +02:00
Siyuan (Ryans) Zhuang
a0247ffe55
Build plasma store as a library (#8817)
* build plasma store as a library

* remove unused headers

* windows support
2020-06-06 22:11:37 -07:00
Stephanie Wang
b160b83d3e
[core] Queue subscription/unsubscription commands in the GCS (#8756)
* Only remove callback index if in map

* test

* Queue subscription commands

* lint

* Check status

* update

* update

* update

* Disable GCS restart tests

* lint
2020-06-05 19:49:19 -07:00
mehrdadn
d78757623d
bazel build --compilation_mode=debug (#6457)
2020-06-05 14:36:10 +02:00
Tao Wang
41072fbcc8
Implement GetByJobId in gcs table storage (#8727)
2020-06-04 20:51:43 +08:00
fangfengbin
84a8f2ccb5
Support reloading storage data when gcs server restarts (#8650)
2020-06-04 14:53:20 +08:00
Siyuan (Ryans) Zhuang
ea05ebe89e
Ship plasma store with Ray (#7901)
2020-06-03 17:44:34 -07:00
Stephanie Wang
aa06c3b15a
Eager eviction even when object pinning is disabled (#8561)
* Eager eviction even when object pinning is disabled, add regression test

* Make test more robust

* lint
2020-06-02 11:48:03 -07:00
Lingxuan Zuo
64a98e4447
Fix sum aggregator in its metric (#8724)
2020-06-02 17:36:25 +08:00
Lingxuan Zuo
4cbbc15ca7
[GCS] Global state accessor from node resource table (#8658)
2020-06-02 14:01:00 +08:00
acxz
8b924a4846
[gcs] add missing templated log classes (#8690)
Resolves #8535
2020-06-01 13:39:59 -07:00
Tao Wang
1df408d6ed
Resubscribe object table info when gcs service restart (#8639)
2020-06-01 10:42:26 +08:00
fangfengbin
016337d4eb
Heartbeat table uses gcs pub-sub instead of redis accessor (#8655)
2020-05-30 23:17:25 +08:00
fangfengbin
10c87063be
merge actor info handler into actor manager (#8682)
2020-05-30 21:56:29 +08:00
Edward Oakes
4955d14878
Remove transport type remnants (#8673)
2020-05-29 15:47:08 -05:00
mehrdadn
cb91fe2fc4
SetErrorMode for all Ray processes (#8656)
2020-05-29 10:18:20 -05:00
fangfengbin
35eeec5647
Add C++ global state for actor table (#8501)
* add global state actors

* fix code style

* fix GcsActorManagerTest bug

* rebase master

* add jni code

* add get checkpoint id code

* add debug code

* add debug code

* change log level

* fix compile bug

* return null in jni

* fix crash bug

* change import seq

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
Co-authored-by: Hao Chen <chenh1024@gmail.com>
2020-05-29 21:10:42 +08:00
Hao Chen
08fee00bc8
Increase rayelt client connect timeout to fix test_debug_tools (#8605)
2020-05-28 20:57:30 +08:00
Lingxuan Zuo
e594524ed3
[GCS] global state query node info table from GCS. (#8498)
2020-05-28 16:39:13 +08:00
Tao Wang
675ccbc799
Resubscribe worker table info when gcs service restart (#8606)
2020-05-28 10:27:38 +08:00
Edward Oakes
442ada0fcd
Remove shutdown prints to the console (#8626)
2020-05-27 10:52:31 -05:00
Lingxuan Zuo
bd4fbcd7fc
Global state accessor jni (#8637)
2020-05-27 17:43:47 +08:00
Tao Wang
a1298686d7
[TEST]Use manager class to start/stop components instead of spreading duplicated codes everywhere (#8500)
2020-05-27 16:51:51 +08:00
fangfengbin
b0cf781152
fix resubscribe miss callback index bug (#8604)
2020-05-27 11:55:17 +08:00
fangfengbin
99dd6a581d
fix testActorRestart failure bug (#8613)
2020-05-27 11:10:45 +08:00
fangfengbin
01f4a6eca0
Add task table subscribe retry when gcs service restart (#8601)
2020-05-26 17:47:03 +08:00
fangfengbin
c41976938d
Add node table subscribe retry when gcs service restart (#8591)
2020-05-26 14:42:48 +08:00
Tao Wang
7e5b3dc0d9
GCS server task info handler use storage instead of redis accessor (#8584)
2020-05-26 10:38:31 +08:00
fangfengbin
765d470c40
Add gcs object manager (#8298)
2020-05-25 17:21:35 +08:00
fangfengbin
f22d12d2fc
fix TestGetUncommittedLineage npe bug (#8585)
2020-05-25 15:48:58 +08:00
fangfengbin
229af662c6
Add job table&actor table subscribe retry when gcs service restart (#8442)
2020-05-25 14:38:25 +08:00
Tao Wang
92c2e41dfd
[GCS]profile info getting implementation based gcs service (#8536)
2020-05-24 22:23:01 +08:00
fangfengbin
2ab1b773d4
GCS server worker info handler use storage instead of redis accessor (#8543)
2020-05-23 23:17:36 +08:00
Eric Liang
351839bf69
Revert "GCS server task info handler use storage instead of redis accessor (#8531)" (#8562)
This reverts commit 9823e15311.
2020-05-22 19:16:43 -07:00
Kai Yang
2e5e789294
Allow enabling logging in core worker with empty log_dir (#8529)
2020-05-22 18:02:37 +08:00
fangfengbin
9823e15311
GCS server task info handler use storage instead of redis accessor (#8531)
2020-05-22 12:04:03 +08:00
Eric Liang
bb8d3c5cd0
ASAN build for ray core tests (#8431)
2020-05-21 15:11:03 -07:00
Edward Oakes
a76434ccde
Add ability to specify worker and driver ports (#8071)
2020-05-20 15:31:13 -05:00