mehrdadn
3ee2e9f7e5
Make #include consistent ( #8666 )
...
Co-authored-by: Mehrdad <noreply@github.com>
2020-06-07 15:43:24 +02:00
mehrdadn
f68183d778
Error-checking for a couple of corruption issues ( #8059 )
...
* Extra error handling
* Handle connection closed in Redis monitor
Co-authored-by: Mehrdad <noreply@github.com>
2020-06-07 15:43:00 +02:00
Siyuan (Ryans) Zhuang
a0247ffe55
Build plasma store as a library ( #8817 )
...
* build plasma store as a library
* remove unused headers
* windows support
2020-06-06 22:11:37 -07:00
Stephanie Wang
b160b83d3e
[core] Queue subscription/unsubscription commands in the GCS ( #8756 )
...
* Only remove callback index if in map
* test
* Queue subscription commands
* lint
* Check status
* update
* update
* update
* Disable GCS restart tests
* lint
2020-06-05 19:49:19 -07:00
mehrdadn
d78757623d
bazel build --compilation_mode=debug ( #6457 )
2020-06-05 14:36:10 +02:00
Tao Wang
41072fbcc8
Implement GetByJobId in gcs table storage ( #8727 )
2020-06-04 20:51:43 +08:00
fangfengbin
84a8f2ccb5
Support reloading storage data when gcs server restarts ( #8650 )
2020-06-04 14:53:20 +08:00
Siyuan (Ryans) Zhuang
ea05ebe89e
Ship plasma store with Ray ( #7901 )
2020-06-03 17:44:34 -07:00
Stephanie Wang
aa06c3b15a
Eager eviction even when object pinning is disabled ( #8561 )
...
* Eager eviction even when object pinning is disabled, add regression test
* Make test more robust
* lint
2020-06-02 11:48:03 -07:00
Lingxuan Zuo
64a98e4447
Fix sum aggregator in its metric ( #8724 )
2020-06-02 17:36:25 +08:00
Lingxuan Zuo
4cbbc15ca7
[GCS] Global state accessor from node resource table ( #8658 )
2020-06-02 14:01:00 +08:00
acxz
8b924a4846
[gcs] add missing templated log classes ( #8690 )
...
Resolves #8535
2020-06-01 13:39:59 -07:00
Tao Wang
1df408d6ed
Resubscribe object table info when gcs service restart ( #8639 )
2020-06-01 10:42:26 +08:00
fangfengbin
016337d4eb
Heartbeat table uses gcs pub-sub instead of redis accessor ( #8655 )
2020-05-30 23:17:25 +08:00
fangfengbin
10c87063be
merge actor info handler into actor manager ( #8682 )
2020-05-30 21:56:29 +08:00
Edward Oakes
4955d14878
Remove transport type remnants ( #8673 )
2020-05-29 15:47:08 -05:00
mehrdadn
cb91fe2fc4
SetErrorMode for all Ray processes ( #8656 )
2020-05-29 10:18:20 -05:00
fangfengbin
35eeec5647
Add C++ global state for actor table ( #8501 )
...
* add global state actors
* fix code style
* fix GcsActorManagerTest bug
* rebase master
* add jni code
* add get checkpoint id code
* add debug code
* add debug code
* change log level
* fix compile bug
* return null in jni
* fix crash bug
* change import seq
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
Co-authored-by: Hao Chen <chenh1024@gmail.com>
2020-05-29 21:10:42 +08:00
Hao Chen
08fee00bc8
Increase raylet client connect timeout to fix test_debug_tools ( #8605 )
2020-05-28 20:57:30 +08:00
Lingxuan Zuo
e594524ed3
[GCS] global state query node info table from GCS. ( #8498 )
2020-05-28 16:39:13 +08:00
Tao Wang
675ccbc799
Resubscribe worker table info when gcs service restart ( #8606 )
2020-05-28 10:27:38 +08:00
Edward Oakes
442ada0fcd
Remove shutdown prints to the console ( #8626 )
2020-05-27 10:52:31 -05:00
Lingxuan Zuo
bd4fbcd7fc
Global state accessor jni ( #8637 )
2020-05-27 17:43:47 +08:00
Tao Wang
a1298686d7
[TEST]Use manager class to start/stop components instead of spreading duplicated codes everywhere ( #8500 )
2020-05-27 16:51:51 +08:00
fangfengbin
b0cf781152
fix resubscribe miss callback index bug ( #8604 )
2020-05-27 11:55:17 +08:00
fangfengbin
99dd6a581d
fix testActorRestart failure bug ( #8613 )
2020-05-27 11:10:45 +08:00
fangfengbin
01f4a6eca0
Add task table subscribe retry when gcs service restart ( #8601 )
2020-05-26 17:47:03 +08:00
fangfengbin
c41976938d
Add node table subscribe retry when gcs service restart ( #8591 )
2020-05-26 14:42:48 +08:00
Tao Wang
7e5b3dc0d9
GCS server task info handler use storage instead of redis accessor ( #8584 )
2020-05-26 10:38:31 +08:00
fangfengbin
765d470c40
Add gcs object manager ( #8298 )
2020-05-25 17:21:35 +08:00
fangfengbin
f22d12d2fc
fix TestGetUncommittedLineage npe bug ( #8585 )
2020-05-25 15:48:58 +08:00
fangfengbin
229af662c6
Add job table&actor table subscribe retry when gcs service restart ( #8442 )
2020-05-25 14:38:25 +08:00
Tao Wang
92c2e41dfd
[GCS]profile info getting implementation based gcs service ( #8536 )
2020-05-24 22:23:01 +08:00
fangfengbin
2ab1b773d4
GCS server worker info handler use storage instead of redis accessor ( #8543 )
2020-05-23 23:17:36 +08:00
Eric Liang
351839bf69
Revert "GCS server task info handler use storage instead of redis accessor ( #8531 )" ( #8562 )
...
This reverts commit 9823e15311.
2020-05-22 19:16:43 -07:00
Kai Yang
2e5e789294
Allow enabling logging in core worker with empty log_dir ( #8529 )
2020-05-22 18:02:37 +08:00
fangfengbin
9823e15311
GCS server task info handler use storage instead of redis accessor ( #8531 )
2020-05-22 12:04:03 +08:00
Eric Liang
bb8d3c5cd0
ASAN build for ray core tests ( #8431 )
2020-05-21 15:11:03 -07:00
Edward Oakes
a76434ccde
Add ability to specify worker and driver ports ( #8071 )
2020-05-20 15:31:13 -05:00
mehrdadn
ebf060d484
Make more tests run on Windows ( #8446 )
...
* Remove worker Wait() call due to SIGCHLD being ignored
* Port _pid_alive to Windows
* Show PID as well as TID in glog
* Update TensorFlow version for Python 3.8 on Windows
* Handle missing Pillow on Windows
* Work around dm-tree PermissionError on Windows
* Fix some lint errors on Windows with Python 3.8
* Simplify torch requirements
* Quiet git clean
* Handle finalizer issues
* Exit with the signal number
* Get rid of wget
* Fix some Windows compatibility issues with tests
Co-authored-by: Mehrdad <noreply@github.com>
2020-05-20 12:25:04 -07:00
Lingxuan Zuo
cd706f40c4
[Stats] add nodeaddress tag for stats test ( #8423 )
2020-05-20 12:30:01 -05:00
Max Fitton
0fadc11437
[dashboard] Only show workers from the correct cluster ( #8434 )
2020-05-18 13:30:41 -05:00
fangfengbin
9347a5d10c
Add global state accessor of jobs ( #8401 )
2020-05-18 20:32:05 +08:00
Edward Oakes
16f48078d9
Remove use of ObjectID transport flag ( #7699 )
2020-05-17 11:29:49 -05:00
Tao Wang
acffdb2349
[TEST]use cc_test to run core_worker_test, enforce/reuse RedisServiceManagerForTest ( #8443 )
2020-05-17 18:43:00 +08:00
Stephanie Wang
bd169749e0
Option to retry failed actor tasks ( #8330 )
...
* Python
* Consolidate state in the direct actor transport, set the caller starts at
* todo
* Remove unused
* Update and unit tests
* Doc
* Remove unused
* doc
* Remove debug
* Update src/ray/core_worker/transport/direct_actor_transport.h
Co-authored-by: Eric Liang <ekhliang@gmail.com>
* Update src/ray/core_worker/transport/direct_actor_transport.cc
Co-authored-by: Eric Liang <ekhliang@gmail.com>
* lint and fix build
* Update
* Fix build
* Fix tests
* Unit test for max_task_retries=0
* Fix java?
* Fix bad test
* Cross language fix
* fix java
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-05-15 20:15:15 -07:00
Max Fitton
00325eb2b2
Rename max_reconstructions to max_restarts and use -1 for infinite ( #8274 )
...
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-05-14 10:30:29 -05:00
fangfengbin
08b612052b
Add redis store client AsyncGetAll/AsyncBatchDelete/AsyncDeleteByIndex API ( #8390 )
2020-05-14 14:38:25 +08:00
Hao Chen
a593fde606
Fix core dumps in ExitActor ( #8382 )
2020-05-12 20:06:04 +08:00
fangfengbin
515afa6809
Fix AsyncGetAll miss override bug ( #8402 )
...
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-05-11 11:08:16 -05:00