Commit graph

1910 commits

Author SHA1 Message Date
Edward Oakes
580314bf81 Fix Ctrl-C hanging in in-memory store ray.get/ray.wait (#7033) 2020-02-05 10:17:22 -08:00
fangfengbin
ade7ebfc0c Add service based gcs client (#6686) 2020-02-05 12:06:25 +08:00
Edward Oakes
844f607c93 Collect contained ObjectIDs during deserialization (#7029) 2020-02-03 22:49:14 -08:00
Edward Oakes
984490d2be Collect object IDs during serialization (#6946) 2020-02-03 18:38:11 -08:00
Edward Oakes
77436c2e32 Use getppid() to check if the raylet has failed (#6963) 2020-02-02 22:05:21 -08:00
Edward Oakes
92525f35d1 Remove raylet client from Python worker (#6018) 2020-01-31 18:23:01 -08:00
Edward Oakes
341a921d81 Remove vanilla pickle serialization for task arguments (#6948) 2020-01-31 16:52:43 -08:00
Simon Mo
396d7fafc8 UI improvement for asyncio (#6905) 2020-01-27 12:45:51 -08:00
mehrdadn
bde575b8dd Revert "Use Boost.Process instead of pid_t (#6510)" (#6909)
This reverts commit fb8e3615d5.
2020-01-26 10:26:44 -06:00
Yunzhi Zhang
0834bda8c1 [Dashboard] Display actor task execution info (#6705)
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
2020-01-22 22:33:55 -08:00
Simon Mo
5f527816fe Fix async actor high cpu utilization when idle (#6877) 2020-01-22 16:07:08 -08:00
mehrdadn
139bf8908e Replace UNIX sockets with TCP sockets in Ray on Windows (#6823)
* Replace UNIX sockets with TCP sockets in Ray
2020-01-20 17:28:11 -08:00
Stephanie Wang
815cd0e39a Task and actor fate sharing with the owner process (#6818)
* Add test

* Kill workers leased by failed workers

* merge

* shorten test

* Add node failure test case

* Fix FromBinary for nil IDs, add assertions

* Test

* Fate sharing on node removal, fix owner address bug

* lint

* Update src/ray/raylet/node_manager.cc

Co-Authored-By: Zhijun Fu <37800433+zhijunfu@users.noreply.github.com>

* fix

* Remove unneeded test

* fix IDs

Co-authored-by: Zhijun Fu <37800433+zhijunfu@users.noreply.github.com>
2020-01-20 16:44:04 -08:00
Yunzhi Zhang
3acf3c7675 [Dashboard] Add actor task counter (#6820) 2020-01-17 15:43:56 -08:00
Zhijun Fu
92380dd4e6 Fix crash in HandleObjectMissing when direct actor creation task is not found in local_queues_ (#6817) 2020-01-17 13:29:13 -06:00
micafan
e143f85ca0 [GCS] Use new interface class GcsClient in ray (#6805) 2020-01-17 14:51:18 +08:00
fangfengbin
e5ad4e6f8d Add worker info handler to gcs service (#6798)
* add worker info handler

* rebase master

* add log

* remove unused variable

* fix code style
2020-01-16 22:35:00 +08:00
mehrdadn
fb8e3615d5 Use Boost.Process instead of pid_t (#6510)
* Use Boost.Process instead of pid_t

This will let us handle child processes (mostly) uniformly across platforms.
TODO: There is no SIGTERM on Windows; achieving something equivalent is fairly involved.
2020-01-15 20:05:02 -08:00
fangfengbin
f9fa93eaf1 Add error info handler to gcs service (#6754)
* add error info accessor

* rebase master

* add function comments

* capture type instead of request
2020-01-16 11:59:00 +08:00
micafan
2e972e725a [GCS] Add WorkerInfoAccessor to GCS Client (#6788) 2020-01-16 09:28:32 +08:00
micafan
a0dc02042b [GCS] Add ErrorInfoAccessor to GCS Client (#6749) 2020-01-15 13:38:58 +08:00
Kai Yang
ddd4c42fe5 [Java] Add killActor API in Java (#6728)
* Add killActor API in Java

* fix javadoc

* update test case

* Address comments
2020-01-14 17:12:00 +08:00
Edward Oakes
a950e95c7d Use exit() in __kill_actor__ (#6760) 2020-01-13 11:37:59 -06:00
Stephanie Wang
453a214571 Publish unexpected worker failures (#6734)
* Publish unexpected worker failures

* comment

* Move failure handler to NodeManager, update comments
2020-01-11 10:39:22 -08:00
fangfengbin
ed8b2a9b85 Add stats handler to gcs server (#6750) 2020-01-11 12:46:24 +08:00
micafan
ce8170db99 [GCS] Add StatsInfoAccessor to GCS Client (#6748) 2020-01-10 13:55:10 +08:00
fangfengbin
ca454c5c1b Add task reconstruction function to task info handler (#6711) 2020-01-09 15:37:42 +08:00
Yunzhi Zhang
3673835f30 Fix spurious warning message when submitting many tasks (#6752) 2020-01-08 22:52:46 -08:00
micafan
1211e6a1fc [GCS] Add async register nodes to GCS Client (#6742) 2020-01-09 10:51:22 +08:00
Eric Liang
a745886242 Disable HTTP proxy for gRPC connections (#6744)
* disable http proxy for grpc

* add test
2020-01-08 09:23:22 -08:00
micafan
0a5d0109a4 add actor table data creation method to pb_util.h (#6746) 2020-01-08 22:39:17 +08:00
micafan
91a3fa0157 [GCS]access task reconstruction in TaskInfoAccessor (#6688)
* add task lease interface to TaskInfoAccessor

* impl of task lease

* support accessing task lease in TaskInfoAccessor

* update raylet usage of task lease

* add comment

* fix lint

* fix UT of TaskDependencyManager

* fix UT of ReconstructionPolicy

* rm useless code from UT

* add task reconstruction methods to gcs

* fix ut of RedisGcsClient

* update test

* update comments
2020-01-08 16:59:06 +08:00
fangfengbin
303d1a959b Add task lease method to task info handler (#6710)
* add task lease methods to task info handler

* rebase master
2020-01-08 14:25:55 +08:00
Zhijun Fu
329b9440ba fix missing override for HandleWaitForObjectEviction (#6733) 2020-01-07 13:20:35 -08:00
Zhijun Fu
72335dbe46 [rpc] refactor RPC server code (#6661)
* refactor RPC client

* remove unused code

* format

* fix

* resolve comments

* format

* update

* refactor rpc server

* update

* format

* fix

* fix

* Update src/ray/rpc/worker/core_worker_server.h

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* resolve comments

* format

* update

* update

* add a comment

* fix

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-01-07 22:03:42 +08:00
Edward Oakes
2a4d2c6e9e Basic reference counting & pinning (#6554) 2020-01-06 17:30:26 -06:00
mehrdadn
c9855c9769 Remove std::move<std::shared_ptr>(...) to avoid bugs (#6720) 2020-01-06 17:17:26 -06:00
Zhijun Fu
5bb20f6ac9 remove unused params in grpc macros (#6677)
* remove unused params in grpc macros

* format

* fix

* format

* fix
2020-01-06 21:35:40 +08:00
mehrdadn
76c986bdc7 Windows compatibility stubs (#6706) 2020-01-05 21:21:17 -08:00
mehrdadn
e6165cb14b Fix master as it seems to have been broken via these conflicting commits: (#6708)
c51fbfb453
2228079481

Co-authored-by: GitHub Web Flow <noreply@github.com>
2020-01-06 12:29:21 +08:00
fangfengbin
1000e3322d Add gcs server task info handler (#6695) 2020-01-06 11:09:32 +08:00
mehrdadn
2228079481 Fix missing overrides (#6703) 2020-01-05 17:00:23 -08:00
Philipp Moritz
e15bd8ff1a Run core worker tests in thread sanitizer and fix thread safety issues (#6701) 2020-01-05 16:18:21 -08:00
micafan
cc110ff1e3 [GCS]Add task lease methods to TaskInfoAccessor (#6645) 2020-01-05 13:54:33 +08:00
Yunzhi Zhang
816b84808d [Dashboard] Display memory usage of nodes and core workers (#6671) 2020-01-03 20:12:42 -08:00
micafan
fd379934b6 rm DirectActorTable (#6684) 2020-01-03 16:28:26 -08:00
fangfengbin
b8669bc06c Add node resources methods to gcs server node info handler (#6685) 2020-01-03 20:06:49 +08:00
micafan
970cd78701 [GCS] refactor the GCS Client Dynamic Resource Interface (#6266) 2020-01-03 14:07:37 +08:00
Simon Mo
9fe90cdafc Fix async actor recursion limitation (#6672)
* Do not start threadpool when using async

* Turn function_executor into a generator

* Add new test for high concurrency and bump the default

* Set direct call
2020-01-02 19:45:13 -06:00
Yunzhi Zhang
8a0a30b5f0 [Dashboard] display actor status and infeasible tasks (#6652)
* expose actor status and protobuf message of infeasible tasks

* move infeasible tasks into actor tree

* add pytest for displaying infeasible tasks info

* fix base64 decoding

* fix race condition after #6629 merged
2020-01-02 14:27:59 -08:00