1638 commits

Author SHA1 Message Date
Stephanie Wang
35d177f459
Use grpc for communication from worker to local raylet (task submission and direct actor args only) (#6118)
* Skeleton for SubmitTask proto

* Pass through node manager port, connect in raylet client

* Switch submit task to grpc

* Check port in use

* doc

* Remove default port, set port randomly from driver

* update

* Fix test

* Fix object manager test
2019-11-11 21:17:25 -08:00
Edward Oakes
5780ec1b62
Refresh ObjectIDs in raylet for stopgap GC (#6109) 2019-11-10 23:12:59 -08:00
Philipp Moritz
ccbcc4bafa
Use gRPC and Bazel 1.0 (#6002) 2019-11-08 15:58:28 -08:00
Philipp Moritz
5a05eaaa54
Fix compilation on master (#6116) 2019-11-07 22:38:42 -08:00
Eric Liang
4a28306186
Allow large returns from direct actor calls (#6088) 2019-11-07 21:28:55 -08:00
Edward Oakes
ca53af4d0f
Add pending task dependencies to ObjectID ref counting (#6054) 2019-11-07 18:37:10 -08:00
Edward Oakes
9820c10a09
Simplify gRPC service definition for the worker (#6095) 2019-11-06 13:00:39 -08:00
mehrdadn
e312f3d282
Compatibility issues (#6071)
* Pass -f - to tar to force stdin on Windows

* Quote paths that may contain spaces (causes issues on Windows)

* Copy over Windows code from Arrow for glog signal handle uninstall

* Add missing COPTS to build rules since we'll need them for Windows compatibility

* Begin adding COPTS for Windows compatibility

* Disable glog on Arrow until we change WIN32 to _WIN32 there

* Missing header files that cause problems on Windows

* WORD typedef conflicts with Windows; remove it

* uint -> unsigned int wherever we're dealing with milliseconds (signed version is already int)

* uint -> unsigned int for enums

* uint -> size_t, wherever we're dealing with sizes or indices into arrays

* Work around Boost 1.68 bug in detecting clang-cl (revert this after upgrading)

* Missing #include <unistd.h>

* Add check for signal handler uninstallation failure

* Linting issue
2019-11-05 00:08:14 -08:00
Edward Oakes
043d1f4094
Return RayObjects to core worker (#6052) 2019-11-04 20:27:57 -08:00
Eric Liang
8485304e83
Support concurrent Actor calls in Ray (#6053) 2019-11-04 01:14:35 -08:00
Philipp Moritz
1c5446851a
Use Plasma with LRU refreshing integrated (#6050) 2019-11-03 16:19:05 -08:00
Eric Liang
fb34928a2a
[minor] Perf optimizations for direct actor task submission (#6044)
* merge optimizations

* fix

* fix memory err

* optimize

* fix tests

* fix serialization of method handles

* document weakref

* fix check

* bazel format

* disable on 2
2019-11-01 14:41:14 -07:00
Eric Liang
eef4ad3bba
Report census view data as part of raylet node stats (#6060) 2019-11-01 14:26:09 -07:00
Simon Mo
7f5b3502da
Implement Detached Actor (#6036)
* Arg propagation works

* Implement persistent actor

* Add doc

* Initialize is_persistent_

* Rename persistent->detached

* Address comment

* Make test passes

* Address comment

* Python2 compatibility

* Fix naming, py2

* Lint
2019-11-01 10:28:23 -07:00
Eric Liang
c86f945520
Support pass by ref args in for direct actor calls (#6040) 2019-10-31 16:55:10 -07:00
Edward Oakes
16e9dfd2e1
Exit workers when raylet dies unexpectedly (#6014) 2019-10-30 20:29:25 -07:00
Eric Liang
8ebba202df
[minor] Reduce perf overhead of object ref tracking (#6041) 2019-10-29 18:14:51 -07:00
Eric Liang
b89cac976a
Basic direct actor call support in Python (#5991) 2019-10-28 22:09:04 -07:00
Edward Oakes
c1418b04df
Remove CoreWorkerObjectInterface (#6023) 2019-10-28 10:48:41 -07:00
Philipp Moritz
80c01617a3
Optimize python task execution (#6024) 2019-10-27 00:43:34 -07:00
Stephanie Wang
eb41c945a1
Add gRPC endpoint to raylet to expose metrics (#6005) 2019-10-26 16:37:39 -07:00
Eric Liang
a5523466a2
Enable memstore by default (#6003) 2019-10-25 21:59:12 -07:00
Edward Oakes
d4055d70e3
Remove CoreWorkerTaskExecutionInterface (#6009) 2019-10-25 16:33:44 -07:00
Edward Oakes
e6141a0b8b
Remove UsePush logic from raylet (#6015) 2019-10-25 14:52:19 -07:00
Edward Oakes
1ce521a7f3
Remove task context from python worker (#5987)
Removes duplicated state between the python and C++ workers. Also cleans up the serialization codepaths a bit.
2019-10-25 07:38:33 -07:00
Philipp Moritz
09d05bb3fa
Reduce actor submission python overhead (#5949) 2019-10-23 00:11:32 -07:00
Edward Oakes
02931e08f3
[core worker] Python core worker task execution (#5783)
Executes tasks via the event loop in the C++ core worker. Also properly handles signals (including KeyboardInterrupt), so ctrl-C in a python interactive shell works now (if connecting to an existing cluster).
2019-10-22 20:15:59 -07:00
Philipp Moritz
b6e7ed20ce
Fix random numbers on linux wheel build (#5975) 2019-10-22 17:52:12 -07:00
Edward Oakes
fc56872012
Send active object IDs to the raylet (#5803)
* Send active object IDs to the raylet

* comment

* comments

* dedup

* signed int in config

* comments

* Remove object ID from monitor

* Fix test

* re-add check

* fix cast

* check if core worker

* Add comment

* Reservoir sampling

* Fix lint

* Pointer return

* tmp

* Fix merge

* Initialize object ids properly

* Fix lint
2019-10-20 22:05:28 -07:00
Stephanie Wang
bc4a0de4da
Fix multiple drivers for named actors and add test (#5956) 2019-10-20 16:04:21 -07:00
Stephanie Wang
697f765efc
Refactor CoreWorker to remove TaskInterface (#5924)
* Remove TaskInterface

* Remove Status return value

* Remove CActorHandle, some return values, TaskSubmitter

* lint

* doc

* doc

* fix build

* lint

* Return Status, guarded by annotation, fail tasks for RECONSTRUCTING actors

* fix

* move annotation

* revert

* Fix core worker test

* nits
2019-10-18 00:03:57 -04:00
Stephanie Wang
3ac8592dcf
Remove actor handle IDs (#5889)
* Remove actor handle ID from main ActorHandle constructor

* Set the actor caller ID when calling submit task instead of in the actor handle

* Remove ActorHandle::Fork, remove actor handle ID from protobuf

* Make inner actor handle const, remove new_actor_handles

* Move caller ID into the common task spec, start refactoring raylet

* Some fixes for forking actor handles

* Store ActorHandle state in CoreWorker, only expose actor ID to Python

* Remove some unused fields

* lint

* doc

* fix merge

* Remove ActorHandleID from python/cpp

* doc

* Fix core worker test

* Move actor table subscription to CoreWorker, reset actor handles on actor failure

* lint

* Remove GCS client from direct actor

* fix tests

* Fix

* Fix tests for raylet codepath

* Fix local mode

* Fix multithreaded test

* Fix AsyncSubscribe issue...

* doc

* fix serve

* Revert bazel
2019-10-17 12:36:34 -04:00
Richard Liaw
20c0cdee4f
[autoscaler] Worker-Head termination + Better Scale-up message (#5909) 2019-10-14 10:37:50 -07:00
Edward Oakes
abbfe7392f
Bump dev version to 0.8.0.dev6 (#5906) 2019-10-14 11:36:13 +01:00
Philipp Moritz
1100556ba2
Fix linux wheel build (#5881) 2019-10-10 16:15:26 -07:00
Eric Liang
1a8ac3db46
Implement fair task queueing to prevent task starvation (#5851)
* initial commit

* lint

* clarify

* add feature flag

* comment

* add timeout to test

* fix print

* comment

* use id for scheduling class

* lint

* dad warn

* flake
2019-10-08 21:04:25 -07:00
Edward Oakes
08e4e3a153
[core worker] Submit Python actor tasks through core worker (#5750)
* Submit actor tasks through core worker

* Fix java

* add comment

* Remove task builder

* Check negative

* Increase -> Increment

* pass by reference

* fix signal

* Clean up c++ actor handle

* more cleanup

* Clean up headers

* Fix unique_ptr construction

* Fix java

* Move profiling to c++

* dedup

* fix error

* comments

* fix java

* Fix tests

* wait for actor to exit

* Start after constructor

* ignore java build

* fix comment

* always init logging

* Fix logging

* fix logging issue

* shared_ptr for profiler

* DEBUG -> WARNING

* fix killed_ init

* Fix flaky checkpointing tests

* -v flag for tune tests

* Fix checkpoint test logic

* Fix exception matching

* timeout exception

* Fix test exception info

* Fix import

* fix build

* Fix test

* shared_ptr
2019-10-07 15:42:19 -07:00
Edward Oakes
17c6835c3f
Just die on signal (#5842) 2019-10-03 18:21:21 -07:00
Si-Yuan
2fb7d7846f
Initial implementation of Cython pickle5 support (#5725) 2019-10-03 09:20:26 -07:00
Edward Oakes
4e049232a8
shared_ptr (#5830) 2019-10-02 16:29:04 -07:00
Edward Oakes
963bbe8bbd
Move profiling to c++ (#5771)
* Move profiling to c++

* comments

* Fix tests

* Start after constructor

* fix comment

* always init logging

* Fix logging

* fix logging issue

* shared_ptr for profiler

* DEBUG -> WARNING

* fix killed_ init

* Fix flaky checkpointing tests

* Fix checkpoint test logic

* Fix exception matching

* timeout exception

* Fix import

* fix build

* use boost::asio

* fix double const

* Properly reset async_wait

* remove SIGINT

* Change error message

* increase timeout

* small nits

* Don't trap on SIGINT

* -v for tune

* Fix test
2019-10-01 10:06:25 -07:00
Edward Oakes
86610a30c9
[flaky test] Fix flaky checkpointing tests (#5791)
* Fix flaky checkpointing tests

* Fix checkpoint test logic

* Fix exception matching

* timeout exception

* Fix import

* fix build
2019-09-27 11:03:07 -07:00
Eric Liang
b5da32df78
Bump Ray version in documentation to dev5 (#5794) 2019-09-27 00:19:17 -07:00
Edward Oakes
8a33891a40
Include object size in full error (#5782) 2019-09-25 17:04:17 -07:00
Zhijun Fu
ea9376c9ce
Fix flaky core worker tests because of race condition in gcs client subscription (#5735) 2019-09-24 22:47:38 +08:00
Edward Oakes
61e5d674be
Push driver task in core worker (#5752) 2019-09-23 10:53:55 -05:00
Philipp Moritz
f4deecb5ab
Fix travis error in direct_actor_transport.cc (#5710) 2019-09-15 22:19:20 -07:00
Eric Liang
4bf7de084d
Speed up TaskSpecification copy (#5709) 2019-09-15 19:57:34 -07:00
Eric Liang
4979b8c4d9
Ordered execution of tasks per actor handle (#5664) 2019-09-14 22:31:33 -07:00
Edward Oakes
a5d7de6aaf
[core worker] Python core worker normal task submission (#5566) 2019-09-14 13:02:53 -07:00