Commit graph

1199 commits

Author SHA1 Message Date
Simon Mo
3ba8680963 Bump version to 0.8.0.dev3 (#5308) 2019-07-29 18:28:38 -07:00
Simon Mo
3b00144e7d Bump version to 0.7.3 (#5301) 2019-07-29 10:25:32 -07:00
Qing Wang
1465a30ea9
Fix releasing CPUs incorrectly when actor creation task blocked. (#5271)
* Fix

* Remove useless log

* Address

* Fix typo

* sleep
2019-07-28 15:46:17 +08:00
micafan
6f682db99d avoid copying ActorTableData when NodeManager updates an actor to GCS (#5244) 2019-07-26 11:17:24 +08:00
Joey Jiang
40395acadf [gRPC] Migrate raylet client implementation to grpc (#5120) 2019-07-25 14:48:56 +08:00
Eric Liang
5b76238bce
Fix two types of eviction hangs (#5225) 2019-07-23 21:20:17 -07:00
Stephanie Wang
15959b0f0d
Leave ray.wait calls open until the task or actor exits (#5234)
* Regression test

* Split TaskDependencyManager::SubscribeDependencies into ray.get and ray.wait dependencies
- Some initial implementation

* unit test

* Improve unit tests for TaskDependencyManager

* Implement SubscribeWaitDependencies and UnsubscribeWaitDependencies, unit tests passing

* Add ray.wait python test for drivers that exit early

* Add WorkerID to Worker

* Update test to use two nodes

* Regression test for ray.wait passes

* Extend regression test to include ray.wait from an actor

* Fix ClientID and WorkerIDs

* lint

* lint

* Remove unnecessary ray_get argument

* fix build
2019-07-23 11:55:28 -07:00
Qing Wang
a3d4f9f16d
Fix the issue when passing multiple options in one string (#5241)
* Fix

* Fix linting

* Fix linting

* Address

* Fix test
2019-07-23 12:28:54 +08:00
Zhijun Fu
aa42328874 [direct call] add local plasma provider (#5184) 2019-07-19 11:29:12 +08:00
micafan
b5b8c1d361 [GCS] introduce new gcs client and refactor actor table (#5058) 2019-07-19 11:28:34 +08:00
Richard Liaw
3e0ad11ae0
Add heartbeat test + Fix monitor.py (#5191) 2019-07-16 21:59:48 -07:00
Kai Yang
806524384b [Java worker] Refactor object store and worker context on top of core worker (#5079) 2019-07-16 20:58:02 +08:00
Edward Oakes
e5be5fd46d Remove dependencies from TaskExecutionSpecification (#5166) 2019-07-15 18:15:21 -07:00
Hao Chen
ea6aa6409a Reconstruct failed actors without sending tasks. (#5161)
* fast reconstruct dead actors

* add test

* fix typos

* remove debug print

* small fix

* fix typos

* Update test_actor.py
2019-07-15 10:25:09 -07:00
Hao Chen
7342117710
Fix a multithreading bug in grpc ClientCall (#5196) 2019-07-15 14:49:53 +08:00
Philipp Moritz
322b5166ad Update arrow to include user defined status for plasma (#5156) 2019-07-12 22:51:14 -07:00
Hao Chen
f5a87b88a3 Fix: ServerCallFactory's destructor not marked as virtual (#5185) 2019-07-13 09:38:47 +08:00
Stephanie Wang
f46c555e9e Only get actor ID if actor task (#5180) 2019-07-12 14:31:21 +08:00
vipulharsh
3b42d5ccb1 Track newly created actor's parent actor (#5098)
* Track parent actor of actor

* Update src/ray/raylet/node_manager.cc

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* Update src/ray/raylet/node_manager.cc

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* fixing a comment

* Fixing typo in a comment

* capturing task_spec instead of actor_data

* adding const for some local variables

* changing an if else to else

* Linted version

* use updated method to create task from task_data

Change-Id: I9c1a65134dc23a2d175047e96b86ab9d9cf61971

* fixing linter issues

Change-Id: I1def06218130b399d2527b999258aecf9abb98dd
2019-07-11 14:52:04 -07:00
Philipp Moritz
ccee77aafd fix node_failures.py (#5167) 2019-07-11 11:40:13 -07:00
Zhijun Fu
1649f1370e [direct call] changes raylet to push tasks to worker (#5140)
* refactor grpc server

* format

* change GetTask() to PushTask()

* change PushTask to AssignTask

* format

* add resource_ids

* move done_callback to server call

* remove SetTaskHandler and initialize it in task receiver's constructor

* format

* resolve comments

* update

* update

* Update src/ray/core_worker/core_worker.cc

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* resolve comments

* format

* Update src/ray/core_worker/transport/raylet_transport.cc

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* resolve comments

* resolve comments

* fix build

* format

* fix

* format

* noop
2019-07-11 11:01:32 -07:00
Hao Chen
fd835d107e
Move task to common module and add checks in getter methods (#5147) 2019-07-11 17:07:04 +08:00
Qing Wang
f2293243cc
[ID Refactor] Shorten the length of JobID to 4 bytes (#5110)
* WIP

* Fix

* Add jobid test

* Fix

* Add python part

* Fix

* Fix test

* Remove TODOs

* Fix C++ tests

* Lint

* Fix

* Fix exporting functions in multiple ray.init

* Fix java test

* Fix lint

* Fix linting

* Address comments.

* Fix

* Address and fix linting

* Refine and fix

* Fix

* address

* Address comments.

* Fix linting

* Fix

* Address

* Address comments.

* Address

* Address

* Fix

* Fix

* Fix

* Fix lint

* Fix

* Fix linting

* Address comments.

* Fix linting

* Address comments.

* Fix linting

* address comments.

* Fix
2019-07-11 14:25:16 +08:00
Kai Yang
43b6513d19 [GCS] Move node resource info from client table to resource table (#5050) 2019-07-11 13:17:19 +08:00
Philipp Moritz
e6a81d40a5 [stability] Make task result for RemoveTask optional (#5146)
* make task result for RemoveTask optional

* lint

* update

* update

* update

* rename

* lint
2019-07-10 13:33:41 -07:00
Joey Jiang
e55c8ca165 Fix crash because of the reference to deleted variable in grpc server call (#5158) 2019-07-10 14:06:21 +08:00
Joey Jiang
5733690aa6 Add success and fail callback of grpc sending reply (#5141) 2019-07-09 17:03:57 +08:00
Hao Chen
8a30b93e42
Define common data structures with protobuf. (#5121) 2019-07-08 22:41:37 +08:00
Joey Jiang
274233962f Remove unused connection file in object manager (#5123) 2019-07-08 10:59:36 +08:00
Philipp Moritz
c5253cc300 Add job table to state API (#5076) 2019-07-06 00:05:48 -07:00
Zhijun Fu
54d5969cea [grpc] Add grpc server to worker (#5054)
* refactor grpc server

* format

* change GetTask() to PushTask()

* change PushTask to AssignTask

* format

* update

* fix test

* format

* Update src/ray/rpc/worker_client.h

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update BUILD.bazel

* Update src/ray/core_worker/task_execution.cc

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* update

* format

* address comments

* format

* Update src/ray/rpc/worker/worker_server.h

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* Update src/ray/protobuf/worker.proto

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* format

* fix

* format
2019-07-04 20:16:42 +08:00
Stephanie Wang
71d4637b75
[core worker] Refactor CoreWorker member classes (#5062)
* Move store client mutex inside CoreWorkerPlasmaStoreProvider

* Move PlasmaClient inside CoreWorkerStoreProvider

* Remove CoreWorkerObjectInterface's ref to CoreWorker

* Remove WorkerLanguage

* Remove CoreWorkerTaskInterface's ref to CoreWorker

* Remove CoreWorkerTaskExecutionInterface's ref to CoreWorker

* lint

* move comment

* Fix build

* Fix build
2019-07-02 15:30:30 -07:00
Kai Yang
1cf7728f35 [Core worker] Serialize ActorHandle in core worker. Make ActorHandle thread safe. (#5034)
* Serialize ActorHandle in core worker. Make ActorHandle thread safe.

* Address comments

* Address comments

* Address comments

* Address comments

* lint

* Address comments

* Address comments

* Address comments

* Address comments

* Minor update

* Address comments

* lint
2019-07-02 16:48:43 +08:00
Qing Wang
247f95b3ff
Refine RegisterClientRequest message to make it clearer. (#5057)
* Transfer driver task id explicitly

* Refine

* Fix and add comment.

* add more

* Fix

* Fix

* Add comments

* Fix
2019-07-02 14:26:19 +08:00
Simon Mo
6c4c1d444d Update VersionKey in stats (#5070) 2019-06-30 18:23:12 +08:00
Kai Yang
4ccb7b05cc [Core worker] Add metadata support in object interface (#5031) 2019-06-28 11:35:03 -07:00
Hao Chen
cefbb0c94c
Fix driver id in TaskInfo (#5055) 2019-06-28 12:56:48 +08:00
Kai Yang
a39982e676 [Core worker] Task execution passes TaskInfo struct to executor (#5032) 2019-06-28 10:59:45 +08:00
Joey Jiang
d6bbbdef35 Use gRPC to handle communication and data transmission between object manager (#4996) 2019-06-28 10:56:34 +08:00
Qing Wang
62e4b591e3
[ID Refactor] Rename DriverID to JobID (#5004)
* WIP

WIP

WIP

Rename Driver -> Job

Fix compilation

Fix

Rename in Java

In py

WIP

Fix

WIP

Fix

Fix test

Fix

Fix C++ linting

Fix

* Update java/runtime/src/main/java/org/ray/runtime/config/RayConfig.java

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* Update src/ray/core_worker/core_worker.cc

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* Address comments

* Fix

* Fix CI

* Fix cpp linting

* Fix py lint

* Fix

* Address comments and fix

* Address comments

* Address

* Fix import_threading
2019-06-28 00:44:51 +08:00
Hao Chen
469ae41013
Fix memory leak in rpc ServerCall and ClientCall (#5046) 2019-06-27 13:19:47 +08:00
Stephanie Wang
1a8d0af814
Remove debug check for uncommitted lineage (#5038) 2019-06-26 11:21:00 -07:00
Zhijun Fu
bb8e75b532 [grpc] refactor rpc server to support multiple io services (#5023) 2019-06-25 19:08:09 -07:00
Hao Chen
0131353d42 [gRPC] Migrate gcs data structures to protobuf (#5024) 2019-06-25 14:31:19 -07:00
Qing Wang
e33d0eac68
Add dynamic worker options for worker command. (#4970)
* Add fields for fbs

* WIP

* Fix compilation errors

* Add java part

* Fix

* Fix

* Fix

* Fix lint

* Refine API

* address comments and add test

* Fix

* Address comment.

* Address comments.

* Fix linting

* Refine

* Fix lint

* WIP: address comment.

* Fix java

* Fix py

* Refine

* Fix

* Fix

* Fix linting

* Fix lint

* Address comments

* WIP

* Fix

* Fix

* minor refine

* Fix lint

* Fix raylet test.

* Fix lint

* Update src/ray/raylet/worker_pool.h

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update java/runtime/src/main/java/org/ray/runtime/AbstractRayRuntime.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Address comments.

* Address comments.

* Fix test.

* Update src/ray/raylet/worker_pool.h

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Address comments.

* Address comments.

* Fix

* Fix lint

* Fix lint

* Fix

* Address comments.

* Fix linting
2019-06-23 18:08:33 +08:00
Hao Chen
2bf92e02e2
[gRPC] Use gRPC for inter-node-manager communication (#4968) 2019-06-17 19:00:50 +08:00
Qing Wang
b08765a08b Fix a crash when an unknown worker registers to raylet (#4992) 2019-06-17 13:34:23 +08:00
Zhijun Fu
37abdb283f [Core worker] add store & task provider (#4966) 2019-06-14 18:35:32 +08:00
Hao Chen
3c92b2ee4d
Upgrade CI clang-format to 6.0 (#4976) 2019-06-14 14:52:32 +08:00
Stephanie Wang
89ca5eeb29 Flush all tasks from local lineage cache after a node failure (#4964) 2019-06-12 11:13:39 -07:00