1638 commits

Author SHA1 Message Date
Hao Chen
2912a7cb86
Initial high-level code structure of CoreWorker. (#4875) 2019-05-30 02:43:17 -07:00
Qing Wang
b7c284aaa3
Refactor redis callback handling (#4841)
* Add CallbackReply

* Fix

* fix linting by format.sh

* Fix linting

* Address comments.

* Fix
2019-05-30 11:54:30 +08:00
Yuhong Guo
fa0892f285
Replace ReturnIds with NumReturns in TaskInfo to reduce the size (#4854)
* Refine TaskInfo

* Fix

* Add a test to print task info size

* Lint

* Refine
2019-05-28 13:30:41 +08:00
Yuhong Guo
1a39fee9c6
Refactor ID Serial 1: Separate ObjectID and TaskID from UniqueID (#4776)
* Enable BaseId.

* Change TaskID and make python test pass

* Remove unnecessary functions and fix test failure and change TaskID to
16 bytes.

* Java code change draft

* Refine

* Lint

* Update java/api/src/main/java/org/ray/api/id/TaskId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update java/api/src/main/java/org/ray/api/id/BaseId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update java/api/src/main/java/org/ray/api/id/BaseId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update java/api/src/main/java/org/ray/api/id/ObjectId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Address comment

* Lint

* Fix SINGLE_PROCESS

* Fix comments

* Refine code

* Refine test

* Resolve conflict
2019-05-22 14:46:30 +08:00
Qing Wang
081708bdef [Java] Dynamic resource API in Java (#4824) 2019-05-21 17:13:48 +08:00
Stephanie Wang
cb1a195ca2
Queue tasks in the raylet in between async callbacks (#4766)
* Add a SWAP TaskQueue so that we can keep track of tasks that are temporarily dequeued

* Fix bug where tasks that fail to be forwarded don't appear to be local by adding them to SWAP queue

* cleanups

* updates

* updates
2019-05-15 10:23:25 -07:00
Stephanie Wang
1622fc21fc Fatal check if object store dies (#4763) 2019-05-13 11:59:12 -07:00
Romil Bhardwaj
004440f526 Dynamic Custom Resources - create and delete resources (#3742) 2019-05-11 20:06:04 +08:00
ashione
ccc540adf1 Remove mutable data function in id(UniqueID and its subclass) (#4696)
* remove mutable data in jni
fix flatbuffer string to ID check

* replace sizeof(ID) by ID.size()

sizeof(ID) = 20 if no other members in class

* fix new string unbounded

* code polished according to comments

* lazy hash eval
2019-05-09 16:41:48 +08:00
Yuhong Guo
481bfbde58
[c++] Allow RayConfig to have items other than integer (#4701)
* Allow RayConfig to have items other than integer

* Fix a small bug
2019-05-09 11:18:28 +08:00
Romil Bhardwaj
686d4caefe Updates to scheduling objects to support dynamic custom resources (#4465) 2019-04-27 18:45:23 -07:00
Qing Wang
c26f24ab9f Integrate metric items into raylet (#4602) 2019-04-25 11:40:24 +08:00
Qing Wang
f39b6747e5 Refactor command line argument parsing with gflags (#4676) 2019-04-24 14:53:07 +08:00
William Ma
c99e3caaca Change resource bookkeeping to account for machine precision. (#4533) 2019-04-23 11:59:53 -07:00
justinwyang
8dfc833a8b Change all instances of JobID to DriverID. (#4431) 2019-04-22 16:28:09 -07:00
Wang Qing
d951eb740f [Metrics] Add a flag to disable stdout exporter (#4634) 2019-04-19 19:06:30 -07:00
Hao Chen
d52b080081
[Java] Avoid unnecessary memory copy and add a benchmark (#4611) 2019-04-14 00:17:04 +08:00
Romil Bhardwaj
0f42f87ebc Updating zero capacity resource semantics (#4555) 2019-04-12 16:53:57 -07:00
Wang Qing
fe07a5b4b1 Add delete_creating_tasks option for internal.free() (#4588)
* add delete creating task objects.

* format code style

* Fix lint

* add tests and address comments.

* Refine test

* Refine java test

* Fix CI

* Refine

* Fix lint

* Fix CI
2019-04-12 13:38:31 +08:00
William Ma
4b25810994 Adds a push_id to every push in the object manager (#4407) 2019-04-03 17:12:06 -07:00
Yuhong Guo
c2349cf12d Remove local/global_scheduler from code and doc. (#4549) 2019-04-03 17:05:09 -07:00
Philipp Moritz
b0f6ddf6d1 Remove CMake files (#4493) 2019-04-02 22:17:33 -07:00
Wang Qing
7d776f35e1 Integrate metrics (#4246) 2019-04-02 21:01:02 -07:00
Yuhong Guo
c2c548bdfd Fix broken pipe callback (#4513) 2019-04-02 17:42:18 +08:00
Ruifang Chen
59d74d5e92 [Java] Build Java code with Bazel (#4284) 2019-03-22 14:30:05 +08:00
Ion
59079a799c Signal actor failure (#4196) 2019-03-21 15:17:42 -07:00
Kai Yang
c36d03874b Redis returns OK when removing a non-existent set entry (#4434) 2019-03-21 11:59:15 -07:00
Hao Chen
d03999d01e
Cross-language invocation Part 1: Java calling Python functions and actors (#4166) 2019-03-21 13:34:21 +08:00
Stephanie Wang
4ac9c1ed6e Fix bug in cluster mode where driver exits when there are tasks in the waiting queue (#4251) 2019-03-20 10:18:27 -07:00
Kai Yang
7ff56ce826 Introduce set data structure in GCS (#4199)
* Introduce set data structure in GCS. Change object table to Set instance.

* Fix a logic bug. Update python code.

* lint

* lint again

* Remove CURRENT_VALUE mode

* Remove 'CURRENT_VALUE'

* Add more test cases

* rename has_been_created to subscribed.

* Make `changed` parameter type of `bool *`

* Rename mode to notification_mode

* fix build

* RAY.SET_REMOVE return error if entry doesn't exist

* lint

* Address comments

* lint and fix build
2019-03-11 14:42:58 -07:00
Yuhong Guo
ba3fe04629 Fix message type to string crash (#4308)
* Fix message string crash

* Fix
2019-03-09 13:51:02 -08:00
Stephanie Wang
edc794751f Set TCP_NODELAY on all TCP connections (#4318) 2019-03-09 12:15:29 -08:00
Yuhong Guo
b9ea821d16
Use strongly typed IDs in C++. (#4185)
*  Use strongly typed IDs for C++.

* Avoid heap allocation in cython.

* Fix JNI part

* Fix rebase conflict

* Refine

* Remove type check from __init__

* Remove unused constructor declarations.
2019-03-07 21:43:01 +08:00
Stephanie Wang
0ccaf118a2
Disconnect object manager clients if receiving an object fails (#4141)
* Disconnect object manager clients if ReadBuffer fails

* unused

* put back EINTR handling
2019-03-05 22:08:26 -08:00
Stephanie Wang
8b871af555
Fix ray.wait bug for tasks on remote nodes and timeout=0 (#4242)
* Regression test

* Fix

* cleaner code
2019-03-04 11:46:06 -08:00
Yuhong Guo
6f46edca51 Skip dead nodes to avoid connection timeout. (#4154) 2019-03-02 13:11:19 -08:00
Hao Chen
484708d44d
Fix JNI throwing exception (#4178) 2019-02-28 15:11:25 +08:00
Philipp Moritz
615d5516d1 Compile valgrind tests with Bazel (#4144) 2019-02-24 00:00:49 -08:00
Philipp Moritz
ba52caff37 Make Bazel the default build system (#3898) 2019-02-23 11:58:59 -08:00
Philipp Moritz
9b3ce3e64b Revert inline objects PR (#4125)
* Revert "Inline objects (#3756)"

This reverts commit f987572795.

* fix rebase problems

* more rebase fixes

* add back debug statement
2019-02-22 18:21:01 -08:00
Tianming Xu
692bb336a1 Fix master branch compilation error and lint error (#4109) 2019-02-21 11:54:30 -08:00
Yuhong Guo
3549cd8195
Add the Delete function in GCS (#4081)
* Add the Delete function in GCS

* Unify BatchDelete and Delete

* Fix comment

* Lint

* Refine according to comments

* Unify test.

* Address comment

* C++ lint

* Update ray_redis_module.cc
2019-02-21 13:33:37 +08:00
Hao Chen
de17443dc2
Propagate backend error to worker (#4039) 2019-02-16 11:39:15 +08:00
Stephanie Wang
3684e5bc0d Fix memory leak in Redis by using auto memory management (#4054)
* Table appends should always succeed

* Use Redis auto memory management

* Remove unneeded namespace
2019-02-14 19:51:18 -08:00
Philipp Moritz
810cc17062 Fix LRU eviction of client notification datastructure (#4021)
* convert notification_key map to C++ datastructure

* fix crash and add debug string

* clean notification map up (this was a bug before)

* remove checks

* add jenkins test

* linting

* fixes

* properly erase

* clean up

* linting

* Update test_wait_hanging.py

* Update run_multi_node_tests.sh

* increase redis_max_memory

* fix dat jenkins

* update

* Update run_multi_node_tests.sh
2019-02-13 22:20:27 -08:00
Stephanie Wang
fd5b58a827 Increase timeout for object manager valgrind tests (#4027)
* Avoid second copy of data for inlined objects

* Increase Wait timeout for valgrind tests

* Run object manager tests with and without inlined objects

* Fix test
2019-02-13 18:29:03 -08:00
Stephanie Wang
4347ab644e
Use Redis lists in the GCS instead of zset (#4023)
* Convert zset to list

* Remove object evictions map from the object directory, yay

* comments

* Fix tests
2019-02-13 10:32:57 -08:00
Hao Chen
f31a79f3f7
Implement actor checkpointing (#3839)
* Implement Actor checkpointing

* docs

* fix

* fix

* fix

* move restore-from-checkpoint to HandleActorStateTransition

* Revert "move restore-from-checkpoint to HandleActorStateTransition"

This reverts commit 9aa4447c1e3e321f42a1d895d72f17098b72de12.

* resubmit waiting tasks when actor frontier restored

* add doc about num_actor_checkpoints_to_keep=1

* add num_actor_checkpoints_to_keep to Cython

* add checkpoint_expired api

* check if actor class is abstract

* change checkpoint_ids to long string

* implement java

* Refactor to delay actor creation publish until checkpoint is resumed

* debug, lint

* Erase from checkpoints to restore if task fails

* fix lint

* update comments

* avoid duplicated actor notification log

* fix unintended change

* add actor_id to checkpoint_expired

* small java updates

* make checkpoint info per actor

* lint

* Remove logging

* Remove old actor checkpointing Python code, move new checkpointing code to FunctionActorManager

* Replace old actor checkpointing tests

* Fix test and lint

* address comments

* consolidate kill_actor

* Remove __ray_checkpoint__

* fix non-ascii char

* Loosen test checks

* fix java

* fix sphinx-build
2019-02-13 19:39:02 +08:00
Zhijun Fu
7097ba393b protect raylet against bad messages (#4003)
* protect raylet against bad messages

* address comments

* linting and regression test
2019-02-12 00:39:38 +08:00
Yuhong Guo
5fb1efd60d Fix CI test failures (#4007) 2019-02-11 11:01:14 +08:00