1638 commits

Author SHA1 Message Date
Robert Nishihara
2f750e9ba7 Add parentheses around one-line if statement. (#1318) 2017-12-13 23:48:53 -08:00
Robert Nishihara
f75b51d178 Register Common.error with local scheduler extension module. (#1316)
* Register Common.error with local scheduler extension module.

* Add test.
2017-12-13 11:55:54 -08:00
Stephanie Wang
bac39a134e Define a wrapper class for callback_data.data (#1301) 2017-12-08 11:48:21 -08:00
Stephanie Wang
044548bcff Mark the killed as done outside of loop (#1284) 2017-12-02 14:42:16 -08:00
Robert Nishihara
c21e189371 Allow scheduling with arbitrary user-defined resource labels. (#1236)
* Enable scheduling with custom resource labels.

* Fix.

* Minor fixes and ref counting fix.

* Linting

* Use .data() instead of .c_str().

* Fix linting.

* Fix ResourcesTest.testGPUIDs test by waiting for workers to start up.

* Sleep in test so that all tasks are submitted before any completes.
2017-12-01 11:41:40 -08:00
Robert Nishihara
e0a340ee7e Allow actors to pin at most 1000 dummy objects at a time. (#1241)
* Allow actors to pin at most 1000 dummy objects at a time.

* Fix linting.
2017-11-22 13:38:01 -08:00
Eric Liang
9233e496cc Raise exception when getting the task results of workers that died (#1224)
* wip

* with test

* add timeout

* also add test for f

* remove on cleanup

* update

* wip

* fix tests

* mark actor removed in redis

* clang-format

* fix bug when no-inprogress tasks

* try to set task status done

* Add comment.
2017-11-20 15:18:39 -08:00
Peter Schafhalter
e0360eb429 Remove UT libraries and clean up remaining UT datastructures (#1230)
* Remove UT string include from redis

* Remove UT string include from DB tests

* Modify TaskSpec_print to remove UT string

* Remove UT libraries
2017-11-19 15:01:33 -08:00
Peter Schafhalter
d986294c2b Replace UT strings in local scheduler (#1213)
* Convert to string using std::string

* Fix linting issue

* Fix linting

* Construct db_connect_args using vector

* Use vector size() instead of num_args

* Hopefully fix linting now
2017-11-17 16:14:46 -08:00
Robert Nishihara
94423c0542 Upgrade Arrow with fixes to Plasma eviction policy. (#1228)
* Upgrade Arrow with fixes to Plasma eviction policy.

* Upgrade arrow to have -f flag for plasma store.
2017-11-17 14:41:22 -08:00
Peter Schafhalter
4cbc2b1978 Clean up UT datastructures in Python extension (#1227) 2017-11-17 01:07:12 -08:00
Stephanie Wang
c70430f322 Fix bugs in plasma manager transfer (#1188)
* Plasma client test for plasma abort

* Use ray-project/arrow:abort-objects branch

* Set plasma manager connection cursor to -1 when not in use

* Handle transfer errors between plasma managers, abort unsealed objects

* Add TODO for local scheduler exiting on plasma manager death

* Revert "Plasma client test for plasma abort"

This reverts commit e00fbd58dc4a632f58383549b19fb9057b305a14.

* Upgrade arrow to version with PlasmaClient::Abort

* Fix plasma manager test

* Fix plasma test

* Temporarily use arrow fork for testing

* fix and set arrow commit

* Fix plasma test

* Fix plasma manager test and make write_object_chunk consistent with read_object_chunk

* style

* upgrade arrow
2017-11-15 22:32:38 -08:00
Peter Schafhalter
9a7b15447b Replace UT string in redis tests (#1211)
* Replace UT arg formatting with vsnprintf

* Fix bug with va_list usage
2017-11-15 22:21:56 -08:00
Peter Schafhalter
428858c1ff Convert UT string to std::string (#1210) 2017-11-12 21:00:36 -08:00
Peter Schafhalter
9a6a056609 Convert UT datastructures in tests (#1203)
* bind_ipc_sock_retry returns std::string

* snprintf -> std::snprintf

* Fix formatting

* Use stringstream instead of snprintf

* Fix typo
2017-11-11 16:55:05 -08:00
Philipp Moritz
e798a652bc Change TaskSpec to allow multiple object IDs per argument. (#1204)
* Implement object ID bags

* linting

* fix tests

* fix linting

* fix comments
2017-11-10 16:33:34 -08:00
Stephanie Wang
07f0532b9b Local scheduler filters out dead clients during reconstruction (#1182)
* Object table lookup returns vector of DBClientID instead of address strings

* Add node IP address to DBClient notification

* DB client cache stores entire DB client, convert addresses to std::string

* get cached db client returns the client

* Expose a call to initialize the redis cache

* Local scheduler filters out dead clients during reconstruction

* Remove node ip address from dbclient, use aux_address for plasma managers

* Get entire db client entry when not found in cache

* Fix common tests

* Fix address in tests

* Push error to driver if driver task did the put

* Address Robert's comments and cleanup

* Remove unused Redis command

* Fix db test
2017-11-10 11:29:24 -08:00
Robert Nishihara
d3c082d325 More checking in redis.cc. (#1057) 2017-11-08 23:25:19 -08:00
Robert Nishihara
1c6b30b5e2 Move all config constants into single file. (#1192)
* Initial pass at factoring out C++ configuration into a single file.

* Expose config through Python.

* Forward declarations.

* Fixes with Python extensions

* Remove old code.

* Consistent naming for constants.

* Fixes

* Fix linting.

* More linting.

* Whitespace

* rename config -> _config.

* Move config inside a class.

* update naming convention

* Fix linting.

* More linting

* More linting.

* Add in some more constants.

* Fix linting
2017-11-08 11:10:38 -08:00
Peter Schafhalter
a8032b9ca1 Convert connections from UT_array to std::vector (#1190) 2017-11-07 20:59:41 -08:00
Peter Schafhalter
7215f7d228 Remove UT String from logging (#1184)
* Removed unnecessary utarray include

* Removed ut_string from logging

* Fix formatting
2017-11-05 14:05:20 -08:00
Robert Nishihara
97c6369b49 Update arrow to include custom serializer for pytorch and register default serialization handlers. (#1152)
* Update arrow to include custom serializer for pytorch.

* Call pyarrow function for registering default custom serialization handlers.

* Change class ID used in serialization context for object IDs.
2017-10-21 21:24:10 -07:00
Philipp Moritz
684e62e784 upgrade arrow to include numpy bool fix (#1148) 2017-10-20 17:25:15 -07:00
Peter Schafhalter
ad4cbd4016 Updated outstanding_callbacks to unordered_map (#1108)
* Updated outstanding_callbacks to unordered_map

* Fix bug in destroy_outstanding_callbacks and comments
2017-10-20 10:06:22 -07:00
Stephanie Wang
af47737bd5 Prototype distributed actor handles (#1137)
* Add actor handle ID to the task spec

* Local scheduler dispatches actor tasks according to a task counter per handle

* Fix python test

* Allow passing actor handles into tasks. Not completely working yet. Also this is very messy.

* Fixes, should be roughly working now.

* Refactor actor handle wrapper

* Fix __init__ tests

* Terminate actor when the original handle goes out of scope

* TODO and a couple test cases

* Make tests for unsupported cases

* Fix Python mode tests

* Linting.

* Cache actor definitions that occur before ray.init() is called.

* Fix export actor class

* Deterministically compute actor handle ID

* Fix __getattribute__

* Fix string encoding for python3

* doc

* Add comment and assertion.
2017-10-19 23:49:59 -07:00
Robert Nishihara
1cdc2fb011 Clean up event loop and callbacks when processes exit. (#1125)
* Clean up event loop and callbacks when processes exit.

* Fix bug.
2017-10-19 17:07:03 -07:00
Philipp Moritz
4157bcb80b Improve deserialization performance by rebasing on latest arrow (#1129)
* improve serialization performance by rebasing on latest arrow

* update

* revert worker.py
2017-10-17 14:56:11 -07:00
Robert Nishihara
f3e3c7ec71 Add is_actor_checkpoint_method to TaskSpec. (#1117)
* Add is_actor_checkpoint_method to TaskSpec.

* Fix linting.

* Fix rebase error.

* Fix errors from rebase.
2017-10-15 16:52:10 -07:00
Robert Nishihara
d6062ef8f6 Compile with -rdynamic for better debugging symbols. (#1123)
* Compile with -rdynamic.

* Only use -rdynamic on Linux.

* Add comment.
2017-10-13 21:39:11 -07:00
Stephanie Wang
15486a14a0 Refactor actor task queues (#1118)
* Refactor add_task_to_actor_queue into queue_actor_task and insert_actor_task_queue

* Refactor actor task queue to share the waiting task queue

* Fix
2017-10-13 20:52:11 -07:00
Robert Nishihara
486cb64e3f Compile with -Werror and -Wall (#1116)
* Compile global scheduler with -Werror -Wall.

* Compile plasma manager with -Werror -Wall.

* Compile local scheduler with -Werror -Wall.

* Compile common code with -Werror -Wall.

* Signed/unsigned comparisons.

* More signed/unsigned fixes.

* More signed/unsigned fixes and added extern keyword.

* Fix linting.

* Don't check strict-aliasing because Python.h doesn't pass.
2017-10-12 21:00:23 -07:00
Stephanie Wang
3764f2f2e1 Actor checkpointing with object lineage reconstruction (#1004)
* Worker reports error in previous task, actor task counter is incremented after task is successful

* Refactor actor task execution

- Return new task counter in GetTaskRequest
- Update worker state for actor tasks inside of the actor method executor

* Manually invoked checkpoint method

* Scheduling for actor checkpoint methods

* Fix python bugs in checkpointing

* Return task success from worker to local scheduler instead of actor counter

* Kill local schedulers halfway through actor execution instead of waiting for all tasks to execute once

* Remove redundant actor tasks during dispatch, reconstruct missing dependencies for actor tasks

* Make executor for temporary actor methods

* doc

* Set default argument for whether the previous task was a success

* Refactor actor method call

* Simplify checkpoint task submission

* lint

* fix philipp's comments

* Add missing line

* Make actor reconstruction tests run faster

* Unimportant whitespace.

* Unimportant whitespace.

* Update checkpoint method signature

* Documentation and handle exceptions during checkpoint save/resume

* Rename get_task message field to actor_checkpoint_failed

* Fix bug.

* Remove debugging check, redirect test output
2017-10-12 09:53:32 -07:00
Robert Nishihara
b585001881 When a task is passed to the global scheduler, if it is not received,… (#1106)
* When a task is passed to the global scheduler, if it is not received, then try again.

* Call give_task_to_global_scheduler directly (same with local).
2017-10-12 00:04:38 -07:00
Robert Nishihara
9f1e385335 Return errno from handle_sigpipe. (#1051) 2017-10-11 18:36:28 -07:00
Peter Schafhalter
46f6c163dc Converted ClientConnection to C++ standard library (#1099) 2017-10-11 11:12:15 -07:00
Stephanie Wang
1e0ab3d386 Switch to monotonic clock (#1100) 2017-10-10 22:35:21 -07:00
Philipp Moritz
0684258d2e Update arrow to include pandas serialization (#1102)
* update arrow to include pandas serialization

* update
2017-10-10 22:16:35 -07:00
Robert Nishihara
8f1a73f041 Allow Ray to be built without UI by setting INCLUDE_UI=0. (#1094)
* Allow building Ray without UI by setting INCLUDE_UI=0.

* Fix bash.

* Fix linting.
2017-10-09 23:32:38 -07:00
Stephanie Wang
aebe9f9374 Fix actor garbage collection by breaking cyclic references (#1064)
* Fix bug in wait_for_pid_to_exit, add test for actor deletion.

* Fix actor garbage collection by breaking cyclic references

* Add test for calling actor method immediately after actor creation.

* Fix bug, must dispatch tasks when workers are killed.

* Fix python test

* Fix cyclic reference problem by creating ActorMethod objects on the fly.

* Try simply increasing the time allowed for many_drivers_test.py.
2017-10-05 00:55:33 -07:00
Mitar
a0d3fb1de1 Fix Arrow's repository URL. (#1072)
Thanks!
2017-10-03 21:40:21 -07:00
Robert Nishihara
0dcf36c91e Switch Arrow commit. (#1068) 2017-10-03 13:56:53 -07:00
Philipp Moritz
57bd1d6ff5 Specialize Serialization for OrderedDict (#1035)
Specialize Serialization for OrderedDict and defaultdict
2017-10-02 17:33:10 -07:00
Robert Nishihara
1488975d1b Add timing statement to loop that calls redis_get_cached_db_client be… (#1045)
* Add timing statement to loop that calls redis_get_cached_db_client because it has been slow in the past.

* Fix linting.

* Refactoring to make manager vectors into std::vector.

* Fix linting.

* Fixes.
2017-10-02 10:46:21 -07:00
Robert Nishihara
a31d138f21 Don't log when a worker can't be started. (#1056) 2017-10-02 10:32:46 -07:00
Philipp Moritz
79e013e876 upgrade to latest arrow to fix XCode 9 problem (#1042) 2017-09-30 16:24:59 -07:00
Robert Nishihara
ce278aa06a Fix valgrind tests. (#1037)
* Comment out local scheduler valgrind test.

* Fix free/delete error.

* More free -> delete errors

* One more free -> delete and also clean up callback state in plasma manager.

* Add set -x to run_valgrind scripts.

* Fix valgrind error in CreateLocalSchedulerInfoMessage.
2017-09-30 00:11:09 -07:00
Eric Liang
ba153adc4c Downgrade severity of most common messages (#1039)
* downgrade severity of most common messages

* update
2017-09-30 00:01:49 -07:00
Eric Liang
b118cef49e [webui] Allow timeline scroll-to-zoom without holding ALT (#993)
* Allow timeline scroll-to-zoom without holding ALT

* Update build_ui.sh

* Update build_ui.sh

* Update build_ui.sh

* Update build_ui.sh

* Retry when getting catapult.
2017-09-29 21:35:12 -07:00
Peter Schafhalter
10027974b1 Replaced ObjectWaitRequests with unordered map (#990)
* Replaced ObjectWaitRequests with unordered map

* Pass C++ STL object by reference

* Formatting changes and typos.
2017-09-28 15:29:26 -07:00
Zongheng Yang
427dee511b Fill out specs of the task table in ray_redis_module.cc. (#1024)
* Fill out specs of the task table in ray_redis_module.cc.

* local scheduler field in task table

* linting
2017-09-27 23:45:58 -07:00