363 commits

Author SHA1 Message Date
Robert Nishihara
ba1ce85f58 Download Redis and flatbuffers differently. (#1602)
* Download Redis differently.

* Get flatbuffers with curl
2018-02-25 20:32:33 -08:00
Robert Nishihara
f4b1881fec Update arrow to use updated pandas serializer. (#1582) 2018-02-22 11:10:52 -08:00
Robert Nishihara
db4a920bdb Cleanup parquet installation. (#1549)
* Cleanup parquet installation.

* Fix

* Small changes.

* Add brew installs

* Modify paths for compilation of parquet.

* Remove LD_LIBRARY_PATH

* Don't set unnecessary environment variables on Linux.

* Set environment variables for make.

* Brew installs for macos wheels.

* Update

* Pass PARQUET_HOME when building pyarrow.

* Don't exit with error code.
2018-02-20 15:21:32 -08:00
Philipp Moritz
eabc4027c8 Hiredis asio integration (#1547) 2018-02-20 13:37:09 -08:00
Simon Mo
a24cc28773 [DataFrame] Add Parquet Support in Build Process (#1531)
* Add shell script for building parquet

* Use parquet ci script; remove anaconda

* Remove gcc flag, use default

* add boost_root

* Fix $TP_DIR reference issue

* fix the PR

* check out specific parquet-cpp commit
2018-02-16 07:18:42 -08:00
Alexey Tumanov
844a6afcdd Implement simple random spillback policy. (#1493)
* spillback policy implementation: global + local scheduler

* modernize global scheduler policy state; factor out random number engine and generator

* Minimal version.

* Fix test.

* Make load balancing test less strenuous.
2018-02-13 00:09:35 -08:00
Philipp Moritz
1ab2e63dbd Tune transfer buffer size (#1363)
Increase buffsize from `4096` to `80*1024`.
2018-02-09 14:56:36 -08:00
Robert Nishihara
89db7841d2 Update arrow version. (#1512) 2018-02-07 23:05:16 -08:00
Stephanie Wang
ff8e7f8259 Actor checkpointing for distributed actor handles (#1498)
* Expose calls to get and set the actor frontier

* Remove fields used for old checkpointing prototype, change actor_checkpoint_failed -> succeeded

* Prototype for actor checkpointing

* Filter out duplicate tasks on the local scheduler

* Clean up some of the Python checkpointing code

* More cleanups

* Documentation

* cleanup and fix unit test

* Allow remote checkpoint calls through actor handle

* Check whether object is local before reconstructing

* Enable checkpointing for distributed actor handles, refactor tests

* Fix local scheduler tests

* lint

* Address comments

* lint

* Skip tests that fail on new GCS

* style

* Don't put same object twice when setting the actor frontier

* Address Philipp's comments, cleaner fbs naming
2018-02-07 11:19:32 -08:00
Melih Elibol
d8850eac4b Suppress object transfer requests when object is already being received. (#1430)
* added deterministic check for objects received in fetch_timeout_handler.

* use receive time, in case something goes wrong after object is received.

* increase timeout for removal.

* indentation fix.

* make log info log debug. clean up debug log.

* undo unnecessary changes.

* changed description var.

* shorten line 949.

* incorporate feedback.

* linting; make is_object_received function consts.

* change semantics of received_objects to objects being received.
added checks to both points at which objects are re-requested.
updated object receive initialization accordingly.

* eliminate erase on receive init. check call to request_transfer_from instead of request_transfer.

* updated comments.

* added todo for multiple object transfers.

* linting.
2018-02-01 22:45:31 -08:00
Philipp Moritz
a3f8fa426b Start integrating new GCS APIs (#1379)
* Start integrating new GCS calls

* fixes

* tests

* cleanup

* cleanup and valgrind fix

* update tests

* fix valgrind

* fix more valgrind

* fixes

* add separate tests for GCS

* fix linting

* update tests

* cleanup

* fix python linting

* more fixes

* fix linting

* add plasma manager callback

* add some documentation

* fix linting

* fix linting

* fixes

* update

* fix linting

* fix

* add spillback count

* fixes

* linting

* fixes

* fix linting

* fix

* fix

* fix
2018-01-31 11:01:12 -08:00
Robert Nishihara
3195c6aa63 Fix local scheduler crash when driver creates actor and exits. (#1474)
* Make check failures in redis.cc more informative.

* Fix bug by calling task_table_add_task.

* Add test.
2018-01-26 14:29:53 -08:00
Stephanie Wang
668737f383 Replace actor dummy objects with mock calls to the local scheduler (#1467)
* Replace putting the dummy object with a call to the local scheduler

* Mark dummy objects as locally available
2018-01-26 14:18:45 -08:00
Robert Nishihara
5acc98e629 Update arrow with better dataframe serialization and get rid of custo… (#1413)
* Update arrow with better dataframe serialization and get rid of custom dataframe serializers.

* Update plasma client API.

* Fix potential bug.

* Bug fix.

* Update arrow to use deduplicated file descriptors and mutable buffers.

* Fix tests.

* Update commit.

* Update commit.

* Update commit.

* Update commit.

* Update commit

* Update commit back to arrow codebase.
2018-01-24 10:03:29 -08:00
Alexey Tumanov
f1303291b4 Ray scheduler spillback plumbing + mechanism (#1362)
* spillback mechanism and plumbing : adding spillback counter + timestamp

* linting fix

* documentation

* Fix argument name.
2018-01-23 20:18:12 -08:00
Melih Elibol
4b1c8be4fe Fix setting log-level to debug. (#1432) 2018-01-21 21:51:05 -08:00
Stephanie Wang
74718efa73 Nondeterministic reconstruction for actors (#1344)
* Add failing unit test for nondeterministic reconstruction

* Retry scheduling actor tasks if reassigned to local scheduler

* Update execution edges asynchronously upon dispatch for nondeterministic reconstruction

* Fix bug for updating checkpoint task execution dependencies

* Update comments for deterministic reconstruction

* cleanup

* Add (and skip) failing test case for nondeterministic reconstruction

* Suppress test output
2018-01-21 13:44:13 -08:00
Robert Nishihara
088f01496c Remove unused object info table code. (#1388) 2018-01-05 11:00:06 -08:00
Robert Nishihara
e970e24ea5 Update arrow, and pass memcopy_threads into put. (#1374) 2017-12-31 13:32:06 -08:00
Philipp Moritz
3d224c4edf Second Part of Internal API Refactor (#1326) 2017-12-26 16:22:04 -08:00
Melih Elibol
4a2d62e7ef fix thirdparty install bug. (#1354) 2017-12-20 23:08:53 -08:00
Philipp Moritz
3c4408cf51 Rebase Ray on Arrow 0.8 (#1323)
* rebase Ray on Arrow 0.8

* rebase on apache repo
2017-12-19 14:24:21 -08:00
Robert Nishihara
76b6b4a2d3 When killing worker, release resources before dispatching tasks. (#1327) 2017-12-15 18:12:03 -08:00
Stephanie Wang
12fdb3f53a Convert actor dummy objects to task execution edges. (#1281)
* Define execution dependencies flatbuffer and add to Redis commands

* Convert TaskSpec to TaskExecutionSpec

* Add execution dependencies to Python bindings

* Submitting actor tasks uses execution dependency API instead of dummy argument

* Fix dependency getters and some cleanup for fetching missing dependencies

* C++ convention

* Make TaskExecutionSpec a C++ class

* Convert local scheduler to use TaskExecutionSpec class

* Convert some pointers to references

* Finish conversion to TaskExecutionSpec class

* fix

* Fix

* Fix memory errors?

* Cast flatbuffers GetSize to size_t

* Fixes

* add more retries in global scheduler unit test

* fix linting and cast fbb.GetSize to size_t

* Style and doc

* Fix linting and simplify from_flatbuf.
2017-12-14 20:47:54 -08:00
Philipp Moritz
cac5f47600 First Part of Internal Ray API Refactor (#1173)
* add Ray status class

* add C++ util files

* add ID types

* more APIs

* build system integration

* add test infrastructure and implement some APIs

* add more tests

* fix bugs

* add task table tests

* update

* add toolchain file

* fix

* test

* link with pthread

* update

* fix

* more fixes

* fixes

* always vendor gtest and gflags

* linting

* fixes

* add constants file

* comments

* more fixes

* fix linting
2017-12-14 14:54:09 -08:00
Robert Nishihara
2f750e9ba7 Add parentheses around one-line if statement. (#1318) 2017-12-13 23:48:53 -08:00
Robert Nishihara
f75b51d178 Register Common.error with local scheduler extension module. (#1316)
* Register Common.error with local scheduler extension module.

* Add test.
2017-12-13 11:55:54 -08:00
Stephanie Wang
bac39a134e Define a wrapper class for callback_data.data (#1301) 2017-12-08 11:48:21 -08:00
Stephanie Wang
044548bcff Mark the killed as done outside of loop (#1284) 2017-12-02 14:42:16 -08:00
Robert Nishihara
c21e189371 Allow scheduling with arbitrary user-defined resource labels. (#1236)
* Enable scheduling with custom resource labels.

* Fix.

* Minor fixes and ref counting fix.

* Linting

* Use .data() instead of .c_str().

* Fix linting.

* Fix ResourcesTest.testGPUIDs test by waiting for workers to start up.

* Sleep in test so that all tasks are submitted before any completes.
2017-12-01 11:41:40 -08:00
Robert Nishihara
e0a340ee7e Allow actors to pin at most 1000 dummy objects at a time. (#1241)
* Allow actors to pin at most 1000 dummy objects at a time.

* Fix linting.
2017-11-22 13:38:01 -08:00
Eric Liang
9233e496cc Raise exception when getting the task results of workers that died (#1224)
* wip

* with test

* add timeout

* also add test for f

* remove on cleanup

* update

* wip

* fix tests

* mark actor removed in redis

* clang-format

* fix bug when no-inprogress tasks

* try to set task status done

* Add comment.
2017-11-20 15:18:39 -08:00
Peter Schafhalter
e0360eb429 Remove UT libraries and clean up remaining UT datastructures (#1230)
* Remove UT string include from redis

* Remove UT string include from DB tests

* Modify TaskSpec_print to remove UT string

* Remove UT libraries
2017-11-19 15:01:33 -08:00
Peter Schafhalter
d986294c2b Replace UT strings in local scheduler (#1213)
* Convert to string using std::string

* Fix linting issue

* Fix linting

* Construct db_connect_args using vector

* Use vector size() instead of num_args

* Hopefully fix linting now
2017-11-17 16:14:46 -08:00
Robert Nishihara
94423c0542 Upgrade Arrow with fixes to Plasma eviction policy. (#1228)
* Upgrade Arrow with fixes to Plasma eviction policy.

* Upgrade arrow to have -f flag for plasma store.
2017-11-17 14:41:22 -08:00
Peter Schafhalter
4cbc2b1978 Clean up UT datastructures in Python extension (#1227) 2017-11-17 01:07:12 -08:00
Stephanie Wang
c70430f322 Fix bugs in plasma manager transfer (#1188)
* Plasma client test for plasma abort

* Use ray-project/arrow:abort-objects branch

* Set plasma manager connection cursor to -1 when not in use

* Handle transfer errors between plasma managers, abort unsealed objects

* Add TODO for local scheduler exiting on plasma manager death

* Revert "Plasma client test for plasma abort"

This reverts commit e00fbd58dc4a632f58383549b19fb9057b305a14.

* Upgrade arrow to version with PlasmaClient::Abort

* Fix plasma manager test

* Fix plasma test

* Temporarily use arrow fork for testing

* fix and set arrow commit

* Fix plasma test

* Fix plasma manager test and make write_object_chunk consistent with read_object_chunk

* style

* upgrade arrow
2017-11-15 22:32:38 -08:00
Peter Schafhalter
9a7b15447b Replace UT string in redis tests (#1211)
* Replace UT arg formatting with vsnprintf

* Fix bug with va_list usage
2017-11-15 22:21:56 -08:00
Peter Schafhalter
428858c1ff Convert UT string to std::string (#1210) 2017-11-12 21:00:36 -08:00
Peter Schafhalter
9a6a056609 Convert UT datastructures in tests (#1203)
* bind_ipc_sock_retry returns std::string

* snprintf -> std::snprintf

* Fix formatting

* Use stringstream instead of snprintf

* Fix typo
2017-11-11 16:55:05 -08:00
Philipp Moritz
e798a652bc Change TaskSpec to allow multiple object IDs per argument. (#1204)
* Implement object ID bags

* linting

* fix tests

* fix linting

* fix comments
2017-11-10 16:33:34 -08:00
Stephanie Wang
07f0532b9b Local scheduler filters out dead clients during reconstruction (#1182)
* Object table lookup returns vector of DBClientID instead of address strings

* Add node IP address to DBClient notification

* DB client cache stores entire DB client, convert addresses to std::string

* get cached db client returns the client

* Expose a call to initialize the redis cache

* Local scheduler filters out dead clients during reconstruction

* Remove node ip address from dbclient, use aux_address for plasma managers

* Get entire db client entry when not found in cache

* Fix common tests

* Fix address in tests

* Push error to driver if driver task did the put

* Address Robert's comments and cleanup

* Remove unused Redis command

* Fix db test
2017-11-10 11:29:24 -08:00
Robert Nishihara
d3c082d325 More checking in redis.cc. (#1057) 2017-11-08 23:25:19 -08:00
Robert Nishihara
1c6b30b5e2 Move all config constants into single file. (#1192)
* Initial pass at factoring out C++ configuration into a single file.

* Expose config through Python.

* Forward declarations.

* Fixes with Python extensions

* Remove old code.

* Consistent naming for constants.

* Fixes

* Fix linting.

* More linting.

* Whitespace

* rename config -> _config.

* Move config inside a class.

* update naming convention

* Fix linting.

* More linting

* More linting.

* Add in some more constants.

* Fix linting
2017-11-08 11:10:38 -08:00
Peter Schafhalter
a8032b9ca1 Convert connections from UT_array to std::vector (#1190) 2017-11-07 20:59:41 -08:00
Peter Schafhalter
7215f7d228 Remove UT String from logging (#1184)
* Removed unnecessary utarray include

* Removed ut_string from logging

* Fix formatting
2017-11-05 14:05:20 -08:00
Robert Nishihara
97c6369b49 Update arrow to include custom serializer for pytorch and register default serialization handlers. (#1152)
* Update arrow to include custom serializer for pytorch.

* Call pyarrow function for registering default custom serialization handlers.

* Change class ID used in serialization context for object IDs.
2017-10-21 21:24:10 -07:00
Philipp Moritz
684e62e784 upgrade arrow to include numpy bool fix (#1148) 2017-10-20 17:25:15 -07:00
Peter Schafhalter
ad4cbd4016 Updated outstanding_callbacks to unordered_map (#1108)
* Updated outstanding_callbacks to unordered_map

* Fix bug in destroy_outstanding_callbacks and comments
2017-10-20 10:06:22 -07:00
Stephanie Wang
af47737bd5 Prototype distributed actor handles (#1137)
* Add actor handle ID to the task spec

* Local scheduler dispatches actor tasks according to a task counter per handle

* Fix python test

* Allow passing actor handles into tasks. Not completely working yet. Also this is very messy.

* Fixes, should be roughly working now.

* Refactor actor handle wrapper

* Fix __init__ tests

* Terminate actor when the original handle goes out of scope

* TODO and a couple test cases

* Make tests for unsupported cases

* Fix Python mode tests

* Linting.

* Cache actor definitions that occur before ray.init() is called.

* Fix export actor class

* Deterministically compute actor handle ID

* Fix __getattribute__

* Fix string encoding for python3

* doc

* Add comment and assertion.
2017-10-19 23:49:59 -07:00