Commit graph

712 commits

Author SHA1 Message Date
Robert Nishihara
d1594860de Remove javascript dependencies. (#169) 2016-12-30 23:16:17 -08:00
Robert Nishihara
603a7e3dd3 Add documentation for troubleshooting installation. (#167) 2016-12-30 23:15:25 -08:00
Wapaul1
e00b27b14e Removed webui code from setup.py and services.py (#168) 2016-12-30 21:45:58 -08:00
Robert Nishihara
84296c8905 Documentation for using Ray on a cluster. (#165) 2016-12-30 00:29:03 -08:00
Robert Nishihara
13ee0ef366 Only download arrow if not already present. (#166) 2016-12-30 00:25:46 -08:00
Stephanie Wang
6828d694ae Test object notifications from Plasma store (#141)
* Object notification test for Photon, and turn on valgrind for Photon C tests

* Test object notification handler in the plasma manager

* Fix hanging test case
2016-12-29 23:10:38 -08:00
Robert Nishihara
f9f667de47 Improve formatting of error messages. (#154)
* Improve formatting of error messages.

* Catch errors that occur when looking up function name from function ID.

* Push warning to user if worker spends to long waiting for proper import counter.

* Fixes.

* Add comment.
2016-12-29 00:11:13 -08:00
Robert Nishihara
acf1703afd Implement naive scheduling algorithm using local scheduler load. (#164)
* Implement naive scheduling algorithm using local scheduler load.

* Have the global scheduler estimate load on local schedulers better.

* Fixes.
2016-12-28 22:33:20 -08:00
Robert Nishihara
a1a08b9ad4 Cause pip installation of numbuf to fail if the build.sh or setup.sh fail. (#163) 2016-12-28 16:54:14 -08:00
Stephanie Wang
c403ab11ab Allow ray.init to take in address information about existing services. (#161)
* Refactor ray.init and ray.services to allow processes that are already running

* Fix indexing error

* Address Robert's comments
2016-12-28 14:17:29 -08:00
Robert Nishihara
baf835efcd Throw Python exception if plasma store cannot create new object. (#162)
* Propagate error messages through plasma create.

* Use custom exception types instead of exception messages.
2016-12-28 11:56:16 -08:00
Robert Nishihara
10e067e5e5 Delay releasing a maximum number of bytes in the plasma client. (#160)
* Send message from plasma client to get plasma store capacity.

* Release objects from plasma client if they are too large.

* Use doubly-linked list instead of ring buffer for plasma client release history.

* Address comments.

* Fix problem with slicing PlasmaBuffer objects.

* Fix crash in plasma manager during transfer.

* Formatting.

* Make plasma client cache larger and make caching test not throw exceptions on Travis.
2016-12-27 19:51:26 -08:00
Robert Nishihara
26941e02aa Attempt to free up to 20% of the plasma store capacity during eviction. (#159) 2016-12-27 12:12:33 -08:00
Robert Nishihara
985c424172 Use redismodules for task table and result table. (#156)
* Switch to using redis modules for task table.

* Switch to using redis modules for the task table.

* Fix some tests.

* Fix naming and remove code duplication.

* Remove duplication in redis modules and add more cleanups.

* Address comments.
2016-12-25 23:57:05 -08:00
Philipp Moritz
d6695c867a fix wait test (#158) 2016-12-25 23:43:01 -08:00
Philipp Moritz
8309e3f355 Redis string formatting (#157)
* redis string formatting

* fixes

* add documentation

* fixes
2016-12-25 22:43:07 -08:00
Robert Nishihara
3d697c7ed2 Introduce local scheduler heartbeats which carry load information. (#155)
* Introduce local scheduler heartbeats which carry load information.
2016-12-24 20:02:25 -08:00
Robert Nishihara
9bb9f8cb54 Fix bug in ray.wait. (#153)
* Fix bug in wait implementation.

* Add test that exposes previous bug.
2016-12-23 16:22:41 -08:00
Robert Nishihara
241c955707 Determine node IP address programatically. (#151)
* Determine node ip address programatically.

* Factor out methods for getting node IP addresses.

* Address comments.
2016-12-23 15:31:40 -08:00
Robert Nishihara
8d90c9f432 Experimental utils for copying directories to other machines in the c… (#150)
* Experimental utils for copying directories to other machines in the cluster using Ray.

* Test copying directory functionality.

* Small fix.
2016-12-23 00:43:16 -08:00
Robert Nishihara
86b211f5c2 Give run_function_on_all_workers to take a worker_info dictionary including a counter. (#149)
* Suppress Redis warnings and remove some global scheduler logging.

* Pass a counter into run_function_on_all_workers indicating how many workers have begun executing this function.
2016-12-22 22:05:58 -08:00
Robert Nishihara
92010ca5b5 Check that we can connect to Redis and that there aren't existing redis clients on the same node in start_ray.py (#148) 2016-12-22 21:54:19 -08:00
Alexey Tumanov
46a887039e Global scheduler - per-task transfer-aware policy (#145)
* global scheduler with object transfer cost awareness -- upstream rebase

* debugging global scheduler: multiple subscriptions

* global scheduler: utarray push bug fix; tasks change state to SCHEDULED

* change global scheduler test to be an integraton test

* unit and integration tests are passing for global scheduler

* improve global scheduler test: break up into several

* global scheduler checkpoint: fix photon object id bug in test

* test with timesync between object and task notifications; TODO: handle OoO object+task notifications in GS

* fallback to base policy if no object dependencies are cached (may happen due to OoO object+task notification arrivals

* clean up printfs; handle a missing LS in LS cache

* Minor changes to Python test and factor out some common code.

* refactoring handle task waiting

* addressing comments

* log_info -> log_debug

* Change object ID printing.

* PRId64 merge

* Python 3 fix.

* PRId64.

* Python 3 fix.

* resurrect differentiation between no args and missing object info; spacing

* Valgrind fix.

* Run all global scheduler tests in valgrind.

* clang format

* Comments and documentation changes.

* Minor cleanups.

* fix whitespace

* Fix.

* Documentation fix.
2016-12-22 03:11:46 -08:00
Robert Nishihara
6cd02d71f8 Fixes and cleanups for the multinode setting. (#143)
* Add function for driver to get address info from Redis.

* Use Redis address instead of Redis port.

* Configure Redis to run in unprotected mode.

* Add method for starting Ray processes on non-head node.

* Pass in correct node ip address to start_plasma_manager.

* Script for starting Ray processes.

* Handle the case where an object already exists in the store. Maybe this should also compare the object hashes.

* Have driver get info from Redis when start_ray_local=False.

* Fix.

* Script for killing ray processes.

* Catch some errors when the main_loop in a worker throws an exception.

* Allow redirecting stdout and stderr to /dev/null.

* Wrap start_ray.py in a shell script.

* More helpful error messages.

* Fixes.

* Wait for redis server to start up before configuring it.

* Allow seeding of deterministic object ID generation.

* Small change.
2016-12-21 18:53:12 -08:00
Robert Nishihara
c9c1b3e6af Change db_connect to allow different arguments from different processes. (#142)
* Allow db_connect to take a variable number of arguments.

* Fix tests.

* Fixes.

* Formatting.

* Fixes.

* Simplifications.

* Fix typo.
2016-12-20 20:21:35 -08:00
Philipp Moritz
0ca0864856 Use flatcc for serialization of IPC messages. (#140)
* added Phllipp's updates

* Switch to using flatbuffers for IPC.

* Various changes.

* convert remaining messages and cleanups

* fix

* fix function signatures

* fix valgrind errors

* clang-format

* final commit

* Fix valgrind test.
2016-12-20 14:46:25 -08:00
Stephanie Wang
6a73711888 Update the task table (#129)
* Update the task table

* Move updating task table out of scheduling algorithm.
2016-12-20 00:13:39 -08:00
Stephanie Wang
d729f9b7ea Object table remove (#139)
* Object table remove redis module

* Test case for object table remove redis module

* Client code for object_table_remove

* Delete object notifications in plasma

* Test for object deletion notifications

* Fix subscribe deletion test

* Address Robert's comments

* free hash table entry
2016-12-19 23:18:57 -08:00
Alexey Tumanov
cb3e6cde9e passing object info information with redis module (#138)
* adding object broadcast channel; published on each object table add

* publishing data size to the bcast channel

* bug fix: objectkey

* update object tests to test for data size: C + py

* remove debug

* clang format

* Minor changes.

* Fix error.

* merging with Robert's comments

* clang format for the object table test upgrade
2016-12-19 21:07:25 -08:00
Robert Nishihara
269f37e26f Implement object table notification subscriptions and switch to using Redis modules for object table. (#134)
* Implement RAY.OBJECT_TABLE_REQUEST_NOTIFICATIONS.

* Call object_table_request_notifications from plasma manager.

* Use Redis modules for object table.

* Cleaning up code.

* More checks.

* Formatting.

* Make object table tests pass.

* Formatting.

* Add prefix to the object notification channel name.

* Formatting.

* Fixes.

* Increase time in redismodule test.
2016-12-18 18:19:02 -08:00
Robert Nishihara
c89bf4e5bc Fix improper handling of NULL characters when opening Redis keys. (#136)
* Fix improper handling of NULL characters when opening Redis keys.

* Add test.
2016-12-18 13:06:28 -08:00
Robert Nishihara
a9e6a53360 Print helpful error message when libgcc error occurs. (#133) 2016-12-17 15:38:48 -08:00
Robert Nishihara
edf8d1ee9f Fix Python3 error in tests. (#135) 2016-12-17 12:42:37 -08:00
Stephanie Wang
e23661c375 Task table Redis module (#125)
* Task table redis module implementation

* Publish tasks and take in individual fields as args, not task object

* Scheduling state integer has width 1, error on illegal put

* Unit tests for task table and more documentation

* Task table subscribe, fix publish topics and address Philipp and Alexey's comments

* Helper function to create prefixed strings

* Factor out the table prefixes in the test cases
2016-12-16 14:40:44 -08:00
Robert Nishihara
58a873eb20 Deploy Redis module and start using custom Redis commands. (#128)
* Add RAY.CONNECT Redis command.

* Add RAY.GET_CLIENT_ADDRESS command.

* Build and clean Redis in common Makefile.

* Use custom Redis module in Ray and use custom CONNECT and GET_CLIENT_ADDRESS commands.

* Fixes.

* Remove mapping from redis client ID to ray db client ID.

* Fix.
2016-12-16 14:40:44 -08:00
Robert Nishihara
1c95840765 Revert cloudpickle -> pickling change, which no longer seems necessary. (#127) 2016-12-16 14:40:44 -08:00
Stephanie Wang
b0ba54e4c0 Fix psubscribe bug in object_table_subscribe (#126)
* Fix psubscribe

* Add TODO about subscription callbacks
2016-12-16 14:40:44 -08:00
Robert Nishihara
79dd1815a2 Python 3 compatibility. (#121)
* Make common module Python 3 compatible.

* Make plasma module Python 3 compatible.

* Make photon module Python 3 compatible.

* Make numbuf module Python 3 compatible.

* Remaining changes for Python 3 compatibility.

* Test Python 3 in Travis.

* Fixes.
2016-12-16 14:40:37 -08:00
Alexey Tumanov
946242929f Plasma photon association: passing through plasma address with photon db connection (#123)
* passing plasma ip:port association with photon through redis to global scheduler

* Fix test.

* sanity-checking aux_address inside db_connect_extended

* clang format

* fix photon tests

* clang format photon tests
2016-12-13 17:21:38 -08:00
Robert Nishihara
bce7e0fc07 Add include for usleep. (#124) 2016-12-13 14:24:59 -08:00
Philipp Moritz
2152cd9f31 Fix seed bug for generating object ids for put (#120)
* fix seed bug for generating object ids for put

* fix clang-format
2016-12-13 00:54:38 -08:00
Stephanie Wang
24d2b42d86 Fix object table subscriptions (#122)
* First attempt at fixing psubscribe. psubscribe_success_test will fail

* psubscribe test

* SUBSCRIBE returns the number of subscriptions, not success

* Comment out failing test.
2016-12-13 00:47:21 -08:00
Stephanie Wang
4bdb9f7224 Object reconstruction in Photon (#65)
* Object reconstruction in Photon and C test cases for Photon

* Fix hanging test case on mac

* Remove unnecessary event from photon tests

* make photon_disconnect not leak file descriptors

* fix some of the memory errors

* Fix valgrind

* lint

* Address Robert's comments and add test case for object reconstruction suppression

* Remove OWNER
2016-12-12 23:17:22 -08:00
Philipp Moritz
817f1e730c Implement tables with redis modules (#114)
* initial redis module

* temp commit

* temp commit

* temp commit

* Empty object table functions and broken object_table_lookup

* fix segfault and clean up code

* cleanup and tests

* try to ignore redismodule.h

* check if data_size is integer

* Minor changes to redis-module tests.

* try to exclude redismodule from clang-format

* try something different

* fix clang-format and tests

* sleep a bit

* Result table

* fix redis_module tests

* fix tests and add tests for result table

* more tests

* randomize ports

* Minor changes.

* More fixes.
2016-12-11 17:40:19 -08:00
Philipp Moritz
311e2be7dc clean common when cleaning photon (#118) 2016-12-11 17:30:52 -08:00
Robert Nishihara
ddba1df802 Start working toward Python3 compatibility. (#117) 2016-12-11 12:25:31 -08:00
Robert Nishihara
3d083c8b58 Build arrow without tests. (#85) 2016-12-10 21:25:52 -08:00
Robert Nishihara
0f7091099d Throw an exception if a Ray method is called from a thread that isn't the main thread. (#97) 2016-12-10 21:24:50 -08:00
Robert Nishihara
9474d03912 Switch to updated Plasma API and consolidate wait and fetch implementations. (#116)
* Consolidate wait implementations.

* Consolidate fetch implementations.

* Share callback between wait and fetch to address issue in which only one callback can be run for a given subscribe channel.

* Reactivate manager tests.

* Remove more code.

* Add some documentation.
2016-12-10 21:22:05 -08:00
Robert Nishihara
86973059de Switch to new wait implementation. (#113)
* Duplicate wait1 implementation and seperate out wait datastructures.

* Address Philipp's comments.

* Temporarily address test failure problem by increasing timeout and reducing load in tests.

* Update stress tests to include distributed wait.
2016-12-09 19:26:11 -08:00