Commit graph

165 commits

Author SHA1 Message Date
Robert Nishihara
6cd02d71f8 Fixes and cleanups for the multinode setting. (#143)
* Add function for driver to get address info from Redis.

* Use Redis address instead of Redis port.

* Configure Redis to run in unprotected mode.

* Add method for starting Ray processes on non-head node.

* Pass in correct node ip address to start_plasma_manager.

* Script for starting Ray processes.

* Handle the case where an object already exists in the store. Maybe this should also compare the object hashes.

* Have driver get info from Redis when start_ray_local=False.

* Fix.

* Script for killing ray processes.

* Catch some errors when the main_loop in a worker throws an exception.

* Allow redirecting stdout and stderr to /dev/null.

* Wrap start_ray.py in a shell script.

* More helpful error messages.

* Fixes.

* Wait for redis server to start up before configuring it.

* Allow seeding of deterministic object ID generation.

* Small change.
2016-12-21 18:53:12 -08:00
Robert Nishihara
c9c1b3e6af Change db_connect to allow different arguments from different processes. (#142)
* Allow db_connect to take a variable number of arguments.

* Fix tests.

* Fixes.

* Formatting.

* Fixes.

* Simplifications.

* Fix typo.
2016-12-20 20:21:35 -08:00
Philipp Moritz
0ca0864856 Use flatcc for serialization of IPC messages. (#140)
* added Phllipp's updates

* Switch to using flatbuffers for IPC.

* Various changes.

* convert remaining messages and cleanups

* fix

* fix function signatures

* fix valgrind errors

* clang-format

* final commit

* Fix valgrind test.
2016-12-20 14:46:25 -08:00
Stephanie Wang
d729f9b7ea Object table remove (#139)
* Object table remove redis module

* Test case for object table remove redis module

* Client code for object_table_remove

* Delete object notifications in plasma

* Test for object deletion notifications

* Fix subscribe deletion test

* Address Robert's comments

* free hash table entry
2016-12-19 23:18:57 -08:00
Alexey Tumanov
cb3e6cde9e passing object info information with redis module (#138)
* adding object broadcast channel; published on each object table add

* publishing data size to the bcast channel

* bug fix: objectkey

* update object tests to test for data size: C + py

* remove debug

* clang format

* Minor changes.

* Fix error.

* merging with Robert's comments

* clang format for the object table test upgrade
2016-12-19 21:07:25 -08:00
Robert Nishihara
269f37e26f Implement object table notification subscriptions and switch to using Redis modules for object table. (#134)
* Implement RAY.OBJECT_TABLE_REQUEST_NOTIFICATIONS.

* Call object_table_request_notifications from plasma manager.

* Use Redis modules for object table.

* Cleaning up code.

* More checks.

* Formatting.

* Make object table tests pass.

* Formatting.

* Add prefix to the object notification channel name.

* Formatting.

* Fixes.

* Increase time in redismodule test.
2016-12-18 18:19:02 -08:00
Robert Nishihara
58a873eb20 Deploy Redis module and start using custom Redis commands. (#128)
* Add RAY.CONNECT Redis command.

* Add RAY.GET_CLIENT_ADDRESS command.

* Build and clean Redis in common Makefile.

* Use custom Redis module in Ray and use custom CONNECT and GET_CLIENT_ADDRESS commands.

* Fixes.

* Remove mapping from redis client ID to ray db client ID.

* Fix.
2016-12-16 14:40:44 -08:00
Robert Nishihara
79dd1815a2 Python 3 compatibility. (#121)
* Make common module Python 3 compatible.

* Make plasma module Python 3 compatible.

* Make photon module Python 3 compatible.

* Make numbuf module Python 3 compatible.

* Remaining changes for Python 3 compatibility.

* Test Python 3 in Travis.

* Fixes.
2016-12-16 14:40:37 -08:00
Stephanie Wang
4bdb9f7224 Object reconstruction in Photon (#65)
* Object reconstruction in Photon and C test cases for Photon

* Fix hanging test case on mac

* Remove unnecessary event from photon tests

* make photon_disconnect not leak file descriptors

* fix some of the memory errors

* Fix valgrind

* lint

* Address Robert's comments and add test case for object reconstruction suppression

* Remove OWNER
2016-12-12 23:17:22 -08:00
Robert Nishihara
ddba1df802 Start working toward Python3 compatibility. (#117) 2016-12-11 12:25:31 -08:00
Robert Nishihara
9474d03912 Switch to updated Plasma API and consolidate wait and fetch implementations. (#116)
* Consolidate wait implementations.

* Consolidate fetch implementations.

* Share callback between wait and fetch to address issue in which only one callback can be run for a given subscribe channel.

* Reactivate manager tests.

* Remove more code.

* Add some documentation.
2016-12-10 21:22:05 -08:00
Robert Nishihara
86973059de Switch to new wait implementation. (#113)
* Duplicate wait1 implementation and seperate out wait datastructures.

* Address Philipp's comments.

* Temporarily address test failure problem by increasing timeout and reducing load in tests.

* Update stress tests to include distributed wait.
2016-12-09 19:26:11 -08:00
Robert Nishihara
46d0e6bdfb In tests, check that processes are still alive before killing them to catch crashes. (#110) 2016-12-09 13:04:08 -08:00
Robert Nishihara
5819e90962 Remove all object files when doing make clean. (#109) 2016-12-09 11:00:00 -08:00
Alexey Tumanov
0abbf5a113 End-to-end object size information passthrough (#105)
* rebase Alexey's PR on top

* rebase on master

* fix test failure waiting for plasma manager to exit

* clang format

* addressing comments

* Minor formatting and naming fixes.
2016-12-09 00:51:44 -08:00
Stephanie Wang
61904c4c3e Object hashes (#104)
* factoring out object_info for general use by several Ray components

* addressing comments

* Replace SHA256 task hash with MD5

Add object hash to object table (always overwrites)

Support for table operations that span multiple asynchronous Redis
commands

Add a new object location in a transaction, using Redis's optimistic
concurrency

Use Redis GETSET instead of transactions and Python frontend code for object hashing

Remove spurious log message

Fix for object_table_add

Revert "Replace SHA256 task hash with MD5"

This reverts commit e599de473c8dad9189ccb0600429534b469b76a2.

Revert to sha256

Test case for illegal puts

Use SETNX to set object hashes

Initialize digest with zeros

Initialize plasma_request with zeros

* Fixes

* replace SHA256 with a faster hash in the object store

* Fix valgrind

* Address Robert's comments

* Check that plasma_compute_object_hash succeeds.

* Don't run test_illegal_put test with valgrind because it causes an intentional crash which causes valgrind to complain.

* Debugging after rebase.

* handling Robert's comments

* Fix bugs after rebase.

* final fixes for Stephanie's PR

* fix
2016-12-08 20:57:08 -08:00
Robert Nishihara
4a62a3c5d7 Cause plasma client tests failures to show up on travis. (#99)
* Cause plasma client tests failures to show up on travis.

* Don't run flaky test.
2016-12-08 19:32:24 -08:00
atumanov
1c946b2f6a Factoring out object_info structure for use in several Ray components (#101)
* change plasma object notifications to carry a struct of information

* factoring out object_info for general use by several Ray components

* fixing a bug in python test

* addressing comments

* handling Robert's comments

* clang format

* Fix valgrind.
2016-12-08 19:14:10 -08:00
atumanov
88206417cb unifying plasma seal path through the store to mgr to redis (#96) 2016-12-07 17:25:40 -08:00
Aleks Kamko
b695f2b7e4 drop GIL for plasma_get (#95) 2016-12-07 17:01:14 -08:00
Robert Nishihara
b3c05655a0 Enable fetching objects from remote object stores. (#87)
* Fetch missing dependencies from local scheduler.

* Factor out global scheduler policy state.

* Use object_table_subscribe instead of object_table_lookup.

* Fix bug in which timer was being created twice for a single fetch request.

* Free old manager vector.
2016-12-06 15:47:31 -08:00
Philipp Moritz
03324caffc Put object store memory on /dev/shm on linux (#89)
* put object store memory on /dev/shm on linux

* fix

* fix mac
2016-12-06 00:31:47 -08:00
Robert Nishihara
dd39008532 Fix bug in which we were running out of file descriptors. (#91) 2016-12-05 23:45:49 -08:00
Philipp Moritz
ba53e4a43a Change object table subscribe to also return payload (#88)
* implement object table subscribe that also returns payload

* fix

* fix valgrind

* fix ray test

* fix clang-format

* fix

* fix
2016-12-05 00:26:53 -08:00
Robert Nishihara
2a3e9267f8 Non-blocking fetch implementation. (#83)
* Non-blocking fetch implementation.

* Make fetch tests more robust to timing issues.

* Bug fix when ignoring transferred objects.

* Fix.

* Documentation fixes.
2016-12-03 19:09:05 -08:00
Robert Nishihara
61755bc168 Replace direct calls to recv with read_bytes. (#78)
* Replace direct calls to recv with read_bytes.

* Fixes.

* Formatting.
2016-12-02 18:51:50 -08:00
Stephanie Wang
dadbcae17d Cache plasma object information on the plasma client (#73)
* Cache plasma object information on the plasma client

* Plasma client uses cached object references to get the object data
2016-12-01 16:46:17 -08:00
Ion
f89be9699c Introduce non-blocking Plasma API. (#71)
* Implement new plasma client API.

* Formatting fixes.

* Make tests work again.

* Make tests run.

* Comment style.

* Fix bugs with fetch tests.

* Introduce fetch1 flag.

* Remove timer only if present.

* Formatting fixes.

* Don't access object after free.

* Formatting fixes.

* Minor change.

* refactoring plasma datastructures

* Change plasma_request and plasma_reply to use only arrays of object requests.

* some more fixes

* Remove unnecessary methods.

* Trivial.

* fixes

* use plasma_send_reply in return_from_wait1

* Lint.
2016-12-01 02:15:21 -08:00
atumanov
1499834be1 handling partial/interrupted I/O (#69)
* read_bytes: handle EINTR + easier flow

* addressing whitespace + style comments

* handling partial i/o: write_bytes; use [read|write]_bytes in plasma send/receive
2016-11-29 23:04:13 -08:00
Robert Nishihara
2c639c05ef When searching for Python with cmake, try custom tricks before using default cmake behavior. (#67)
* Change CMakeLists.txt.

* remove dead code
2016-11-29 14:32:54 -08:00
Robert Nishihara
bc1d7db926 Re-enable fetch tests. (#63)
* Bring back fetch tests.

* Zero initialize success array in PyPlasma_fetch.

* Fix bug in fetch in case where the object ID doesn't have any managers in the object table.

* Temporarily disallow calling fetch with multiple copies of the same object ID.

* Fix.

* Factor out code for checking if list of object IDs are all distinct.

* Remove commented out code.

* Fix.
2016-11-24 16:27:17 -08:00
Philipp Moritz
8147b62964 move methods related to plasma requests and replies to plasma.c (#60) 2016-11-23 21:13:19 -08:00
mehrdadn
7237ec4124 Windows compatibility (#57)
* Add Python and Redis submodules, and remove old third-party modules

* Update VS projects (WARNING: references files that do not exist yet)

* Update code & add shims for APIs except AF_UNIX/{send,recv}msg()

* Minor style changes.
2016-11-22 17:04:24 -08:00
Robert Nishihara
a93c6b7596 Quick workaround for ray.wait bug. (#42) 2016-11-22 13:31:22 -08:00
mehrdadn
35ce5f8001 Enable more warnings and fix usleep() issue (#55)
* Enable more warnings and fix usleep() issue

* Periods after comments.
2016-11-22 00:29:27 -08:00
mehrdadn
3714984094 Remove Redis version from Linux scripts (#56)
* Remove Redis version from Linux scripts

* Add documentation.
2016-11-21 15:02:40 -08:00
Robert Nishihara
f267da6aa8 Improve case on Mac where cmake fails to find Python with the custom search path. (#49)
* On Mac OS X, if cmake fails to find Python with custom search path, default to using find_package.

* Fix formatting in CMakeLists.txt.

* Set CUSTOM_PYTHON_EXECUTABLE variable on mac if the custom search path fails.

* Require exact Python version match.

* Use cmake mode keyword for logging messages.

* Fix.

* Find python libraries in cmake by searching near the include directories.

* Update cmakelists for numbuf.
2016-11-19 19:00:42 -08:00
Robert Nishihara
c8c3983195 Use sizeof(field) instead of sizeof(type) and other fixes. (#47)
* Use sizeof(field) instead of sizeof(type) and other fixes.

* Fix formatting.

* Bug fix.

* Zero-initialize structs. There are many more instances of these that I haven't changed yet.

* Bug fix.

* Revert from atexit to signaling to fix valgrind tests.

* Address Philipp's comments.
2016-11-19 12:19:49 -08:00
Robert Nishihara
d77b685a90 Global scheduler skeleton (#45)
* Initial scheduler commit

* global scheduler

* add global scheduler

* Implement global scheduler skeleton.

* Formatting.

* Allow local scheduler to be started without a connection to redis so that we can test it without a global scheduler.

* Fail if there are no local schedulers when the global scheduler receives a task.

* Initialize uninitialized value and formatting fix.

* Generalize local scheduler table to db client table.

* Remove code duplication in local scheduler and add flag for whether a task came from the global scheduler or not.

* Queue task specs in the local scheduler instead of tasks.

* Simple global scheduler tests, including valgrind.

* Factor out functions for starting processes.

* Fixes.
2016-11-18 19:57:51 -08:00
Stephanie Wang
7babe0d22f Logging level (#38)
* Set logging levels in Makefile using -DRAY_COMMON_LOG_LEVEL=level

* Lower level of some LOG_ERROR messages, log the name of the table operation on failure

* Address rest of Robert's comments

* Fix spurious log message
2016-11-15 20:33:29 -08:00
Philipp Moritz
986ed5c9e8 Plasma C extensions (#34)
* switch plasma from ctypes to python C API

* clang-format

* various fixes
2016-11-13 16:23:28 -08:00
Robert Nishihara
ad6a401740 Get rid of old git submodule init calls. (#41) 2016-11-13 14:55:28 -08:00
Philipp Moritz
b1083e92e9 fix bug in wait (#35) 2016-11-11 17:05:40 -08:00
Stephanie Wang
9d1e750e8f Merge task table and task log into a single table (#30)
* Merge task table and task log

* Fix test in db tests

* Address Robert's comments and some better error checking

* Add a LOG_FATAL that exits the program
2016-11-10 18:13:26 -08:00
Robert Nishihara
90f88af902 Fix bug in which worker import counters were treated incorrectly. (#28)
* Fix bug in which worker import counters were treated incorrectly.

* Fix bug in which cached functions-to-run were double counted as exports. This also runs the functions-to-run on the driver only after ray.init is called.

* Only define reusable variables locally after ray.init has been called.

* Remove flaky reference counting tests. It's not clear that these tests make sense.

* Make numbuf pip install verbose.

* Export cached reusable variables before cached remote functions.

* Fix bug causing the worker to hang sometimes. This happens when the worker is trying to run a task, but it hasn't imported enough imports to run the task, so it continually acquires and releases a lock while checking if it has enough imports. However, for some reason, the import thread is waiting to acquire the same lock and never does so (or takes a very long time to do so). By dropping the lock before sleeping, this makes it easier for other threads to acquire the lock.

* Acquire locks using 'with' statements.

* Fix possible test failure.

* Try to start Redis multiple times with different random ports if the original attempt failed.

* Fix test in which we redefine a remote function.
2016-11-06 22:24:39 -08:00
Philipp Moritz
1147c4d34b Keep objects in cache between tasks (#29)
* fix caching behavior

* fixes
2016-11-06 17:31:14 -08:00
Robert Nishihara
efe8a295ea Add basic LRU eviction for the plasma store. (#26)
* Basic functionality for LRU eviction.

* Test eviction.

* Factor out eviction policy.

* Move delete_object into eviction policy.

* Replace array of released objects with an LRU cache (hash table + doubly linked list).

* Finish rebase on master.

* Move actual object deletion away from eviction policy and into plasma store.

* Small fixes.

* Fixes.

* Make remove_object_from_lru_cache always remove the object.

* Minor formatting and comments.

* Pass in allowed memory as argument to Plasma store.

* Small fix.
2016-11-05 21:34:11 -07:00
Philipp Moritz
90a2aa4bf7 Various performance improvements (#24)
* switch from array to linked list for photon queue

* performance optimizations

* fix tests

* various fixes
2016-11-04 00:41:20 -07:00
Robert Nishihara
1e2b3ceac9 Fix whitespace. (#27) 2016-11-03 17:36:55 -07:00
Ujval Misra
b370a1df57 Merge sealed_objects and open_objects into a single hashmap (#25)
* Merge sealed_objects and open_objects into a single hashmap

* Entry contains enum that determines whether it is open or closed

* Removed unused variable.

* Applied Robert's patch

* Fixed styling.
2016-11-03 17:33:54 -07:00