137 commits

Author SHA1 Message Date
Robert Nishihara
6cd02d71f8 Fixes and cleanups for the multinode setting. (#143)
* Add function for driver to get address info from Redis.

* Use Redis address instead of Redis port.

* Configure Redis to run in unprotected mode.

* Add method for starting Ray processes on non-head node.

* Pass in correct node ip address to start_plasma_manager.

* Script for starting Ray processes.

* Handle the case where an object already exists in the store. Maybe this should also compare the object hashes.

* Have driver get info from Redis when start_ray_local=False.

* Fix.

* Script for killing ray processes.

* Catch some errors when the main_loop in a worker throws an exception.

* Allow redirecting stdout and stderr to /dev/null.

* Wrap start_ray.py in a shell script.

* More helpful error messages.

* Fixes.

* Wait for redis server to start up before configuring it.

* Allow seeding of deterministic object ID generation.

* Small change.
2016-12-21 18:53:12 -08:00
Robert Nishihara
c9c1b3e6af Change db_connect to allow different arguments from different processes. (#142)
* Allow db_connect to take a variable number of arguments.

* Fix tests.

* Fixes.

* Formatting.

* Fixes.

* Simplifications.

* Fix typo.
2016-12-20 20:21:35 -08:00
Robert Nishihara
58a873eb20 Deploy Redis module and start using custom Redis commands. (#128)
* Add RAY.CONNECT Redis command.

* Add RAY.GET_CLIENT_ADDRESS command.

* Build and clean Redis in common Makefile.

* Use custom Redis module in Ray and use custom CONNECT and GET_CLIENT_ADDRESS commands.

* Fixes.

* Remove mapping from redis client ID to ray db client ID.

* Fix.
2016-12-16 14:40:44 -08:00
Robert Nishihara
1c95840765 Revert cloudpickle -> pickling change, which no longer seems necessary. (#127) 2016-12-16 14:40:44 -08:00
Robert Nishihara
79dd1815a2 Python 3 compatibility. (#121)
* Make common module Python 3 compatible.

* Make plasma module Python 3 compatible.

* Make photon module Python 3 compatible.

* Make numbuf module Python 3 compatible.

* Remaining changes for Python 3 compatibility.

* Test Python 3 in Travis.

* Fixes.
2016-12-16 14:40:37 -08:00
Alexey Tumanov
946242929f Plasma photon association: passing through plasma address with photon db connection (#123)
* passing plasma ip:port association with photon through redis to global scheduler

* Fix test.

* sanity-checking aux_address inside db_connect_extended

* clang format

* fix photon tests

* clang format photon tests
2016-12-13 17:21:38 -08:00
Philipp Moritz
2152cd9f31 Fix seed bug for generating object ids for put (#120)
* fix seed bug for generating object ids for put

* fix clang-format
2016-12-13 00:54:38 -08:00
Robert Nishihara
ddba1df802 Start working toward Python3 compatibility. (#117) 2016-12-11 12:25:31 -08:00
Robert Nishihara
0f7091099d Throw an exception if a Ray method is called from a thread that isn't the main thread. (#97) 2016-12-10 21:24:50 -08:00
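Note on the commit above: a guard like the one it describes can be sketched as follows. This is only an illustration, not the code from the commit; the function name check_main_thread is hypothetical, and threading.main_thread() requires Python 3.4 or later.

```python
import threading


def check_main_thread():
    """Raise if the caller is running on any thread other than the main thread.

    Hypothetical helper for illustration; the real check in the commit may
    be structured differently.
    """
    if threading.current_thread() is not threading.main_thread():
        raise RuntimeError(
            "This function may only be called from the main thread.")
```

An API entry point would call such a helper first, so that misuse from a background thread fails with a clear error instead of corrupting shared state.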
Robert Nishihara
9474d03912 Switch to updated Plasma API and consolidate wait and fetch implementations. (#116)
* Consolidate wait implementations.

* Consolidate fetch implementations.

* Share callback between wait and fetch to address issue in which only one callback can be run for a given subscribe channel.

* Reactivate manager tests.

* Remove more code.

* Add some documentation.
2016-12-10 21:22:05 -08:00
Robert Nishihara
6441571d31 Introduce some stress tests. (#106)
* Retry first connection to redis in db_connect.

* Declare usleep.

* Formatting.

* Introduce some stress tests.
2016-12-09 17:49:31 -08:00
Robert Nishihara
b3c05655a0 Enable fetching objects from remote object stores. (#87)
* Fetch missing dependencies from local scheduler.

* Factor out global scheduler policy state.

* Use object_table_subscribe instead of object_table_lookup.

* Fix bug in which timer was being created twice for a single fetch request.

* Free old manager vector.
2016-12-06 15:47:31 -08:00
Philipp Moritz
03324caffc Put object store memory on /dev/shm on linux (#89)
* put object store memory on /dev/shm on linux

* fix

* fix mac
2016-12-06 00:31:47 -08:00
Robert Nishihara
b5ed2f063d Allow starting multiple local schedulers. (#86) 2016-12-04 17:08:16 -08:00
Philipp Moritz
58e8bbcb34 Fix bug in serializing arguments of tasks that are more complex objects (#72)
* Give more informative error message when we do not know how to serialize a class.

* Check that passing arguments to remote functions and getting them does not change their values.

* fix serialization bug

* fix tests for common module

* Formatting.

* Bug fix in init_pickle_module signature.

* Use pickle with HIGHEST_PROTOCOL.
2016-11-30 23:21:53 -08:00
Robert Nishihara
a93c6b7596 Quick workaround for ray.wait bug. (#42) 2016-11-22 13:31:22 -08:00
mehrdadn
3714984094 Remove Redis version from Linux scripts (#56)
* Remove Redis version from Linux scripts

* Add documentation.
2016-11-21 15:02:40 -08:00
Robert Nishihara
4b00c029ac Remove numbuf from requirements for setup.py. (#54)
* Remove numbuf from requirements for setup.py.

* Update documentation.
2016-11-21 14:30:17 -08:00
Robert Nishihara
d77b685a90 Global scheduler skeleton (#45)
* Initial scheduler commit

* global scheduler

* add global scheduler

* Implement global scheduler skeleton.

* Formatting.

* Allow local scheduler to be started without a connection to redis so that we can test it without a global scheduler.

* Fail if there are no local schedulers when the global scheduler receives a task.

* Initialize uninitialized value and formatting fix.

* Generalize local scheduler table to db client table.

* Remove code duplication in local scheduler and add flag for whether a task came from the global scheduler or not.

* Queue task specs in the local scheduler instead of tasks.

* Simple global scheduler tests, including valgrind.

* Factor out functions for starting processes.

* Fixes.
2016-11-18 19:57:51 -08:00
Wapaul1
08707f9408 Integration of Webui with Ray (#32)
* Initial integration of webui with ray

* Re-organized calling of build-webui in setup.py

* Fixed Lint comments on js code

* Fixed more lint issues

* Fixed various issues

* Fixed directory in services.py

* Small changes.

* Changes to match lint
2016-11-17 22:33:29 -08:00
Philipp Moritz
986ed5c9e8 Plasma C extensions (#34)
* switch plasma from ctypes to python C API

* clang-format

* various fixes
2016-11-13 16:23:28 -08:00
Robert Nishihara
43aba4fc6d Install python package dependencies through setup.py. (#39)
* Install python package dependencies through setup.py.

* Do pip installs without sudo.

* Clearer instructions for running tests.

* Wording.
2016-11-12 19:37:20 -08:00
Robert Nishihara
5c89a7ab4e Fix bug in which gym environment could not be imported because of module naming conflicts. (#37) 2016-11-11 19:02:40 -08:00
Robert Nishihara
336a904404 Implement repr, hash, and richcompare for ObjectIDs. (#33)
* Implement repr, hash, and richcompare for ObjectIDs.

* Addressing comments.

* Partially fix example applications.
2016-11-11 09:18:36 -08:00
Robert Nishihara
194bdb1d96 Compute task IDs and object IDs deterministically. (#31)
* Put infrastructure in place to compute task IDs and object IDs.

* Fix version number for common library.

* Compute task IDs and object IDs deterministically.

* Address Stephanie's comments.

* Update task documentation.

* Fix formatting.

* Add more tests and checks.

* Fix formatting.

* Enable DCHECKs and change CHECKs to DCHECKs.
2016-11-08 14:46:34 -08:00
Robert Nishihara
90f88af902 Fix bug in which worker import counters were treated incorrectly. (#28)
* Fix bug in which worker import counters were treated incorrectly.

* Fix bug in which cached functions-to-run were double counted as exports. This also runs the functions-to-run on the driver only after ray.init is called.

* Only define reusable variables locally after ray.init has been called.

* Remove flaky reference counting tests. It's not clear that these tests make sense.

* Make numbuf pip install verbose.

* Export cached reusable variables before cached remote functions.

* Fix bug causing the worker to hang sometimes. This happens when the worker is trying to run a task, but it hasn't imported enough imports to run the task, so it continually acquires and releases a lock while checking if it has enough imports. However, for some reason, the import thread is waiting to acquire the same lock and never does so (or takes a very long time to do so). By dropping the lock before sleeping, this makes it easier for other threads to acquire the lock.

* Acquire locks using 'with' statements.

* Fix possible test failure.

* Try to start Redis multiple times with different random ports if the original attempt failed.

* Fix test in which we redefine a remote function.
2016-11-06 22:24:39 -08:00
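Note on the commit above: the hang it describes came from a polling loop that held on to a lock too eagerly, and the fix was to drop the lock before sleeping and to acquire it with 'with' statements. A minimal sketch of that pattern is below. The helper name wait_until and its parameters are hypothetical, not taken from worker.py.

```python
import threading
import time


def wait_until(condition, lock, poll_interval=0.001, timeout=5.0):
    """Poll condition() while holding lock, but never sleep with the lock held.

    Returns True once condition() is true, or False if the timeout expires.
    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout
    while True:
        with lock:
            if condition():
                return True
        # The 'with' block has exited, so the lock is released here and
        # other threads (for example an import thread) can acquire it
        # while this thread sleeps.
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

Sleeping outside the 'with' block gives a competing thread a full poll interval in which to take the lock, instead of a tiny window between a release and an immediate re-acquire.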
Robert Nishihara
efe8a295ea Add basic LRU eviction for the plasma store. (#26)
* Basic functionality for LRU eviction.

* Test eviction.

* Factor out eviction policy.

* Move delete_object into eviction policy.

* Replace array of released objects with an LRU cache (hash table + doubly linked list).

* Finish rebase on master.

* Move actual object deletion away from eviction policy and into plasma store.

* Small fixes.

* Fixes.

* Make remove_object_from_lru_cache always remove the object.

* Minor formatting and comments.

* Pass in allowed memory as argument to Plasma store.

* Small fix.
2016-11-05 21:34:11 -07:00
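Note on the commit above: the LRU cache it mentions (hash table + doubly linked list) lives in the C code of the plasma store. The Python sketch below only illustrates the idea, using collections.OrderedDict, which offers the same combination of fast lookup and ordered removal. The class and method names are hypothetical. As in the commit, the policy only chooses what to evict; actually deleting the objects is left to the caller.

```python
from collections import OrderedDict


class LRUCache:
    """Tracks released objects, oldest first. Illustrative sketch only."""

    def __init__(self):
        # Maps object_id -> size in bytes, ordered by release time.
        self._entries = OrderedDict()

    def add(self, object_id, size):
        """Record that an object was released and is now evictable."""
        self._entries.pop(object_id, None)  # re-adding moves it to the end
        self._entries[object_id] = size

    def remove(self, object_id):
        """Stop tracking an object, e.g. because it is in use again."""
        self._entries.pop(object_id, None)

    def choose_objects_to_evict(self, bytes_needed):
        """Pick least recently released objects until enough bytes are freed.

        Returns (list_of_object_ids, bytes_freed). The caller deletes them.
        """
        chosen, freed = [], 0
        while self._entries and freed < bytes_needed:
            object_id, size = self._entries.popitem(last=False)
            chosen.append(object_id)
            freed += size
        return chosen, freed
```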
Philipp Moritz
90a2aa4bf7 Various performance improvements (#24)
* switch from array to linked list for photon queue

* performance optimizations

* fix tests

* various fixes
2016-11-04 00:41:20 -07:00
Robert Nishihara
681ec570ba Remove unnecessary pip installs. (#21)
* Small cleanups in worker.py.

* Remove dependencies on subprocess32, graphviz, protobuf, and ipython.

* Retry starting the plasma manager if the port is in use.

* Whitespace

* Move start_plasma_manager into plasma.py.
2016-11-02 16:40:37 -07:00
Robert Nishihara
072f442c1f Update worker.py and services.py to use plasma and the local scheduler. (#19)
* Update worker code and services code to use plasma and the local scheduler.

* Cleanups.

* Fix bug in which threads were started before the worker mode was set. This caused remote functions to be defined on workers before the worker knew it was in WORKER_MODE.

* Fix bug in install-dependencies.sh.

* Lengthen timeout in failure_test.py.

* Cleanups.

* Cleanup services.start_ray_local.

* Clean up random name generation.

* Cleanups.
2016-11-02 00:39:35 -07:00
Robert Nishihara
47851eeccc Build Ray with setup.py. (#14)
* Build Ray with setup.py.

* Building photon extensions with cmake.

* Fix formatting in photon_extension.c

* Pip install with sudo in Travis.

* Fix plasma __init__.py.

* Rename and remove some files.
2016-10-31 17:08:03 -07:00
Robert Nishihara
09a3ff7173 Pip install numbuf. (#8) 2016-10-28 14:30:20 -07:00
Robert Nishihara
0a44145906 Fix the resetting of reusable variables on the driver and cache functions to run on all workers. (#446)
* Properly reset reusable variables on the driver when remote functions are run locally on the driver.

* Cache functions to run on all workers that occur before ray.init is called.
2016-10-12 22:17:22 -07:00
Robert Nishihara
292656013a Suppress exceptions in the error logging thread when program exits. (#432) 2016-09-15 13:48:23 -07:00
Wapaul1
d5815673a5 Changed ray.select() to ray.wait() and its functionality (#426)
* Re-implemented select, changed name to wait

* Changed tests for select to tests for wait

* Updated the hyperopt example to match wait

* Small fixes and improve example readme.

* Make tests pass.
2016-09-14 17:14:11 -07:00
Robert Nishihara
ba56b08474 Reintroduce passing arguments by value to remote functions. (#425)
* Reintroduce passing arguments by value to remote functions.

* Check size of arguments passed by value.

* Fix computation graph visualization.
2016-09-10 21:11:18 -07:00
Robert Nishihara
5802cab87c Fix bug in which ObjectFixture gets called at exit after raylib gets set to None. (#416) 2016-09-07 18:49:19 -07:00
Robert Nishihara
d264713ceb Work around Arrow bug by increasing metadata size. (#415) 2016-09-07 18:46:04 -07:00
Robert Nishihara
11a8914684 Allow users to serialize custom classes. (#393)
* Allow serialization of custom classes.

* Add documentation and test cases, also fix pickle case.

* Don't allow old-style classes.
2016-09-06 13:28:24 -07:00
Robert Nishihara
d5cb3ac090 Propagate error messages from functions that run on all workers. (#410) 2016-09-06 10:06:43 -07:00
Robert Nishihara
327d7ff689 Fix bug to enable calling ray.get multiple times on same ObjectID. (#409) 2016-09-04 13:32:55 -07:00
Philipp Moritz
68cec55a98 Refcount without modifying objects (#407)
* refcount without modifying objects

* add documentation

* Update tests and documentation.

* Remove extraneous code.

* Update numbuf version.
2016-09-04 12:07:52 -07:00
Robert Nishihara
81f40774a7 Remove ObjectID aliasing from the API. (#406)
* Remove ObjectID aliasing from the API.

* Update documentation to remove aliasing.
2016-09-03 19:34:45 -07:00
Richard Shin
efb61ca9c7 Return scheduler address in ray.init (#403) 2016-09-03 18:25:47 -07:00
Richard Shin
80fdfcd1a5 Check num_objstores > 0 in start_ray_local (#402) 2016-09-03 17:37:06 -07:00
Philipp Moritz
3548797202 [API] Implement get for multiple objects (#398)
* [API] Implement get for multiple objects

* Small fixes.
2016-09-02 18:02:44 -07:00
Robert Nishihara
fb7ccef493 Allow remote decorator to be used with no parentheses. 2016-08-30 16:38:26 -07:00
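Note on the commit above: a decorator that works both bare and with parentheses is usually written by checking whether it was handed the function directly. The sketch below shows that general pattern only. It is not Ray's implementation, and the option name example_option is made up for illustration.

```python
import functools


def remote(*args, **options):
    """Decorator usable as @remote or as @remote(option=value).

    Illustrative sketch: the wrapper just calls the function and records
    the options that were passed.
    """
    def make_remote(func):
        @functools.wraps(func)
        def wrapper(*call_args, **call_kwargs):
            return func(*call_args, **call_kwargs)
        wrapper.options = options
        return wrapper

    if len(args) == 1 and callable(args[0]) and not options:
        # Used as @remote with no parentheses: args[0] is the function.
        return make_remote(args[0])
    if args:
        raise TypeError("remote accepts only keyword options")
    # Used as @remote(...): return the real decorator.
    return make_remote


@remote
def increment(x):
    return x + 1


@remote(example_option=2)
def pair(x):
    return x, x
```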
Robert Nishihara
b87912cb2f Remove typing module. 2016-08-29 22:16:19 -07:00
Robert Nishihara
d7f313a026 Remove type information from remote decorator. 2016-08-29 22:05:59 -07:00
Wapaul1
420bcc0477 Remote function returning non-serializable type no longer shuts worker down (#384)
* Moved put_objects in main_loop to inside of try block

* Added test for failed serialization

* Fixed naming

* Minor
2016-08-25 15:26:22 -07:00