Commit graph

138 commits

Author SHA1 Message Date
Robert Nishihara
92010ca5b5 Check that we can connect to Redis and that there aren't existing redis clients on the same node in start_ray.py (#148) 2016-12-22 21:54:19 -08:00
Robert Nishihara
6cd02d71f8 Fixes and cleanups for the multinode setting. (#143)
* Add function for driver to get address info from Redis.

* Use Redis address instead of Redis port.

* Configure Redis to run in unprotected mode.

* Add method for starting Ray processes on non-head node.

* Pass in correct node ip address to start_plasma_manager.

* Script for starting Ray processes.

* Handle the case where an object already exists in the store. Maybe this should also compare the object hashes.

* Have driver get info from Redis when start_ray_local=False.

* Fix.

* Script for killing ray processes.

* Catch some errors when the main_loop in a worker throws an exception.

* Allow redirecting stdout and stderr to /dev/null.

* Wrap start_ray.py in a shell script.

* More helpful error messages.

* Fixes.

* Wait for redis server to start up before configuring it.

* Allow seeding of deterministic object ID generation.

* Small change.
2016-12-21 18:53:12 -08:00
Robert Nishihara
c9c1b3e6af Change db_connect to allow different arguments from different processes. (#142)
* Allow db_connect to take a variable number of arguments.

* Fix tests.

* Fixes.

* Formatting.

* Fixes.

* Simplifications.

* Fix typo.
2016-12-20 20:21:35 -08:00
Robert Nishihara
58a873eb20 Deploy Redis module and start using custom Redis commands. (#128)
* Add RAY.CONNECT Redis command.

* Add RAY.GET_CLIENT_ADDRESS command.

* Build and clean Redis in common Makefile.

* Use custom Redis module in Ray and use custom CONNECT and GET_CLIENT_ADDRESS commands.

* Fixes.

* Remove mapping from redis client ID to ray db client ID.

* Fix.
2016-12-16 14:40:44 -08:00
Robert Nishihara
1c95840765 Revert cloudpickle -> pickling change, which no longer seems necessary. (#127) 2016-12-16 14:40:44 -08:00
Robert Nishihara
79dd1815a2 Python 3 compatibility. (#121)
* Make common module Python 3 compatible.

* Make plasma module Python 3 compatible.

* Make photon module Python 3 compatible.

* Make numbuf module Python 3 compatible.

* Remaining changes for Python 3 compatibility.

* Test Python 3 in Travis.

* Fixes.
2016-12-16 14:40:37 -08:00
Alexey Tumanov
946242929f Plasma photon association: passing through plasma address with photon db connection (#123)
* passing plasma ip:port association with photon through redis to global scheduler

* Fix test.

* sanity-checking aux_address inside db_connect_extended

* clang format

* fix photon tests

* clang format photon tests
2016-12-13 17:21:38 -08:00
Philipp Moritz
2152cd9f31 Fix seed bug for generating object ids for put (#120)
* fix seed bug for generating object ids for put

* fix clang-format
2016-12-13 00:54:38 -08:00
Robert Nishihara
ddba1df802 Start working toward Python3 compatibility. (#117) 2016-12-11 12:25:31 -08:00
Robert Nishihara
0f7091099d Throw an exception if a Ray method is called from a thread that isn't the main thread. (#97) 2016-12-10 21:24:50 -08:00
Robert Nishihara
9474d03912 Switch to updated Plasma API and consolidate wait and fetch implementations. (#116)
* Consolidate wait implementations.

* Consolidate fetch implementations.

* Share callback between wait and fetch to address issue in which only one callback can be run for a given subscribe channel.

* Reactivate manager tests.

* Remove more code.

* Add some documentation.
2016-12-10 21:22:05 -08:00
Robert Nishihara
6441571d31 Introduce some stress tests. (#106)
* Retry first connection to redis in db_connect.

* Declare usleep.

* Formatting.

* Introduce some stress tests.
2016-12-09 17:49:31 -08:00
Robert Nishihara
b3c05655a0 Enable fetching objects from remote object stores. (#87)
* Fetch missing dependencies from local scheduler.

* Factor out global scheduler policy state.

* Use object_table_subscribe instead of object_table_lookup.

* Fix bug in which timer was being created twice for a single fetch request.

* Free old manager vector.
2016-12-06 15:47:31 -08:00
Philipp Moritz
03324caffc Put object store memory on /dev/shm on linux (#89)
* put object store memory on /dev/shm on linux

* fix

* fix mac
2016-12-06 00:31:47 -08:00
Robert Nishihara
b5ed2f063d Allow starting multiple local schedulers. (#86) 2016-12-04 17:08:16 -08:00
Philipp Moritz
58e8bbcb34 Fix bug in serializing arguments of tasks that are more complex objects (#72)
* Give more informative error message when we do not know how to serialize a class.

* Check that passing arguments to remote functions and getting them does not change their values.

* fix serialization bug

* fix tests for common module

* Formatting.

* Bug fix in init_pickle_module signature.

* Use pickle with HIGHEST_PROTOCOL.
2016-11-30 23:21:53 -08:00
Robert Nishihara
a93c6b7596 Quick workaround for ray.wait bug. (#42) 2016-11-22 13:31:22 -08:00
mehrdadn
3714984094 Remove Redis version from Linux scripts (#56)
* Remove Redis version from Linux scripts

* Add documentation.
2016-11-21 15:02:40 -08:00
Robert Nishihara
4b00c029ac Remove numbuf from requirements for setup.py. (#54)
* Remove numbuf from requirements for setup.py.

* Update documentation.
2016-11-21 14:30:17 -08:00
Robert Nishihara
d77b685a90 Global scheduler skeleton (#45)
* Initial scheduler commit

* global scheduler

* add global scheduler

* Implement global scheduler skeleton.

* Formatting.

* Allow local scheduler to be started without a connection to redis so that we can test it without a global scheduler.

* Fail if there are no local schedulers when the global scheduler receives a task.

* Initialize uninitialized value and formatting fix.

* Generalize local scheduler table to db client table.

* Remove code duplication in local scheduler and add flag for whether a task came from the global scheduler or not.

* Queue task specs in the local scheduler instead of tasks.

* Simple global scheduler tests, including valgrind.

* Factor out functions for starting processes.

* Fixes.
2016-11-18 19:57:51 -08:00
Wapaul1
08707f9408 Integration of Webui with Ray (#32)
* Initial integration of webui with ray

* Re-organized calling of build-webui in setup.py

* Fixed Lint comments on js code

* Fixed more lint issues

* Fixed various issues

* Fixed directory in services.py

* Small changes.

* Changes to match lint
2016-11-17 22:33:29 -08:00
Philipp Moritz
986ed5c9e8 Plasma C extensions (#34)
* switch plasma from ctypes to python C API

* clang-format

* various fixes
2016-11-13 16:23:28 -08:00
Robert Nishihara
43aba4fc6d Install python package dependencies through setup.py. (#39)
* Install python package dependencies through setup.py.

* Do pip installs without sudo.

* Clearer instructions for running tests.

* Wording.
2016-11-12 19:37:20 -08:00
Robert Nishihara
5c89a7ab4e Fix bug in which gym environment could not be imported because of module naming conflicts. (#37) 2016-11-11 19:02:40 -08:00
Robert Nishihara
336a904404 Implement repr, hash, and richcompare for ObjectIDs. (#33)
* Implement repr, hash, and richcompare for ObjectIDs.

* Addressing comments.

* Partially fix example applications.
2016-11-11 09:18:36 -08:00
Robert Nishihara
194bdb1d96 Compute task IDs and object IDs deterministically. (#31)
* Put infrastructure in place to compute task IDs and object IDs.

* Fix version number for common library.

* Compute task IDs and object IDs deterministically.

* Address Stephanie's comments.

* Update task documentation.

* Fix formatting.

* Add more tests and checks.

* Fix formatting.

* Enable DCHECKs and change CHECKs to DCHECKs.
2016-11-08 14:46:34 -08:00
Robert Nishihara
90f88af902 Fix bug in which worker import counters were treated incorrectly. (#28)
* Fix bug in which worker import counters were treated incorrectly.

* Fix bug in which cached functions-to-run were double counted as exports. This also runs the functions-to-run on the driver only after ray.init is called.

* Only define reusable variables locally after ray.init has been called.

* Remove flaky reference counting tests. It's not clear that these tests make sense.

* Make numbuf pip install verbose.

* Export cached reusable variables before cached remote functions.

* Fix bug causing the worker to hang sometimes. This happens when the worker is trying to run a task, but it hasn't imported enough imports to run the task, so it continually acquires and releases a lock while checking if it has enough imports. However, for some reason, the import thread is waiting to acquire the same lock and never does so (or takes a very long time to do so). By dropping the lock before sleeping, this makes it easier for other threads to acquire the lock.

* Acquire locks using 'with' statements.

* Fix possible test failure.

* Try to start Redis multiple times with different random ports if the original attempt failed.

* Fix test in which we redefine a remote function.
2016-11-06 22:24:39 -08:00
Robert Nishihara
efe8a295ea Add basic LRU eviction for the plasma store. (#26)
* Basic functionality for LRU eviction.

* Test eviction.

* Factor out eviction policy.

* Move delete_object into eviction policy.

* Replace array of released objects with an LRU cache (hash table + doubly linked list).

* Finish rebase on master.

* Move actual object deletion away from eviction policy and into plasma store.

* Small fixes.

* Fixes.

* Make remove_object_from_lru_cache always remove the object.

* Minor formatting and comments.

* Pass in allowed memory as argument to Plasma store.

* Small fix.
2016-11-05 21:34:11 -07:00
Philipp Moritz
90a2aa4bf7 Various performance improvements (#24)
* switch from array to linked list for photon queue

* performance optimizations

* fix tests

* various fixes
2016-11-04 00:41:20 -07:00
Robert Nishihara
681ec570ba Remove unnecessary pip installs. (#21)
* Small cleanups in worker.py.

* Remove dependencies on subprocess32, graphviz, protobuf, and ipython.

* Retry starting the plasma manager if the port is in use.

* Whitespace

* Move start_plasma_manager into plasma.py.
2016-11-02 16:40:37 -07:00
Robert Nishihara
072f442c1f Update worker.py and services.py to use plasma and the local scheduler. (#19)
* Update worker code and services code to use plasma and the local scheduler.

* Cleanups.

* Fix bug in which threads were started before the worker mode was set. This caused remote functions to be defined on workers before the worker knew it was in WORKER_MODE.

* Fix bug in install-dependencies.sh.

* Lengthen timeout in failure_test.py.

* Cleanups.

* Cleanup services.start_ray_local.

* Clean up random name generation.

* Cleanups.
2016-11-02 00:39:35 -07:00
Robert Nishihara
47851eeccc Build Ray with setup.py. (#14)
* Build Ray with setup.py.

* Building photon extensions with cmake.

* Fix formatting in photon_extension.c

* Pip install with sudo in Travis.

* Fix plasma __init__.py.

* Rename and remove some files.
2016-10-31 17:08:03 -07:00
Robert Nishihara
09a3ff7173 Pip install numbuf. (#8) 2016-10-28 14:30:20 -07:00
Robert Nishihara
0a44145906 Fix the resetting of reusable variables on the driver and cache functions to run on all workers. (#446)
* Properly reset reusable variables on the driver when remote functions are run locally on the driver.

* Cache functions to run on all workers that occur before ray.init is called.
2016-10-12 22:17:22 -07:00
Robert Nishihara
292656013a Suppress exceptions in the error logging thread when program exits. (#432) 2016-09-15 13:48:23 -07:00
Wapaul1
d5815673a5 Changed ray.select() to ray.wait() and its functionality (#426)
* Re-implemented select, changed name to wait

* Changed tests for select to tests for wait

* Updated the hyperopt example to match wait

* Small fixes and improve example readme.

* Make tests pass.
2016-09-14 17:14:11 -07:00
Robert Nishihara
ba56b08474 Reintroduce passing arguments by value to remote functions. (#425)
* Reintroduce passing arguments by value to remote functions.

* Check size of arguments passed by value.

* Fix computation graph visualization.
2016-09-10 21:11:18 -07:00
Robert Nishihara
5802cab87c Fix bug in which ObjectFixture gets called at exit after raylib gets set to None. (#416) 2016-09-07 18:49:19 -07:00
Robert Nishihara
d264713ceb Work around Arrow bug by increasing metadata size. (#415) 2016-09-07 18:46:04 -07:00
Robert Nishihara
11a8914684 Allow users to serialize custom classes. (#393)
* Allow serialization of custom classes.

* Add documentation and test cases, also fix pickle case.

* Don't allow old-style classes.
2016-09-06 13:28:24 -07:00
Robert Nishihara
d5cb3ac090 Propagate error messages from functions that run on all workers. (#410) 2016-09-06 10:06:43 -07:00
Robert Nishihara
327d7ff689 Fix bug to enable calling ray.get multiple times on same ObjectID. (#409) 2016-09-04 13:32:55 -07:00
Philipp Moritz
68cec55a98 Refcount without modifying objects (#407)
* refcount without modifying objects

* add documentation

* Update tests and documentation.

* Remove extraneous code.

* Update numbuf version.
2016-09-04 12:07:52 -07:00
Robert Nishihara
81f40774a7 Remove ObjectID aliasing from the API. (#406)
* Remove ObjectID aliasing from the API.

* Update documentation to remove aliasing.
2016-09-03 19:34:45 -07:00
Richard Shin
efb61ca9c7 Return scheduler address in ray.init (#403) 2016-09-03 18:25:47 -07:00
Richard Shin
80fdfcd1a5 Check num_objstores > 0 in start_ray_local (#402) 2016-09-03 17:37:06 -07:00
Philipp Moritz
3548797202 [API] Implement get for multiple objects (#398)
* [API] Implement get for multiple objects

* Small fixes.
2016-09-02 18:02:44 -07:00
Robert Nishihara
fb7ccef493 Allow remote decorator to be used with no parentheses. 2016-08-30 16:38:26 -07:00
Robert Nishihara
b87912cb2f Remove typing module. 2016-08-29 22:16:19 -07:00
Robert Nishihara
d7f313a026 Remove type information from remote decorator. 2016-08-29 22:05:59 -07:00