Commit graph

135 commits

Author SHA1 Message Date
Robert Nishihara
79dd1815a2 Python 3 compatibility. (#121)
* Make common module Python 3 compatible.

* Make plasma module Python 3 compatible.

* Make photon module Python 3 compatible.

* Make numbuf module Python 3 compatible.

* Remaining changes for Python 3 compatibility.

* Test Python 3 in Travis.

* Fixes.
2016-12-16 14:40:37 -08:00
Robert Nishihara
ddba1df802 Start working toward Python3 compatibility. (#117) 2016-12-11 12:25:31 -08:00
Robert Nishihara
86973059de Switch to new wait implementation. (#113)
* Duplicate wait1 implementation and separate out wait data structures.

* Address Philipp's comments.

* Temporarily address test failure problem by increasing timeout and reducing load in tests.

* Update stress tests to include distributed wait.
2016-12-09 19:26:11 -08:00
Robert Nishihara
6441571d31 Introduce some stress tests. (#106)
* Retry first connection to redis in db_connect.

* Declare usleep.

* Formatting.

* Introduce some stress tests.
2016-12-09 17:49:31 -08:00
Robert Nishihara
b3c05655a0 Enable fetching objects from remote object stores. (#87)
* Fetch missing dependencies from local scheduler.

* Factor out global scheduler policy state.

* Use object_table_subscribe instead of object_table_lookup.

* Fix bug in which timer was being created twice for a single fetch request.

* Free old manager vector.
2016-12-06 15:47:31 -08:00
Philipp Moritz
58e8bbcb34 Fix bug in serializing arguments of tasks that are more complex objects (#72)
* Give more informative error message when we do not know how to serialize a class.

* Check that passing arguments to remote functions and getting them does not change their values.

* fix serialization bug

* fix tests for common module

* Formatting.

* Bug fix in init_pickle_module signature.

* Use pickle with HIGHEST_PROTOCOL.
2016-11-30 23:21:53 -08:00
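
The "Use pickle with HIGHEST_PROTOCOL" and "does not change their values" items above can be illustrated with a minimal round-trip sketch. This is not the project's actual serialization code; the helper names `serialize` and `roundtrip` are hypothetical and only standard-library `pickle` calls are used.

```python
import pickle


def serialize(obj):
    # Hypothetical helper: pickle with the newest protocol the running
    # interpreter supports, as the commit message describes.
    return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)


def roundtrip(obj):
    # Serialize and immediately deserialize, mimicking an argument being
    # passed to a remote function and fetched back.
    return pickle.loads(serialize(obj))


if __name__ == "__main__":
    value = {"a": [1, 2.5, ("x", None)]}
    # The value should come back unchanged.
    assert roundtrip(value) == value
```

A check of this shape is roughly what a test that "passing arguments to remote functions and getting them does not change their values" would assert, here reduced to plain built-in types.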
Robert Nishihara
d77b685a90 Global scheduler skeleton (#45)
* Initial scheduler commit

* global scheduler

* add global scheduler

* Implement global scheduler skeleton.

* Formatting.

* Allow local scheduler to be started without a connection to redis so that we can test it without a global scheduler.

* Fail if there are no local schedulers when the global scheduler receives a task.

* Initialize uninitialized value and formatting fix.

* Generalize local scheduler table to db client table.

* Remove code duplication in local scheduler and add flag for whether a task came from the global scheduler or not.

* Queue task specs in the local scheduler instead of tasks.

* Simple global scheduler tests, including valgrind.

* Factor out functions for starting processes.

* Fixes.
2016-11-18 19:57:51 -08:00
Robert Nishihara
336a904404 Implement repr, hash, and richcompare for ObjectIDs. (#33)
* Implement repr, hash, and richcompare for ObjectIDs.

* Addressing comments.

* Partially fix example applications.
2016-11-11 09:18:36 -08:00
Robert Nishihara
90f88af902 Fix bug in which worker import counters were treated incorrectly. (#28)
* Fix bug in which worker import counters were treated incorrectly.

* Fix bug in which cached functions-to-run were double counted as exports. This also runs the functions-to-run on the driver only after ray.init is called.

* Only define reusable variables locally after ray.init has been called.

* Remove flaky reference counting tests. It's not clear that these tests make sense.

* Make numbuf pip install verbose.

* Export cached reusable variables before cached remote functions.

* Fix bug causing the worker to hang sometimes. This happens when the worker is trying to run a task, but it hasn't imported enough imports to run the task, so it continually acquires and releases a lock while checking if it has enough imports. However, for some reason, the import thread is waiting to acquire the same lock and never does so (or takes a very long time to do so). By dropping the lock before sleeping, this makes it easier for other threads to acquire the lock.

* Acquire locks using 'with' statements.

* Fix possible test failure.

* Try to start Redis multiple times with different random ports if the original attempt failed.

* Fix test in which we redefine a remote function.
2016-11-06 22:24:39 -08:00
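
The hang fix described in the commit above (drop the lock before sleeping so the import thread can acquire it) can be sketched as follows. This is an illustrative sketch, not the worker's real code: `lock`, `imports_done`, `import_thread`, and `wait_for_imports` are hypothetical names.

```python
import threading
import time

lock = threading.Lock()
imports_done = 0  # hypothetical counter of processed imports


def import_thread(n):
    # Simulates the import thread: it needs the lock for each import.
    global imports_done
    for _ in range(n):
        with lock:
            imports_done += 1
        time.sleep(0.001)


def wait_for_imports(required, poll_interval=0.001):
    # Polls until enough imports have been processed.
    while True:
        with lock:
            if imports_done >= required:
                return imports_done
        # The 'with' block has ended, so the lock is released before
        # sleeping and other threads get a chance to acquire it.
        time.sleep(poll_interval)


def run_demo(required=5):
    global imports_done
    with lock:
        imports_done = 0
    t = threading.Thread(target=import_thread, args=(required,))
    t.start()
    seen = wait_for_imports(required)
    t.join()
    return seen
```

Using `with lock:` also matches the "Acquire locks using 'with' statements" item: the lock is released on every exit path, including exceptions.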
Philipp Moritz
1147c4d34b Keep objects in cache between tasks (#29)
* fix caching behavior

* fixes
2016-11-06 17:31:14 -08:00
Robert Nishihara
072f442c1f Update worker.py and services.py to use plasma and the local scheduler. (#19)
* Update worker code and services code to use plasma and the local scheduler.

* Cleanups.

* Fix bug in which threads were started before the worker mode was set. This caused remote functions to be defined on workers before the worker knew it was in WORKER_MODE.

* Fix bug in install-dependencies.sh.

* Lengthen timeout in failure_test.py.

* Cleanups.

* Cleanup services.start_ray_local.

* Clean up random name generation.

* Cleanups.
2016-11-02 00:39:35 -07:00
Robert Nishihara
09a3ff7173 Pip install numbuf. (#8) 2016-10-28 14:30:20 -07:00
Robert Nishihara
0a44145906 Fix the resetting of reusable variables on the driver and cache functions to run on all workers. (#446)
* Properly reset reusable variables on the driver when remote functions are run locally on the driver.

* Cache functions to run on all workers that occur before ray.init is called.
2016-10-12 22:17:22 -07:00
Robert Nishihara
9a6991116f Small fix in test. (#441) 2016-09-25 23:08:27 -07:00
Robert Nishihara
de6ec47f9e Add a recursion depth for serialization to prevent infinite loops. (#440) 2016-09-19 17:17:42 -07:00
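
A recursion-depth guard of the kind this commit message mentions might look like the sketch below. The actual limit and serializer in the repository are not shown in this log, so `MAX_DEPTH` and `to_primitive` are hypothetical stand-ins.

```python
MAX_DEPTH = 100  # hypothetical limit; the real value is not given in the log


def to_primitive(obj, depth=0, max_depth=MAX_DEPTH):
    # Convert nested containers to plain lists/dicts, refusing to descend
    # past max_depth so self-referential objects cannot loop forever.
    if depth > max_depth:
        raise ValueError("maximum serialization depth exceeded")
    if isinstance(obj, (int, float, str, bool, type(None))):
        return obj
    if isinstance(obj, (list, tuple)):
        return [to_primitive(x, depth + 1, max_depth) for x in obj]
    if isinstance(obj, dict):
        return {str(k): to_primitive(v, depth + 1, max_depth)
                for k, v in obj.items()}
    raise TypeError("cannot serialize object of type " + type(obj).__name__)


if __name__ == "__main__":
    cyclic = []
    cyclic.append(cyclic)  # a list that contains itself
    try:
        to_primitive(cyclic, max_depth=10)
    except ValueError as exc:
        print("stopped:", exc)
```

Without the depth check, the self-referential list would recurse until the interpreter's own recursion limit is hit; with it, the failure is explicit and bounded.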
Robert Nishihara
91f16a3df0 Migrate repositories to ray-project. (#438)
* Migrate repositories to ray-project.

* Update numbuf to the migrated version.
2016-09-17 00:52:05 -07:00
Robert Nishihara
1aa89a4ae6 Update numbuf to properly handle Python floats. (#435) 2016-09-15 15:44:11 -07:00
Wapaul1
d5815673a5 Changed ray.select() to ray.wait() and its functionality (#426)
* Re-implemented select, changed name to wait

* Changed tests for select to tests for wait

* Updated the hyperopt example to match wait

* Small fixes and improve example readme.

* Make tests pass.
2016-09-14 17:14:11 -07:00
Robert Nishihara
3b47a15ebd Fix naming in tests. (#424) 2016-09-10 21:12:09 -07:00
Robert Nishihara
ba56b08474 Reintroduce passing arguments by value to remote functions. (#425)
* Reintroduce passing arguments by value to remote functions.

* Check size of arguments passed by value.

* Fix computation graph visualization.
2016-09-10 21:11:18 -07:00
Robert Nishihara
0191d42751 Check in runtest.py that the correct version of cloudpickle is installed. (#421) 2016-09-09 16:46:18 -07:00
Robert Nishihara
11a8914684 Allow users to serialize custom classes. (#393)
* Allow serialization of custom classes.

* Add documentation and test cases, also fix pickle case.

* Don't allow old-style classes.
2016-09-06 13:28:24 -07:00
Robert Nishihara
d5cb3ac090 Propagate error messages from functions that run on all workers. (#410) 2016-09-06 10:06:43 -07:00
Robert Nishihara
327d7ff689 Fix bug to enable calling ray.get multiple times on same ObjectID. (#409) 2016-09-04 13:32:55 -07:00
Philipp Moritz
68cec55a98 Refcount without modifying objects (#407)
* refcount without modifying objects

* add documentation

* Update tests and documentation.

* Remove extraneous code.

* Update numbuf version.
2016-09-04 12:07:52 -07:00
Robert Nishihara
81f40774a7 Remove ObjectID aliasing from the API. (#406)
* Remove ObjectID aliasing from the API.

* Update documentation to remove aliasing.
2016-09-03 19:34:45 -07:00
Philipp Moritz
3548797202 [API] Implement get for multiple objects (#398)
* [API] Implement get for multiple objects

* Small fixes.
2016-09-02 18:02:44 -07:00
Robert Nishihara
fb7ccef493 Allow remote decorator to be used with no parentheses. 2016-08-30 16:38:26 -07:00
Robert Nishihara
ce4e5ec544 Fix failure_test.py. 2016-08-29 22:52:13 -07:00
Robert Nishihara
d7f313a026 Remove type information from remote decorator. 2016-08-29 22:05:59 -07:00
Philipp Moritz
93e6c9947b update numbuf (#392)
* update numbuf

* Augment serialization tests.
2016-08-25 20:05:48 -07:00
Wapaul1
420bcc0477 Remote function returning non-serializable type no longer shuts worker down (#384)
* Moved put_objects in main_loop to inside of try block

* Added test for failed serialization

* Fixed naming

* Minor
2016-08-25 15:26:22 -07:00
Robert Nishihara
314bc9e980 Test blocking behavior of select. (#379) 2016-08-16 14:54:54 -07:00
Robert Nishihara
e06311d415 Automatically add relevant directories to Python paths of workers (#380)
* Make ray.init set python paths of workers.

* Decouple starting cluster from copying user source code

* also add current directory to path

* Add comments about deallocation.

* Add test for new code path.
2016-08-16 14:53:55 -07:00
Wapaul1
7246013008 Implement select to enable waiting for a specific number of remote objects to be ready. (#369) 2016-08-15 16:51:59 -07:00
Robert Nishihara
87bb7a8f67 [WIP] Large changes to make the tests pass. (#376)
* Revert "Make tests more informative (#372)"

This reverts commit fd353250c8.

* fix bugs, in particular deactivate worker service on driver and remove condition variables

* changes to minimize the changes in this PR

* switch from faulty mutex synchronization to using atomics

* Increase the default size of the message queues, to accommodate exporting large numbers of remote functions. This is a temporary fix, but not a long term solution.

* Reorganize the scheduler export code to queue up exports. This does not solve the underlying problem yet, but sets up a solution.

* Start a separate thread on driver to print error messages by constantly querying the scheduler. This is a temporary solution because the solution based on starting a worker service for the driver which the scheduler can push error messages to is buggy.

* Fix segfault in taskcapsule destructor.

* Move tests for catching errors into a separate test file.

* Revert "roll back grpc (#368)"

This reverts commit c01ef95d04.
2016-08-15 11:02:54 -07:00
Robert Nishihara
fd353250c8 Make tests more informative (#372)
* Make tests more informative

* Change grpc status checks to warning instead of fatal
2016-08-11 12:40:55 -07:00
Wapaul1
362ffa1f3c Changing hard coded ports for objstore and workers to choose unused ports (#365)
* let grpc choose unused worker and object store ports

* Add objstore addresses to scheduler info to bring back test
2016-08-10 19:08:38 -07:00
Johann Schleier-Smith
d8dd9de81b remove verbose flag in tar (#363) 2016-08-09 13:43:25 -07:00
Johann Schleier-Smith
4d5f85e77c Fix Travis failure detection on Mac + testing enhancements (#358)
* properly handle failure in continuous integration

* add stack overflow reference
2016-08-08 18:02:52 -07:00
Robert Nishihara
13df8302e6 enable running example apps in cluster mode (#357) 2016-08-08 16:01:13 -07:00
Robert Nishihara
feee1de56f run runtest.py in mac os x (#356) 2016-08-07 13:53:56 -07:00
Robert Nishihara
a1e4268d37 Catch errors in importing reusable variables and remote functions (#354)
* catch errors in importing reusable variables and remote functions

* updates
2016-08-07 13:53:33 -07:00
Philipp Moritz
8bf877ac1e Serialize and Deserialize unicode (#349) 2016-08-04 21:06:31 -07:00
Robert Nishihara
ac363bf451 Let worker get worker address and object store address from scheduler (#350) 2016-08-04 17:47:08 -07:00
Johann Schleier-Smith
583df08957 Docker builds on Travis (#343)
* attempt to build on travis using docker

* run tests in foreground

* add examples to travis tests

* test from current checkout

* attempt to fix docker version issues

* try build with xenial

* attempt docker upgrade

* avoid hang on configuration files

* matrix osx and linux w/ docker

* restore non-test docker builds

* fix typo

* tuning and cleanup

* add missing file

* comment cleanup
2016-08-02 17:03:28 -07:00
Philipp Moritz
0ac4e29b1f clean up tests (#340) 2016-08-02 16:11:53 -07:00
Robert Nishihara
c27e6c076c Make sure no Python modules mutually import each other. (#334) 2016-08-01 17:55:38 -07:00
Wapaul1
97b923a750 Changed how ray treats deserialization of custom classes (#333) 2016-08-01 15:38:05 -07:00
Robert Nishihara
98a508d6ca Terminology change Object Reference -> Object ID (#330) 2016-07-31 19:58:03 -07:00