Commit graph

749 commits

Author SHA1 Message Date
Robert Nishihara
ab8c3432f7 Add driver ID to task spec and add driver ID to Python error handling. (#225)
* Add driver ID to task spec and add driver ID to Python error handling.

* Make constants global variables.

* Add test for error isolation.
2017-01-25 22:53:48 -08:00
Stephanie Wang
3c6686db08 Photon optimizations (#219)
* Optimizations:
- Track mapping of missing object to dependent tasks to avoid iterating over task queue
- Perform all fetch requests for missing objects using the same timer

* Fix bug and add regression test

* Record task dependencies and active fetch requests in the same hash table

* fix typo

* Fix memory leak and add test cases for scheduling when dependencies are evicted

* Fix python3 test case

* Minor details.
2017-01-23 19:44:15 -08:00
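The first optimization in #219 replaces a scan over the task queue with a lookup from each missing object to the tasks waiting on it. The following is a minimal Python sketch of that indexing idea only, not Photon's implementation. The class and method names are hypothetical, and the "pending fetches" set stands in, by assumption, for the objects a single shared timer would re-request.

```python
from collections import defaultdict


class DependencyIndex:
    """Hypothetical sketch: index waiting tasks by the object IDs they lack,
    so an object arrival touches only its dependents, not the whole queue."""

    def __init__(self):
        # object ID -> set of task IDs still waiting on that object
        self.dependents = defaultdict(set)
        # task ID -> number of objects the task is still missing
        self.missing_count = {}

    def add_task(self, task_id, missing_objects):
        """Register a task; return True if it has no missing dependencies."""
        missing = set(missing_objects)
        if not missing:
            return True
        self.missing_count[task_id] = len(missing)
        for obj in missing:
            self.dependents[obj].add(task_id)
        return False

    def pending_fetches(self):
        # Every object some task is waiting on; one periodic timer could
        # walk this set and issue all fetch requests in a single pass.
        return set(self.dependents)

    def object_available(self, obj):
        """Return the task IDs that became runnable because `obj` arrived."""
        ready = []
        # pop() with a default does not insert a key into the defaultdict.
        for task_id in self.dependents.pop(obj, set()):
            self.missing_count[task_id] -= 1
            if self.missing_count[task_id] == 0:
                del self.missing_count[task_id]
                ready.append(task_id)
        return sorted(ready)
```

An arrival costs time proportional to the number of tasks waiting on that object rather than the length of the whole queue.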
Richard Liaw
4575cd88b2 Improve error messages when nodes can't communicate with each other. (#223)
* Good error messages when nodes can't communicate with each other

* Print more information when starting the head node.

* Change retries back to 5.
2017-01-22 14:53:15 -08:00
Robert Nishihara
7151ed5cdf Fix bug in tensorflow tests. (#218)
* Fix bug in tensorflow tests.

* Address comment.
2017-01-19 20:29:05 -08:00
Robert Nishihara
9bb8162621 Improvements to documentation and error messages. (#221) 2017-01-19 20:27:46 -08:00
Richard Liaw
b3b294e3ad updated cluster documentation (#216) 2017-01-19 13:59:54 -08:00
Robert Nishihara
b98a63fd3a Change get to take a timeout and multiple object IDs. (#212)
* Change plasma_get to take a timeout and an array of object IDs.

* Address comments.

* Bug fix related to computing object hashes.

* Add test.

* Fix file descriptor leak.

* Fix valgrind.

* Formatting.

* Remove call to plasma_contains from the plasma client. Use timeout internally in ray.get.

* small fixes
2017-01-19 12:21:12 -08:00
Johann Schleier-Smith
4f6100b67f fix docker build bug (#207) 2017-01-18 23:23:34 -08:00
Robert Nishihara
677a019cbd Remove unnecessary bookkeeping in utlist in plasma client. (#215) 2017-01-18 23:03:08 -08:00
Robert Nishihara
58570b4981 Give better error message if ray.put or ray.get are called before ray.init. (#214) 2017-01-18 22:37:37 -08:00
Stephanie Wang
f1987cdc16 Split local scheduler task queue (#211)
* Split local scheduler task queue into waiting and dispatch queue

* Fix memory leak

* Add a new task scheduling status for when a task has been queued locally

* Fix global scheduler test case and add task status doc

* Documentation

* Address Philipp's comments

* Move tasks back to the waiting queue if their dependencies become unavailable

* Update existing task table entries instead of overwriting
2017-01-18 20:27:40 -08:00
Wapaul1
6fe69bec11 Selects from all variables now independent of graph, and uses standar… (#199)
* Smarter variable retrieval and doc update

* doc update and small fixes

* Addressing Robert's comments
2017-01-18 17:36:58 -08:00
Robert Nishihara
303d0fed3e Prevent plasma store and manager from dying when a client dies. (#203)
* Prevent plasma store and manager from dying when a worker dies.

* Check errno inside of warn_if_sigpipe. Passing in errno doesn't work because the arguments to warn_if_sigpipe can be evaluated out of order.
2017-01-17 20:34:31 -08:00
Philipp Moritz
7f329db4b2 wait until kill operation was successful (#210) 2017-01-17 20:15:48 -08:00
Philipp Moritz
a708e36225 Switch build system to use CMake completely. (#200)
* switch to CMake completely

...

* cleanup

* Run C tests, update installation instructions.
2017-01-17 16:56:40 -08:00
Robert Nishihara
ba8933e10f Update tutorial. (#196)
* Update tutorial.

* Small updates to documentation and code.
2017-01-10 23:52:38 -08:00
Robert Nishihara
87d8d05792 Rename reusable variables -> environment variables. (#195) 2017-01-10 20:14:33 -08:00
Wapaul1
aaf3be3c53 Fixed lbfgs for ray-cluster (#180)
* Updated lbfgs example to include TensorflowVariables

* Whitespace.
2017-01-10 18:40:06 -08:00
Robert Nishihara
be4a37bf37 Various cleanups: remove start_ray_local from ray.init, remove unused code, fix "pip install numbuf". (#193)
* Remove start_ray_local from ray.init and change default number of workers to 10.

* Remove alexnet example.

* Move array methods to experimental.

* Remove TRPO example.

* Remove old files.

* Compile plasma when we build numbuf.

* Address comments.
2017-01-10 17:35:27 -08:00
Wapaul1
b9d6135aa1 Added option for user to not pass in the session and error messages if so (#192)
* Added option for user to not pass in the session

* Small changes.
2017-01-09 21:03:22 -08:00
Philipp Moritz
ab3448a9b4 Plasma Optimizations (#190)
* bypass python when storing objects into the object store

* clang-format

* Bug fixes.

* fix include paths

* Fixes.

* fix bug

* clang-format

* fix

* fix release after disconnect
2017-01-09 20:15:54 -08:00
Robert Nishihara
0320902787 Fix Python reference counting bug. (#191) 2017-01-09 13:08:02 -08:00
Robert Nishihara
973716d310 Use cloudpickle 0.2.2. (#189) 2017-01-08 17:30:06 -08:00
Alexey Tumanov
674ec3a3cb generate pytask from string and string from pytask (#188)
* pytask creation from bytestring: saving work

* pytask now works

* documentation and tests

* linting

* Lint and fix test case
2017-01-08 02:16:40 -08:00
Wapaul1
c45342e39d Updated code to mesh with get_weights returning a dict and new tf code (#187)
* Updated code to mesh with get_weights returning a dict and new tf code

* Added tf.global_variables_initializer to hyperopt example as well

* Small fix.

* Small name change.
2017-01-07 14:25:45 -08:00
Wapaul1
0ac2abee51 Added helper class for getting tf variables from loss function (#184)
* Added helper class for getting tf variables from loss function

* Updated usage and documentation

* Removed try-catches

* Added futures

* Added documentation

* fixes and tests

* more tests

* install tensorflow in travis
2017-01-07 01:54:11 -08:00
Stephanie Wang
c13d73b4c9 Suppress duplicate transfer requests (#185) 2017-01-06 22:14:51 -08:00
Philipp Moritz
33d7004914 New web UI. (#176)
* remove node.js webui

* temp commit

* flesh out web ui

* add documentation

* add ray timeline

* Small changes to documentation and formatting.
2017-01-06 00:13:22 -08:00
Wapaul1
417c04bac8 Removed iteritems and xrange for python3 in rl_pong (#182)
* Removed iteritems and xrange for python3

* Remove unused variable.
2017-01-05 20:37:00 -08:00
Stephanie Wang
cac473b557 Make numpy arrays immutable (#183)
* Make numpy arrays immutable in numbuf

* Move break statement outside of brackets

* Simplify test case

* Simplify test case
2017-01-05 19:47:52 -08:00
Robert Nishihara
651aa6007a Log profiling information from worker. (#178)
* Log timing events on workers.

* Have workers log to the event log through the local scheduler.

* Fixes and address comments.

* bug fix

* styling
2017-01-05 16:47:16 -08:00
Robert Nishihara
509685d240 Let the worker know about remote functions that failed to unpickle. (#175)
* Let the worker know about remote functions that failed to unpickle.

* Cleanup.
2017-01-03 18:41:03 -08:00
Johann Schleier-Smith
b1e76e582e Check /dev/shm on Linux (#174)
* check available shared memory when starting object store

* exit with error if not enough shared memory available for object store

* Some comments and formatting.
2017-01-03 12:33:29 -08:00
Robert Nishihara
431bba3c8a Catch numbuf glibcxx error on python 2. (#170) 2016-12-31 18:02:30 -08:00
Robert Nishihara
4d53fe504e Fix out-of-box installation instructions. (#173) 2016-12-31 17:53:53 -08:00
Johann Schleier-Smith
8bb87a4f6b updated Docker files (#171)
* updated Docker files

* single Docker RUN for apt-get installs and cleanup

* stylistic cleanup
2016-12-31 17:21:33 -08:00
Johann Schleier-Smith
1616426ccf add wget dependency for osx install (#172) 2016-12-31 16:06:00 -08:00
Robert Nishihara
d1594860de Remove javascript dependencies. (#169) 2016-12-30 23:16:17 -08:00
Robert Nishihara
603a7e3dd3 Add documentation for troubleshooting installation. (#167) 2016-12-30 23:15:25 -08:00
Wapaul1
e00b27b14e Removed webui code from setup.py and services.py (#168) 2016-12-30 21:45:58 -08:00
Robert Nishihara
84296c8905 Documentation for using Ray on a cluster. (#165) 2016-12-30 00:29:03 -08:00
Robert Nishihara
13ee0ef366 Only download arrow if not already present. (#166) 2016-12-30 00:25:46 -08:00
Stephanie Wang
6828d694ae Test object notifications from Plasma store (#141)
* Object notification test for Photon, and turn on valgrind for Photon C tests

* Test object notification handler in the plasma manager

* Fix hanging test case
2016-12-29 23:10:38 -08:00
Robert Nishihara
f9f667de47 Improve formatting of error messages. (#154)
* Improve formatting of error messages.

* Catch errors that occur when looking up function name from function ID.

* Push warning to user if worker spends too long waiting for proper import counter.

* Fixes.

* Add comment.
2016-12-29 00:11:13 -08:00
Robert Nishihara
acf1703afd Implement naive scheduling algorithm using local scheduler load. (#164)
* Implement naive scheduling algorithm using local scheduler load.

* Have the global scheduler estimate load on local schedulers better.

* Fixes.
2016-12-28 22:33:20 -08:00
Robert Nishihara
a1a08b9ad4 Cause pip installation of numbuf to fail if the build.sh or setup.sh fail. (#163) 2016-12-28 16:54:14 -08:00
Stephanie Wang
c403ab11ab Allow ray.init to take in address information about existing services. (#161)
* Refactor ray.init and ray.services to allow processes that are already running

* Fix indexing error

* Address Robert's comments
2016-12-28 14:17:29 -08:00
Robert Nishihara
baf835efcd Throw Python exception if plasma store cannot create new object. (#162)
* Propagate error messages through plasma create.

* Use custom exception types instead of exception messages.
2016-12-28 11:56:16 -08:00
Robert Nishihara
10e067e5e5 Delay releasing a maximum number of bytes in the plasma client. (#160)
* Send message from plasma client to get plasma store capacity.

* Release objects from plasma client if they are too large.

* Use doubly-linked list instead of ring buffer for plasma client release history.

* Address comments.

* Fix problem with slicing PlasmaBuffer objects.

* Fix crash in plasma manager during transfer.

* Formatting.

* Make plasma client cache larger and make caching test not throw exceptions on Travis.
2016-12-27 19:51:26 -08:00
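#160 describes a client that keeps released objects in a release history up to a byte limit and releases objects that are too large right away. Below is a Python sketch of that policy under stated assumptions, not the plasma client's code. `ReleaseHistory`, `release_fn` and `reuse` are hypothetical names, and `collections.OrderedDict` keeps entries in release order in place of the hand-written doubly-linked list the commit mentions. The sketch assumes an object ID is not released again while it is still in the history.

```python
from collections import OrderedDict


class ReleaseHistory:
    """Hypothetical client-side release cache: released objects stay cached
    until their total size exceeds max_bytes, then the oldest are released."""

    def __init__(self, max_bytes, release_fn):
        self.max_bytes = max_bytes
        self.release_fn = release_fn  # actually releases an object to the store
        self.history = OrderedDict()  # object ID -> size, oldest first
        self.total_bytes = 0

    def release(self, object_id, size):
        # Objects larger than the whole budget are released immediately.
        if size > self.max_bytes:
            self.release_fn(object_id)
            return
        self.history[object_id] = size
        self.total_bytes += size
        # Release the oldest entries until the history fits the budget again.
        while self.total_bytes > self.max_bytes:
            old_id, old_size = self.history.popitem(last=False)
            self.total_bytes -= old_size
            self.release_fn(old_id)

    def reuse(self, object_id):
        """Drop a still-cached object from the history if the client gets it again."""
        size = self.history.pop(object_id, None)
        if size is None:
            return False
        self.total_bytes -= size
        return True
```

Delaying the release means a client that gets the same object again soon can skip a round trip to the store.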
Robert Nishihara
26941e02aa Attempt to free up to 20% of the plasma store capacity during eviction. (#159) 2016-12-27 12:12:33 -08:00
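#159 states only the 20% figure. What follows is a sketch of one plausible reading, assuming least-recently-used eviction. When a request does not fit, the store must free at least the shortfall but aims for a fifth of capacity, so that it need not evict again on the very next create. The function name and its arguments are hypothetical and not the plasma store's API.

```python
def choose_objects_to_evict(lru, capacity, used, request):
    """Hypothetical LRU eviction choice.

    lru: list of (object_id, size) pairs, least recently used first.
    Returns object IDs to evict. The caller must still check that enough
    space was freed, since the evictable objects may not cover the target.
    """
    free = capacity - used
    if request <= free:
        return []  # the request already fits
    # Free at least the shortfall, but aim for 20% of capacity.
    target = max(request - free, capacity // 5)
    victims, freed = [], 0
    for object_id, size in lru:
        if freed >= target:
            break
        victims.append(object_id)
        freed += size
    return victims
```

For a full 100-byte store, a 5-byte request evicts 20 bytes rather than 5, which amortizes the cost of eviction across later requests.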