Commit graph

761 commits

Author SHA1 Message Date
Robert Nishihara
1fec94ef00 Display drivers in web UI. (#252)
* Display drivers in web UI.

* Display more rows in grid and factor out function in webui backend.
2017-02-07 14:21:25 -08:00
Philipp Moritz
fefc7d9b49 fix segfault in photon.Task (#253) 2017-02-07 11:17:11 -08:00
Robert Nishihara
2d1c980ad7 Refactor local scheduler to remove worker indices. (#245)
* Refactor local scheduler to remove worker indices.

* Change scheduling state enum to int in all function signatures.

* Bug fix, don't use pointers into a resizable array.

* Remove total_num_workers.

* Fix tests.
2017-02-05 14:52:28 -08:00
Philipp Moritz
ca254b8689 Fix stack overflow if many objects are fetched. (#237)
* fix stack overflow if many objects are fetched

* fix other stack allocations

* add tests and fix linting

* address stephanie's comments

* fix linting

* fix tests
2017-02-04 16:49:36 -08:00
Johann Schleier-Smith
e5a9fc0032 Cluster setup instructions (#233)
* start updating cluster documentation with parallel ssh

* add using ray on a large cluster

* revert changes to using ray on a cluster

* update cluster documentation

* update title

* Some formatting changes, and added some notes.

* clarification

* Add warning about public versus private IP addresses.

* Typos and wording.

* Clarifications.

* Clarifications.
2017-02-02 16:10:26 -08:00
Robert Nishihara
7a7e14ef85 Visualize recent tasks in timeline. (#240) 2017-02-02 15:53:56 -08:00
Stephanie Wang
241b539ff8 Reconstruction for evicted objects (#181)
* First pass at reconstruction in the worker

Modify reconstruction stress testing to start Plasma service before rest of Ray cluster

TODO about reconstructing ray.puts

Fix ray.put error for double creates

Distinguish between empty entry and no entry in object table

Fix test case

Fix Python test

Fix tests

* Only call reconstruct on objects we have not yet received

* Address review comments

* Fix reconstruction for Python3

* remove unused code

* Address Robert's comments, stress tests are crashing

* Test and update the task's scheduling state to suppress duplicate
reconstruction requests.

* Split result table into two lookups, one for task ID and the other as a
test-and-set for the task state

* Fix object table tests

* Fix redis module result_table_lookup test case

* Multinode reconstruction tests

* Fix python3 test case

* rename

* Use new start_redis

* Remove unused code

* lint

* indent

* Address Robert's comments

* Use start_redis from ray.services in state table tests

* Remove unnecessary memset
2017-02-01 19:18:46 -08:00
Robert Nishihara
f69d4aaaa7 Change fetch requests in plasma manager to use a single timer. (#234)
* Change fetch requests in plasma manager to use a single timer.

* Fix manager tests, other cleanups.
2017-02-01 12:21:52 -08:00
Johann Schleier-Smith
6ad2b5d87a Add Redis port option to startup script (#232)
* specify redis address when starting head

* cleanup

* update starting cluster documentation

* Whitespace.

* Address Philipp's comments.

* Change redis_host -> redis_ip_address.
2017-01-31 00:28:00 -08:00
Wapaul1
db7297865f Added functionality for retrieving variables from control dependencies (#220)
* Added test for retriving variables from an optimizer

* Added comments to test

* Addressed comments

* Fixed travis bug

* Added fix to circular controls

* Added set for explored operations and duplicate prefix stripping

* Removed embeded ipython

* Removed prefix, use seperate graph for each network

* Removed redundant imports

* Addressed comments and added separate graph to initializer

* fix typos

* get rid of prefix in documentation
2017-01-30 19:17:42 -08:00
Robert Nishihara
6703f7be6f Provide functionality for local scheduler to start new workers. (#230)
* Provide functionality for local scheduler to start new workers.

* Pass full command for starting new worker in to local scheduler.

* Separate out configuration state of local scheduler.
2017-01-27 01:28:48 -08:00
Stephanie Wang
a5c8f28f33 Plasma subscribe (#227)
* Use object_info as notification, not just the object_id

* Add a regression test for plasma managers connecting to store after some objects have been created

* Send notifications for existing objects to new plasma subscribers

* Continuously try the request to the plasma manager instead of setting a timeout in the test case

* Use ray.services to start Redis in plasma test cases

* fix test case
2017-01-25 22:57:15 -08:00
Robert Nishihara
ab8c3432f7 Add driver ID to task spec and add driver ID to Python error handling. (#225)
* Add driver ID to task spec and add driver ID to Python error handling.

* Make constants global variables.

* Add test for error isolation.
2017-01-25 22:53:48 -08:00
Stephanie Wang
3c6686db08 Photon optimizations (#219)
* Optimizations:
- Track mapping of missing object to dependent tasks to avoid iterating over task queue
- Perform all fetch requests for missing objects using the same timer

* Fix bug and add regression test

* Record task dependencies and active fetch requests in the same hash table

* fix typo

* Fix memory leak and add test cases for scheduling when dependencies are evicted

* Fix python3 test case

* Minor details.
2017-01-23 19:44:15 -08:00
Richard Liaw
4575cd88b2 Improve error messages when nodes can't communicate with each other. (#223)
* Good error messages when nodes can't communicate with each other

* Print more information when starting the head node.

* Change retries back to 5.
2017-01-22 14:53:15 -08:00
Robert Nishihara
7151ed5cdf Fix bug in tensorflow tests. (#218)
* Fix bug in tensorflow tests.

* Address comment.
2017-01-19 20:29:05 -08:00
Robert Nishihara
9bb8162621 Improvements to documentation and error messages. (#221) 2017-01-19 20:27:46 -08:00
Richard Liaw
b3b294e3ad updated cluster documentation (#216) 2017-01-19 13:59:54 -08:00
Robert Nishihara
b98a63fd3a Change get to take a timeout and multiple object IDs. (#212)
* Change plasma_get to take a timeout and an array of object IDs.

* Address comments.

* Bug fix related to computing object hashes.

* Add test.

* Fix file descriptor leak.

* Fix valgrind.

* Formatting.

* Remove call to plasma_contains from the plasma client. Use timeout internally in ray.get.

* small fixes
2017-01-19 12:21:12 -08:00
Johann Schleier-Smith
4f6100b67f fix docker build bug (#207) 2017-01-18 23:23:34 -08:00
Robert Nishihara
677a019cbd Remove unnecessary bookkeepping in utlist in plasma client. (#215) 2017-01-18 23:03:08 -08:00
Robert Nishihara
58570b4981 Give better error message if ray.put or ray.get are called before ray.init. (#214) 2017-01-18 22:37:37 -08:00
Stephanie Wang
f1987cdc16 Split local scheduler task queue (#211)
* Split local scheduler task queue into waiting and dispatch queue

* Fix memory leak

* Add a new task scheduling status for when a task has been queued locally

* Fix global scheduler test case and add task status doc

* Documentation

* Address Philipp's comments

* Move tasks back to the waiting queue if their dependencies become unavailable

* Update existing task table entries instead of overwriting
2017-01-18 20:27:40 -08:00
Wapaul1
6fe69bec11 Selects from all variables now independent of graph, and uses standar… (#199)
* Smarter variable retrieval and doc update

* doc update and small fixes

* addressing robert's comments
2017-01-18 17:36:58 -08:00
Robert Nishihara
303d0fed3e Prevent plasma store and manager from dying when a client dies. (#203)
* Prevent plasma store and manager from dying when a worker dies.

* Check errno inside of warn_if_sigpipe. Passing in errno doesn't work because the arguments to warn_if_sigpipe can be evaluated out of order.
2017-01-17 20:34:31 -08:00
Philipp Moritz
7f329db4b2 wait until kill operation was successful (#210) 2017-01-17 20:15:48 -08:00
Philipp Moritz
a708e36225 Switch build system to use CMake completely. (#200)
* switch to CMake completely

...

* cleanup

* Run C tests, update installation instructions.
2017-01-17 16:56:40 -08:00
Robert Nishihara
ba8933e10f Update tutorial. (#196)
* Update tutorial.

* Small updates to documentation and code.
2017-01-10 23:52:38 -08:00
Robert Nishihara
87d8d05792 Rename reusable variables -> environment variables. (#195) 2017-01-10 20:14:33 -08:00
Wapaul1
aaf3be3c53 Fixed lbfgs for ray-cluster (#180)
* Updated lbfgs example to include TensorflowVariables

* Whitespace.
2017-01-10 18:40:06 -08:00
Robert Nishihara
be4a37bf37 Various cleanups: remove start_ray_local from ray.init, remove unused code, fix "pip install numbuf". (#193)
* Remove start_ray_local from ray.init and change default number of workers to 10.

* Remove alexnet example.

* Move array methods to experimental.

* Remove TRPO example.

* Remove old files.

* Compile plasma when we build numbuf.

* Address comments.
2017-01-10 17:35:27 -08:00
Wapaul1
b9d6135aa1 Added option for user to not pass in the session and error messages if so (#192)
* Added option for user to not pass in the session

* Small changes.
2017-01-09 21:03:22 -08:00
Philipp Moritz
ab3448a9b4 Plasma Optimizations (#190)
* bypass python when storing objects into the object store

* clang-format

* Bug fixes.

* fix include paths

* Fixes.

* fix bug

* clang-format

* fix

* fix release after disconnect
2017-01-09 20:15:54 -08:00
Robert Nishihara
0320902787 Fix Python reference counting bug. (#191) 2017-01-09 13:08:02 -08:00
Robert Nishihara
973716d310 Use cloudpickle 0.2.2. (#189) 2017-01-08 17:30:06 -08:00
Alexey Tumanov
674ec3a3cb generate pytask from string and string from pytask (#188)
* pytask creation from bytestring: saving work

* pytask now works

* documentation and tests

* linting

* Lint and fix test case
2017-01-08 02:16:40 -08:00
Wapaul1
c45342e39d Updated code to mesh with get_weights returning a dict and new tf code (#187)
* Updated code to mesh with get_weights returning a dict and new tf code

* Added tf.global_variables_initalizer to hyperopt example as well

* Small fix.

* Small name change.
2017-01-07 14:25:45 -08:00
Wapaul1
0ac2abee51 Added helper class for getting tf variables from loss function (#184)
* Added helper class for getting tf variables from loss function

* Updated usage and documentation

* Removed try-catches

* Added futures

* Added documentation

* fixes and tests

* more tests

* install tensorflow in travis
2017-01-07 01:54:11 -08:00
Stephanie Wang
c13d73b4c9 Suppress duplicate transfer requests (#185) 2017-01-06 22:14:51 -08:00
Philipp Moritz
33d7004914 New web UI. (#176)
* remove node.js webui

* temp commit

* flesh out web ui

* add documentation

* add ray timeline

* Small changes to documentation and formatting.
2017-01-06 00:13:22 -08:00
Wapaul1
417c04bac8 Removed iteritems and xrange for python3 in rl_pong (#182)
* Removed iteritems and xrange for python3

* Remove unused variable.
2017-01-05 20:37:00 -08:00
Stephanie Wang
cac473b557 Make numpy arrays immutable (#183)
* Make numpy arrays immutable in numbuf

* Move break statement outside of brackets

* Simplify test case

* Simplify test case
2017-01-05 19:47:52 -08:00
Robert Nishihara
651aa6007a Log profiling information from worker. (#178)
* Log timing events on workers.

* Have workers log to the event log through the local scheduler.

* Fixes and address comments.

* bug fix

* styling
2017-01-05 16:47:16 -08:00
Robert Nishihara
509685d240 Let the worker know about remote functions that failed to unpickle. (#175)
* Let the worker know about remote functions that failed to unpickle.

* Cleanup.
2017-01-03 18:41:03 -08:00
Johann Schleier-Smith
b1e76e582e Check /dev/shm on Linux (#174)
* check available shared memory when starting object store

* exit with error if not enough shared memory available for object store

* Some comments and formatting.
2017-01-03 12:33:29 -08:00
Robert Nishihara
431bba3c8a Catch numbuf glibcxx error on python 2. (#170) 2016-12-31 18:02:30 -08:00
Robert Nishihara
4d53fe504e Fix out-of-box installation instructions. (#173) 2016-12-31 17:53:53 -08:00
Johann Schleier-Smith
8bb87a4f6b updated Docker files (#171)
* updated Docker files

* single Docker RUN for apt-get installs and cleanup

* stylistic cleanup
2016-12-31 17:21:33 -08:00
Johann Schleier-Smith
1616426ccf add wget dependency for osx install (#172) 2016-12-31 16:06:00 -08:00
Robert Nishihara
d1594860de Remove javascript dependencies. (#169) 2016-12-30 23:16:17 -08:00