1171 commits

Author SHA1 Message Date
Robert Nishihara
5bb07cb01b Remove old UI code. (#688) 2017-06-21 05:54:21 +00:00
Robert Nishihara
5ebc2f3f2e Do resource bookkeeping for actor methods. (#682)
* Dispatch regular and actor tasks when resources become available.

* Make actor methods do resource bookkeeping and add test.

* Remove unnecessary field.

* Fix linting.

* Fix actor test.

* Maintain set of actors with pending tasks to speed up task dispatch.

* Exit early from task dispatch if there are no resources available.

* Fix linting.

* Fix error.

* Fix bug related to iterator invalidation.

* When an actor is removed, remove it from the set of actors with pending tasks.
2017-06-21 05:52:45 +00:00
alanamarzoev
ed9380d73d Automatically start web UI in ray.init(). (#687)
* Start up webui on ray.init

* Removed .ipynb checkpoint folders.

* Removed print statements in cleanup function.

* Fixed

* Removed extra file.

* Cleaned up ui.

* Don't start browser automatically in ray.init(), also copy the notebook every time so that changes don't persist.

* Update setup.py and installation instructions to install jupyter.

* Don't automatically install jupyter, don't start the UI if jupyter is not installed.

* Improve error message when failing to start UI.
2017-06-20 10:32:55 -07:00
Robert Nishihara
3052ce25a6 Divide up large fetch requests from local scheduler, also print warni… (#683)
* Divide up large fetch requests from local scheduler, also print warning if fetch handler is slow.

* Fix linting.

* Fix typo.
2017-06-19 22:57:51 +00:00
Robert Nishihara
9e4a3e4972 Replace some UT data structures in local scheduler with C++ STL. (#680)
* Replace a local scheduler ut_array with a std::vector.

* Replace vector of sizes in local scheduler with std::pair.

* Remove utarray include.

* Replace utarray with std::vector for reading local scheduler input messages.

* Remove more UT data structures.

* Remove UT includes.

* Fix linting.

* Include stdlib.h to find size_t.

* Remove includes of stdbool.h.

* Replace std::pair with TaskQueueEntry.

* Fix redis tests.

* Reinstate tests.
2017-06-19 21:58:42 +00:00
Philipp Moritz
9bcaaaeaf5 Debugging for policy gradients (#681)
* configuration option for tensorflow debugger

* add model checkpointing

* fix linting

* make it possible to run without checkpointing

* fix

* loading from checkpoint and expose debugger through cli

* todo for filters

* Fix typo.
2017-06-18 17:58:41 -07:00
Robert Nishihara
f12db5f0e2 Divide large plasma requests into smaller chunks, and wait longer before reissuing large requests. (#678)
* Divide large get requests into smaller chunks.

* Divide fetches into smaller chunks.

* Wait longer in worker and manager before reissuing fetch requests if there are many outstanding fetch requests.

* Log warning if a handler in the local scheduler or plasma manager takes more than one second.
2017-06-18 04:42:15 +00:00
alanamarzoev
4d5ac9dad5 Include object size and hash in the table returned by the object_table function in the GlobalStateAPI. (#665)
* added log_table function and a test

* fixed log_files and added task_profiles

* fixed formatting

* fixed linting errors

* fixes

* removed file

* more fixes

* hopefully fixed

* Small changes.

* Fix linting.

* Fix bug in log monitor.

* Small changes.

* Fix bug in travis.

* Including data_size and hash in the ResultTableReply.

* Included data_size and hash info in object_table.

* Fixed bugs in ray_redis_module.cc.

* Removing commented out code.

* Fixes

* Freed hash and data_size strings after using, and checked if they're null along with task_id and is_put.

* Changed it so that data_size is set correctly.

* Removed iostream import.

* Included a check to ensure that the Redis string to long long conversion was successful.

* Included separate data_size and hash null checks.

* Fixed bug.

* Made linting changes.

* Another linting error.

* Slight simplification.
2017-06-16 23:17:11 -07:00
Robert Nishihara
019ba07e9c Correct actor class name and module. (#675)
* Correct actor class name and module.

* Add test.

* Fix linting.
2017-06-17 05:44:42 +00:00
Robert Nishihara
96962cdee0 Log fatal error if plasma manager or local scheduler heartbeats take too long. (#676)
* Log fatal error if plasma manager or local scheduler take too long to send heartbeat.

* Fix linting.

* Use int64_t for milliseconds since unix epoch.
2017-06-16 19:11:01 +00:00
Alexey Tumanov
8317025987 reducing the size of objects created for the global scheduler test (#674) 2017-06-15 10:02:46 -07:00
Philipp Moritz
8798f4e690 fix flaky mac os x plasma store component_failure_test (#673)
Fix flaky mac os x plasma store component_failure_test.
2017-06-15 00:31:50 -07:00
Philipp Moritz
c343df832e use multiple threads for memcpy (#669) 2017-06-14 19:14:24 -07:00
alanamarzoev
cc4990b543 Task profiles function and test (#647)
Expose some task profiling information through global state API.
2017-06-13 17:53:34 -07:00
alanamarzoev
43bae46e47 Included worker_id in task event logs. (#668) 2017-06-13 17:30:43 -07:00
Robert Nishihara
fb119bb50c Automatically add ip addresses to list of known hosts in cluster usage documentation. (#667) 2017-06-14 00:13:33 +00:00
Philipp Moritz
54925996ca Allow remote functions to specify max executions and kill worker once limit is reached. (#660)
* implement restarting workers after certain number of task executions

* Clean up python code.

* Don't start new worker when an actor disconnects.

* Move wait_for_pid_to_exit to test_utils.py.

* Add test.

* Fix linting errors.

* Fix linting.

* Fix typo.
2017-06-13 00:34:58 -07:00
Eric Liang
4374ad1453 Policy gradient example: Support multi-GPU training (#584)
* add tf metrics

* comments

* fix network scopes

* add doc

* initial work

* try with 3 virtual cpus

* clean up metrics

* use format string

* fix trace level

* back to pong

* always run summary on cpu

* plot intermediate and final sgd stats

* add back a global step

* update

* add timeline

* use staging area and reuse weights properly

* stage at cpu

* whoops, stage only the batch

* clean up a bit

* fix py flake

* wip

* create an optimizer graph per device

* print timeline on 5th batch instead

* print examples per second

* log placement for training ops

* force placement on cpu:0

* try separating weights onto different gpus

* try using nccl

* add cpu fallback

* remove space from date

* check has gpu device

* fix flag config

* checkpoint

* wip

* update

* add some timing

* trace loading

* try cpu

* revert that

* remove expensive test

* lint

* cleanups

* clean up timers

* clean it up a bit

* fix code for non-scalar action spaces

* address some nits

* fix quotes

* efficient shuffling between sgd epochs
2017-06-13 06:03:25 +00:00
Robert Nishihara
1916475e14 Increase socket listen backlog from 5 to 128. (#661) 2017-06-11 06:34:16 +00:00
Richard Liaw
8d350f628a Fixing Redis Key Consistencies for Actor, FunctionTable, FunctionsToRun, and RemoteFunction (#659)
* consistencies for Actor, FunctionTable, and FunctionsToRun

* NOT WORKING: changing remote fn keys
2017-06-10 23:45:22 +00:00
Eric Liang
d4d2c03ac5 Remove timeout for Redis commands. (#649)
* update

* Remove interaction between callback data identifier and event loop.

* Remove tests that no longer apply.
2017-06-09 15:55:36 -07:00
alanamarzoev
ee1d4e5ea2 Redirect worker stdout/stderr to log files. (#646)
* local scheduler

* redirect output files to be associated with workers rather than the local scheduler

* fixed formatting

* fixes

* Moved output redirection logic to worker.py.

* Changed write mode.

* Fixed formatting.

* Added comment.

* Reuse log file creation in services.py.

* Fix linting.

* Fix problem in which multiple processes attempt to create /tmp/raylogs at the same time.
2017-06-08 18:30:48 -07:00
Crystal
fff50d824c Doc using ray with gpu (#644)
* Added to troubleshooting documentation about whether redefining remote functions runs the new code version

* Minor correction to troubleshooting documentation

* Writing new documentation page for using Ray with GPUs

* Wrote new documentation page on using ray with gpus

* Add some more details.
2017-06-08 00:12:44 -07:00
alanamarzoev
f0339f3386 Expose log files through global state API. (#641)
* added log_table function and a test

* fixed log_files and added task_profiles

* fixed formatting

* fixed linting errors

* fixes

* removed file

* more fixes

* hopefully fixed

* Small changes.

* Fix linting.

* Fix bug in log monitor.

* Small changes.

* Fix bug in travis.
2017-06-08 00:08:10 -07:00
Robert Nishihara
fde843a636 Update installation documentation to recommend installing Ray with pip. (#637) 2017-06-07 05:51:06 +00:00
Crystal
60161f276b Added to troubleshooting documentation about whether redefining remot… (#640)
* Added to troubleshooting documentation about whether redefining remote functions runs the new code version

* Minor correction to troubleshooting documentation

* Small rewordings.
2017-06-06 22:49:53 -07:00
Philipp Moritz
690fe10bb6 Save policies for Evolution Strategies (#638)
Save policies for evolution strategies.
2017-06-04 16:21:19 -07:00
Crystal
4c94d6c3b9 Rewrote and reordered the examples in the Actor documentation for cla… (#635)
* Rewrote and reordered the examples in the Actor documentation for clarity. Also added an introduction to Gym.

* Minor tweaks to actor documentation

* Small changes to wording.
2017-06-02 23:42:41 -07:00
Philipp Moritz
6adf39959c put back large python object tests (commented out) (#636) 2017-06-02 20:36:10 -07:00
Robert Nishihara
301e0b0db8 Bump version to 0.1.1 in preparation for uploading wheels to PyPI. (#630) 2017-06-03 02:17:39 +00:00
Philipp Moritz
0254efa5e8 Use parallel memcopy from arrow (#633)
* use parallel memcopy from arrow

* fix linting

* remove memory.h
2017-06-02 18:18:41 -07:00
Robert Nishihara
2694337c0f Fix large memory tests. (#632)
* Log the driver ID in hex instead of binary.

* Fix large memory test and add more tests to it.

* Remove tests that are too stressful.
2017-06-03 01:12:56 +00:00
Robert Nishihara
23b0c80967 Rename linux wheels so they can be uploaded to PyPI. (#629) 2017-06-02 20:20:34 +00:00
Robert Nishihara
1a682e2807 Enable starting and stopping ray with "ray start" and "ray stop". (#628)
* Install start_ray and stop_ray scripts in setup.py.

* Update documentation.

* Fix docker tests.

* Implement stop_ray script in python.

* Fix linting.
2017-06-02 20:17:48 +00:00
Robert Nishihara
a4d8e13094 Suppress excess warning messages related to intentional actor deaths. (#627)
* Don't submit the actor destructor tasks when the job is exiting.

* Don't propagate error messages to the driver when an actor exits intentionally.
2017-06-01 20:10:40 +00:00
Robert Nishihara
d0bfc0a849 Clean up actor workers when actor handle goes out of scope. (#617) 2017-06-01 07:02:43 +00:00
Robert Nishihara
dd7f866a92 Fix compilation error on CentOS. (#622)
* Fix compilation error on CentOS.

* add TODO
2017-06-01 06:51:00 +00:00
Robert Nishihara
5f193afb87 Tell local scheduler to ignore SIGCHLD so that workers don't become zombies. (#620) 2017-06-01 06:37:28 +00:00
Robert Nishihara
4d51ed37b2 Fix bug in which plasma client file descriptors were not closed. (#618)
* Fix bug in which plasma client file descriptors were not closed.

* Add logging statement when disconnecting client from plasma store.

* Fix after rebasing.

* Add more checks to plasma disconnect client.
2017-06-01 05:37:29 +00:00
Robert Nishihara
bcaab78908 Add script for building MacOS wheels. (#601)
* Add script for building MacOS wheels.

* Small cleanups to script.

* Fix setting of PATH before building wheel.

* Create symbolic link to correct Python executable so Ray installation finds the right Python.

* Address comments.

* Rename readme.
2017-06-01 00:30:46 +00:00
Philipp Moritz
b94b4a35e0 Make the Plasma store ready for Arrow integration (#579)
* port plasma to arrow

* fixes

* refactor plasma client

* more modernization

* fix plasma manager tests

* everything compiles

* fix plasma client tests

* update plasma serialization tests

* fix plasma manager tests

* fix bug

* updates

* fix bug

* fix tests

* fix rebase

* address comments

* fix travis valgrind build

* fix linting

* fix include order again

* fix linting

* address comments
2017-05-31 16:24:23 -07:00
Richard Shin
609b5c1a4c Add script to build manylinux1 .whl files (#600)
* Add manylinux setup

* Switch to cp27mu

* python/MANIFEST.in

* Fix MANIFEST.in

* Add build-wheel-manylinux1.sh

* Update readme

* Install correct version of numpy

* Fix typo in README-manylinux1.md

* Don't install cmake

* Remove commented line from setup.py

* Delete unused manylinux1.sh

* Run setup.py bdist_wheel twice

* Don't use package_data and MANIFEST.in.

* Small aesthetic change.

* Trigger build_ext in setup.py.

* Remove nonexistent file from MANIFEST.in.

* Manually copy files in MANIFEST.in to where Python expects them in order to prevent setup.py from having to be run twice.

* Only run setup.py once when building wheels.

* Aesthetic change to readme.

* Copy generated flatbuffer Python files in build_ext.

* Fix permission denied error by preserving the executable bit when copying files.

* Remove unnecessary argument to setup.py.

* Remove MANIFEST.in and move files to include into list in setup.py.

* Fix numpy version when building wheels and replace rm with git clean.
2017-05-27 21:35:48 -07:00
Robert Nishihara
97af3b34d8 Use string instead of list in tutorial example to make it clearer. (#586) 2017-05-26 15:32:51 -07:00
Philipp Moritz
647e1d9fc3 Fix runtest.py on the ubuntu system python 3 (#599)
* fix runtest.py on the ubuntu system python 3

* less strict version of the test
2017-05-26 15:22:36 -07:00
Richard Shin
16050eca8d Don't link Python extensions to libpython*.so (#598) 2017-05-25 19:01:12 -07:00
Chelsea Finn
f97d0393cc Fix to json decoding bug (#597)
* fix json decoding bug

* Fix linting error.
2017-05-25 18:48:39 -07:00
Michael Whittaker
1985838a30 Fixed small typo in actors.rst. (#595) 2017-05-25 11:30:45 -07:00
Philipp Moritz
3885d1b286 make builds with CMake incremental (#592) 2017-05-24 21:52:33 -07:00
Robert Nishihara
997aa35721 Remove cloudpickle customization and just use plain cloudpickle. (#588)
* Remove augmentations of cloudpickle.

* Entirely remove cloudpickle modifications. Just use plain cloudpickle.
2017-05-24 20:22:28 -07:00
Philipp Moritz
679910496e fix policy gradients for mujoco domains (#589) 2017-05-24 18:39:37 -07:00