778 commits

Author SHA1 Message Date
Robert Nishihara
0320902787 Fix Python reference counting bug. (#191) 2017-01-09 13:08:02 -08:00
Robert Nishihara
973716d310 Use cloudpickle 0.2.2. (#189) 2017-01-08 17:30:06 -08:00
Alexey Tumanov
674ec3a3cb generate pytask from string and string from pytask (#188)
* pytask creation from bytestring: saving work

* pytask now works

* documentation and tests

* linting

* Lint and fix test case
2017-01-08 02:16:40 -08:00
Wapaul1
c45342e39d Updated code to mesh with get_weights returning a dict and new tf code (#187)
* Updated code to mesh with get_weights returning a dict and new tf code

* Added tf.global_variables_initializer to hyperopt example as well

* Small fix.

* Small name change.
2017-01-07 14:25:45 -08:00
Wapaul1
0ac2abee51 Added helper class for getting tf variables from loss function (#184)
* Added helper class for getting tf variables from loss function

* Updated usage and documentation

* Removed try-catches

* Added futures

* Added documentation

* fixes and tests

* more tests

* install tensorflow in travis
2017-01-07 01:54:11 -08:00
Stephanie Wang
c13d73b4c9 Suppress duplicate transfer requests (#185) 2017-01-06 22:14:51 -08:00
Philipp Moritz
33d7004914 New web UI. (#176)
* remove node.js webui

* temp commit

* flesh out web ui

* add documentation

* add ray timeline

* Small changes to documentation and formatting.
2017-01-06 00:13:22 -08:00
Wapaul1
417c04bac8 Removed iteritems and xrange for python3 in rl_pong (#182)
* Removed iteritems and xrange for python3

* Remove unused variable.
2017-01-05 20:37:00 -08:00
Stephanie Wang
cac473b557 Make numpy arrays immutable (#183)
* Make numpy arrays immutable in numbuf

* Move break statement outside of brackets

* Simplify test case

* Simplify test case
2017-01-05 19:47:52 -08:00
Robert Nishihara
651aa6007a Log profiling information from worker. (#178)
* Log timing events on workers.

* Have workers log to the event log through the local scheduler.

* Fixes and address comments.

* bug fix

* styling
2017-01-05 16:47:16 -08:00
Robert Nishihara
509685d240 Let the worker know about remote functions that failed to unpickle. (#175)
* Let the worker know about remote functions that failed to unpickle.

* Cleanup.
2017-01-03 18:41:03 -08:00
Johann Schleier-Smith
b1e76e582e Check /dev/shm on Linux (#174)
* check available shared memory when starting object store

* exit with error if not enough shared memory available for object store

* Some comments and formatting.
2017-01-03 12:33:29 -08:00
Robert Nishihara
431bba3c8a Catch numbuf glibcxx error on python 2. (#170) 2016-12-31 18:02:30 -08:00
Robert Nishihara
4d53fe504e Fix out-of-box installation instructions. (#173) 2016-12-31 17:53:53 -08:00
Johann Schleier-Smith
8bb87a4f6b updated Docker files (#171)
* updated Docker files

* single Docker RUN for apt-get installs and cleanup

* stylistic cleanup
2016-12-31 17:21:33 -08:00
Johann Schleier-Smith
1616426ccf add wget dependency for osx install (#172) 2016-12-31 16:06:00 -08:00
Robert Nishihara
d1594860de Remove javascript dependencies. (#169) 2016-12-30 23:16:17 -08:00
Robert Nishihara
603a7e3dd3 Add documentation for troubleshooting installation. (#167) 2016-12-30 23:15:25 -08:00
Wapaul1
e00b27b14e Removed webui code from setup.py and services.py (#168) 2016-12-30 21:45:58 -08:00
Robert Nishihara
84296c8905 Documentation for using Ray on a cluster. (#165) 2016-12-30 00:29:03 -08:00
Robert Nishihara
13ee0ef366 Only download arrow if not already present. (#166) 2016-12-30 00:25:46 -08:00
Stephanie Wang
6828d694ae Test object notifications from Plasma store (#141)
* Object notification test for Photon, and turn on valgrind for Photon C tests

* Test object notification handler in the plasma manager

* Fix hanging test case
2016-12-29 23:10:38 -08:00
Robert Nishihara
f9f667de47 Improve formatting of error messages. (#154)
* Improve formatting of error messages.

* Catch errors that occur when looking up function name from function ID.

* Push warning to user if worker spends too long waiting for proper import counter.

* Fixes.

* Add comment.
2016-12-29 00:11:13 -08:00
Robert Nishihara
acf1703afd Implement naive scheduling algorithm using local scheduler load. (#164)
* Implement naive scheduling algorithm using local scheduler load.

* Have the global scheduler estimate load on local schedulers better.

* Fixes.
2016-12-28 22:33:20 -08:00
Robert Nishihara
a1a08b9ad4 Cause pip installation of numbuf to fail if the build.sh or setup.sh fail. (#163) 2016-12-28 16:54:14 -08:00
Stephanie Wang
c403ab11ab Allow ray.init to take in address information about existing services. (#161)
* Refactor ray.init and ray.services to allow processes that are already running

* Fix indexing error

* Address Robert's comments
2016-12-28 14:17:29 -08:00
Robert Nishihara
baf835efcd Throw Python exception if plasma store cannot create new object. (#162)
* Propagate error messages through plasma create.

* Use custom exception types instead of exception messages.
2016-12-28 11:56:16 -08:00
Robert Nishihara
10e067e5e5 Delay releasing a maximum number of bytes in the plasma client. (#160)
* Send message from plasma client to get plasma store capacity.

* Release objects from plasma client if they are too large.

* Use doubly-linked list instead of ring buffer for plasma client release history.

* Address comments.

* Fix problem with slicing PlasmaBuffer objects.

* Fix crash in plasma manager during transfer.

* Formatting.

* Make plasma client cache larger and make caching test not throw exceptions on Travis.
2016-12-27 19:51:26 -08:00
Robert Nishihara
26941e02aa Attempt to free up to 20% of the plasma store capacity during eviction. (#159) 2016-12-27 12:12:33 -08:00
Robert Nishihara
985c424172 Use redismodules for task table and result table. (#156)
* Switch to using redis modules for task table.

* Switch to using redis modules for the task table.

* Fix some tests.

* Fix naming and remove code duplication.

* Remove duplication in redis modules and add more cleanups.

* Address comments.
2016-12-25 23:57:05 -08:00
Philipp Moritz
d6695c867a fix wait test (#158) 2016-12-25 23:43:01 -08:00
Philipp Moritz
8309e3f355 Redis string formatting (#157)
* redis string formatting

* fixes

* add documentation

* fixes
2016-12-25 22:43:07 -08:00
Robert Nishihara
3d697c7ed2 Introduce local scheduler heartbeats which carry load information. (#155)
* Introduce local scheduler heartbeats which carry load information.
2016-12-24 20:02:25 -08:00
Robert Nishihara
9bb9f8cb54 Fix bug in ray.wait. (#153)
* Fix bug in wait implementation.

* Add test that exposes previous bug.
2016-12-23 16:22:41 -08:00
Robert Nishihara
241c955707 Determine node IP address programmatically. (#151)
* Determine node IP address programmatically.

* Factor out methods for getting node IP addresses.

* Address comments.
2016-12-23 15:31:40 -08:00
Robert Nishihara
8d90c9f432 Experimental utils for copying directories to other machines in the c… (#150)
* Experimental utils for copying directories to other machines in the cluster using Ray.

* Test copying directory functionality.

* Small fix.
2016-12-23 00:43:16 -08:00
Robert Nishihara
86b211f5c2 Allow run_function_on_all_workers to take a worker_info dictionary including a counter. (#149)
* Suppress Redis warnings and remove some global scheduler logging.

* Pass a counter into run_function_on_all_workers indicating how many workers have begun executing this function.
2016-12-22 22:05:58 -08:00
Robert Nishihara
92010ca5b5 Check that we can connect to Redis and that there aren't existing redis clients on the same node in start_ray.py (#148) 2016-12-22 21:54:19 -08:00
Alexey Tumanov
46a887039e Global scheduler - per-task transfer-aware policy (#145)
* global scheduler with object transfer cost awareness -- upstream rebase

* debugging global scheduler: multiple subscriptions

* global scheduler: utarray push bug fix; tasks change state to SCHEDULED

* change global scheduler test to be an integration test

* unit and integration tests are passing for global scheduler

* improve global scheduler test: break up into several

* global scheduler checkpoint: fix photon object id bug in test

* test with timesync between object and task notifications; TODO: handle OoO object+task notifications in GS

* fallback to base policy if no object dependencies are cached (may happen due to OoO object+task notification arrivals)

* clean up printfs; handle a missing LS in LS cache

* Minor changes to Python test and factor out some common code.

* refactoring handle task waiting

* addressing comments

* log_info -> log_debug

* Change object ID printing.

* PRId64 merge

* Python 3 fix.

* PRId64.

* Python 3 fix.

* resurrect differentiation between no args and missing object info; spacing

* Valgrind fix.

* Run all global scheduler tests in valgrind.

* clang format

* Comments and documentation changes.

* Minor cleanups.

* fix whitespace

* Fix.

* Documentation fix.
2016-12-22 03:11:46 -08:00
Robert Nishihara
6cd02d71f8 Fixes and cleanups for the multinode setting. (#143)
* Add function for driver to get address info from Redis.

* Use Redis address instead of Redis port.

* Configure Redis to run in unprotected mode.

* Add method for starting Ray processes on non-head node.

* Pass in correct node ip address to start_plasma_manager.

* Script for starting Ray processes.

* Handle the case where an object already exists in the store. Maybe this should also compare the object hashes.

* Have driver get info from Redis when start_ray_local=False.

* Fix.

* Script for killing ray processes.

* Catch some errors when the main_loop in a worker throws an exception.

* Allow redirecting stdout and stderr to /dev/null.

* Wrap start_ray.py in a shell script.

* More helpful error messages.

* Fixes.

* Wait for redis server to start up before configuring it.

* Allow seeding of deterministic object ID generation.

* Small change.
2016-12-21 18:53:12 -08:00
Robert Nishihara
c9c1b3e6af Change db_connect to allow different arguments from different processes. (#142)
* Allow db_connect to take a variable number of arguments.

* Fix tests.

* Fixes.

* Formatting.

* Fixes.

* Simplifications.

* Fix typo.
2016-12-20 20:21:35 -08:00
Philipp Moritz
0ca0864856 Use flatcc for serialization of IPC messages. (#140)
* added Philipp's updates

* Switch to using flatbuffers for IPC.

* Various changes.

* convert remaining messages and cleanups

* fix

* fix function signatures

* fix valgrind errors

* clang-format

* final commit

* Fix valgrind test.
2016-12-20 14:46:25 -08:00
Stephanie Wang
6a73711888 Update the task table (#129)
* Update the task table

* Move updating task table out of scheduling algorithm.
2016-12-20 00:13:39 -08:00
Stephanie Wang
d729f9b7ea Object table remove (#139)
* Object table remove redis module

* Test case for object table remove redis module

* Client code for object_table_remove

* Delete object notifications in plasma

* Test for object deletion notifications

* Fix subscribe deletion test

* Address Robert's comments

* free hash table entry
2016-12-19 23:18:57 -08:00
Alexey Tumanov
cb3e6cde9e passing object info information with redis module (#138)
* adding object broadcast channel; published on each object table add

* publishing data size to the bcast channel

* bug fix: objectkey

* update object tests to test for data size: C + py

* remove debug

* clang format

* Minor changes.

* Fix error.

* merging with Robert's comments

* clang format for the object table test upgrade
2016-12-19 21:07:25 -08:00
Robert Nishihara
269f37e26f Implement object table notification subscriptions and switch to using Redis modules for object table. (#134)
* Implement RAY.OBJECT_TABLE_REQUEST_NOTIFICATIONS.

* Call object_table_request_notifications from plasma manager.

* Use Redis modules for object table.

* Cleaning up code.

* More checks.

* Formatting.

* Make object table tests pass.

* Formatting.

* Add prefix to the object notification channel name.

* Formatting.

* Fixes.

* Increase time in redismodule test.
2016-12-18 18:19:02 -08:00
Robert Nishihara
c89bf4e5bc Fix improper handling of NULL characters when opening Redis keys. (#136)
* Fix improper handling of NULL characters when opening Redis keys.

* Add test.
2016-12-18 13:06:28 -08:00
Robert Nishihara
a9e6a53360 Print helpful error message when libgcc error occurs. (#133) 2016-12-17 15:38:48 -08:00
Robert Nishihara
edf8d1ee9f Fix Python3 error in tests. (#135) 2016-12-17 12:42:37 -08:00
Stephanie Wang
e23661c375 Task table Redis module (#125)
* Task table redis module implementation

* Publish tasks and take in individual fields as args, not task object

* Scheduling state integer has width 1, error on illegal put

* Unit tests for task table and more documentation

* Task table subscribe, fix publish topics and address Philipp and Alexey's comments

* Helper function to create prefixed strings

* Factor out the table prefixes in the test cases
2016-12-16 14:40:44 -08:00