* attribute-based heterogeneity-awareness in global scheduler and photon
* minor post-rebase fix
* photon: enforce dynamic capacity constraint on task dispatch
* globalsched: cap the number of times we try to schedule a task in round robin
* propagating ability to specify resource capacity to ray.init
* adding resources to remote function export and fetch/register
* globalsched: remove unused functions; update cached photon resource capacity (until next photon heartbeat)
* Add some integration tests.
* globalsched: cleanup + factor out constraint checking
* lots of style
* task_spec_required_resource: global refactor
* clang format
* clang format + comment update in photon
* clang format photon comment
* valgrind
* reduce verbosity for Travis
* Add test for scheduler load balancing.
* addressing comments
* refactoring global scheduler algorithm
* Minor cleanups.
* Linting.
* Fix array_test.py and linting.
* valgrind fix for photon tests
* Attempt to fix stress tests.
* fix hashmap free
* fix hashmap free comment
* memset photon resource vectors to 0 in case they get used before the first heartbeat
* More whitespace changes.
* Undo whitespace error I introduced.
* Added example for compute grads in ray
* Added formatting
* Removed need for placeholders in apply gradient
* Streamlined examples
* Fixed docs
* Added formatting
* Removed old references
* Simplified code some
* Addressed comments
* Changes to first code block
* Added test for training and updated code snippets
* Formatting
* Removed mean
* Removed all mention of mean
* Added comments
* Added comments
* First pass at reconstruction in the worker
Modify reconstruction stress testing to start Plasma service before rest of Ray cluster
TODO about reconstructing ray.puts
Fix ray.put error for double creates
Distinguish between empty entry and no entry in object table
Fix test case
Fix Python test
Fix tests
* Only call reconstruct on objects we have not yet received
* Address review comments
* Fix reconstruction for Python3
* remove unused code
* Address Robert's comments, stress tests are crashing
* Test and update the task's scheduling state to suppress duplicate
reconstruction requests.
* Split result table into two lookups, one for task ID and the other as a
test-and-set for the task state
* Fix object table tests
* Fix redis module result_table_lookup test case
* Multinode reconstruction tests
* Fix python3 test case
* rename
* Use new start_redis
* Remove unused code
* lint
* indent
* Address Robert's comments
* Use start_redis from ray.services in state table tests
* Remove unnecessary memset
* Added test for retriving variables from an optimizer
* Added comments to test
* Addressed comments
* Fixed travis bug
* Added fix to circular controls
* Added set for explored operations and duplicate prefix stripping
* Removed embeded ipython
* Removed prefix, use seperate graph for each network
* Removed redundant imports
* Addressed comments and added separate graph to initializer
* fix typos
* get rid of prefix in documentation
* Change plasma_get to take a timeout and an array of object IDs.
* Address comments.
* Bug fix related to computing object hashes.
* Add test.
* Fix file descriptor leak.
* Fix valgrind.
* Formatting.
* Remove call to plasma_contains from the plasma client. Use timeout internally in ray.get.
* small fixes
* Prevent plasma store and manager from dying when a worker dies.
* Check errno inside of warn_if_sigpipe. Passing in errno doesn't work because the arguments to warn_if_sigpipe can be evaluated out of order.
* Remove start_ray_local from ray.init and change default number of workers to 10.
* Remove alexnet example.
* Move array methods to experimental.
* Remove TRPO example.
* Remove old files.
* Compile plasma when we build numbuf.
* Address comments.
* Added helper class for getting tf variables from loss function
* Updated usage and documentation
* Removed try-catches
* Added futures
* Added documentation
* fixes and tests
* more tests
* install tensorflow in travis
* Send message from plasma client to get plasma store capacity.
* Release objects from plasma client if they are too large.
* Use doubly-linked list instead of ring buffer for plasma client release history.
* Address comments.
* Fix problem with slicing PlasmaBuffer objects.
* Fix crash in plasma manager during transfer.
* Formatting.
* Make plasma client cache larger and make caching test not throw exceptions on Travis.
* Suppress Redis warnings and remove some global scheduler logging.
* Pass a counter into run_function_on_all_workers indicating how many workers have begun executing this function.
* Duplicate wait1 implementation and seperate out wait datastructures.
* Address Philipp's comments.
* Temporarily address test failure problem by increasing timeout and reducing load in tests.
* Update stress tests to include distributed wait.
* Fetch missing dependencies from local scheduler.
* Factor out global scheduler policy state.
* Use object_table_subscribe instead of object_table_lookup.
* Fix bug in which timer was being created twice for a single fetch request.
* Free old manager vector.
* Give more informative error message when we do not know how to serialize a class.
* Check that passing arguments to remote functions and getting them does not change their values.
* fix serialization bug
* fix tests for common module
* Formatting.
* Bug fix in init_pickle_module signature.
* Use pickle with HIGHEST_PROTOCOL.
* Initial scheduler commit
* global scheduler
* add global scheduler
* Implement global scheduler skeleton.
* Formatting.
* Allow local scheduler to be started without a connection to redis so that we can test it without a global scheduler.
* Fail if there are no local schedulers when the global scheduler receives a task.
* Initialize uninitialized value and formatting fix.
* Generalize local scheduler table to db client table.
* Remove code duplication in local scheduler and add flag for whether a task came from the global scheduler or not.
* Queue task specs in the local scheduler instead of tasks.
* Simple global scheduler tests, including valgrind.
* Factor out functions for starting processes.
* Fixes.
* Fix bug in which worker import counters were treated incorrectly.
* Fix bug in which cached functions-to-run were double counted as exports. This also runs the functions-to-run on the driver only after ray.init is called.
* Only define reusable variables locally after ray.init has been called.
* Remove flaky reference counting tests. It's not clear that these tests make sense.
* Make numbuf pip install verbose.
* Export cached reusable variables before cached remote functions.
* Fix bug causing the worker to hang sometimes. This happens when the worker is trying to run a task, but it hasn't imported enough imports to run the task, so it continually acquires and releases a lock while checking if it has enough imports. However, for some reason, the import thread is waiting to acquire the same lock and never does so (or takes a very long time to do so). By dropping the lock before sleeping, this makes it easier for other threads to acquire the lock.
* Acquire locks using 'with' statements.
* Fix possible test failure.
* Try to start Redis multiple times with different random ports if the original attempt failed.
* Fix test in which we redefine a remote function.
* Update worker code and services code to use plasma and the local scheduler.
* Cleanups.
* Fix bug in which threads were started before the worker mode was set. This caused remote functions to be defined on workers before the worker knew it was in WORKER_MODE.
* Fix bug in install-dependencies.sh.
* Lengthen timeout in failure_test.py.
* Cleanups.
* Cleanup services.start_ray_local.
* Clean up random name generation.
* Cleanups.
* Properly reset reusable variables on the driver when remote functions are run locally on the driver.
* Cache functions to run on all workers that occur before ray.init is called.
* Re-implemented select, changed name to wait
* Changed tests for select to tests for wait
* Updated the hyperopt example to match wait
* Small fixes and improve example readme.
* Make tests pass.