* Add num_cpus and num_gpus to actor decorator.
* Assign GPU IDs to actors.
* Add additional actor test.
* Remove duplicated line.
* Factor out local scheduler selection method.
* Add test and simplify local scheduler selection.
* First pass at a policy to solve deadlock
* Address Robert's comments
* stress test
* unit test
* Fix test cases
* Fix test for python3
* add more logging
* White space.
* Implement actor field for tasks
* Implement actor management in local scheduler.
* initial python frontend for actors
* import actors on worker
* IPython code completion and tests
* prepare creating actors through local schedulers
* add actor id to PyTask
* submit actor calls to local scheduler
* starting to integrate
* simple fix
* Fixes from rebasing.
* more work on python actors
* Improve local scheduler actor handlers.
* Pass actor ID to local scheduler when connecting a client.
* first working version of actors
* fixing actors
* fix creating two copies of the same actor
* fix actors
* remove sleep
* get rid of export synchronization
* update
* insert actor methods into the queue in the right order
* remove print statements
* make it compile again after rebase
* Minor updates.
* fix python actor ids
* Pass actor_id to start_worker.
* add test
* Minor changes.
* Update actor tests.
* Temporary plan for import counter.
* Temporarily fix import counters.
* Fix some tests.
* Fixes.
* Make actor creation non-blocking.
* Fix test?
* Fix actors on Python 2.
* fix rare case.
* Fix python 2 test.
* More tests.
* Small fixes.
* Linting.
* Revert tensorflow version to 0.12.0 temporarily.
* Small fix.
* Enhance inheritance test.
* Start and clean up workers from the local scheduler
Ability to kill workers in photon scheduler
Test for old method of starting workers
Common codepath for killing workers
Common codepath for killing workers
Photon test case for starting and killing workers
fix build
Fix component failure test
Register a worker's pid as part of initial connection
Address comments and revert photon_connect
Set PATH during travis install
Fix
* Fix photon test case to accept clients on plasma manager fd
* attribute-based heterogeneity-awareness in global scheduler and photon
* minor post-rebase fix
* photon: enforce dynamic capacity constraint on task dispatch
* globalsched: cap the number of times we try to schedule a task in round robin
* propagating ability to specify resource capacity to ray.init
* adding resources to remote function export and fetch/register
* globalsched: remove unused functions; update cached photon resource capacity (until next photon heartbeat)
* Add some integration tests.
* globalsched: cleanup + factor out constraint checking
* lots of style
* task_spec_required_resource: global refactor
* clang format
* clang format + comment update in photon
* clang format photon comment
* valgrind
* reduce verbosity for Travis
* Add test for scheduler load balancing.
* addressing comments
* refactoring global scheduler algorithm
* Minor cleanups.
* Linting.
* Fix array_test.py and linting.
* valgrind fix for photon tests
* Attempt to fix stress tests.
* fix hashmap free
* fix hashmap free comment
* memset photon resource vectors to 0 in case they get used before the first heartbeat
* More whitespace changes.
* Undo whitespace error I introduced.
* Refactor local scheduler to remove worker indices.
* Change scheduling state enum to int in all function signatures.
* Bug fix, don't use pointers into a resizable array.
* Remove total_num_workers.
* Fix tests.
* First pass at reconstruction in the worker
Modify reconstruction stress testing to start Plasma service before rest of Ray cluster
TODO about reconstructing ray.puts
Fix ray.put error for double creates
Distinguish between empty entry and no entry in object table
Fix test case
Fix Python test
Fix tests
* Only call reconstruct on objects we have not yet received
* Address review comments
* Fix reconstruction for Python3
* remove unused code
* Address Robert's comments, stress tests are crashing
* Test and update the task's scheduling state to suppress duplicate
reconstruction requests.
* Split result table into two lookups, one for task ID and the other as a
test-and-set for the task state
* Fix object table tests
* Fix redis module result_table_lookup test case
* Multinode reconstruction tests
* Fix python3 test case
* rename
* Use new start_redis
* Remove unused code
* lint
* indent
* Address Robert's comments
* Use start_redis from ray.services in state table tests
* Remove unnecessary memset
* Provide functionality for local scheduler to start new workers.
* Pass full command for starting new worker in to local scheduler.
* Separate out configuration state of local scheduler.
* Optimizations:
- Track mapping of missing object to dependent tasks to avoid iterating over task queue
- Perform all fetch requests for missing objects using the same timer
* Fix bug and add regression test
* Record task dependencies and active fetch requests in the same hash table
* fix typo
* Fix memory leak and add test cases for scheduling when dependencies are evicted
* Fix python3 test case
* Minor details.
* Split local scheduler task queue into waiting and dispatch queue
* Fix memory leak
* Add a new task scheduling status for when a task has been queued locally
* Fix global scheduler test case and add task status doc
* Documentation
* Address Philipp's comments
* Move tasks back to the waiting queue if their dependencies become unavailable
* Update existing task table entries instead of overwriting
* Prevent plasma store and manager from dying when a worker dies.
* Check errno inside of warn_if_sigpipe. Passing in errno doesn't work because the arguments to warn_if_sigpipe can be evaluated out of order.
* Object notification test for Photon, and turn on valgrind for Photon C tests
* Test object notification handler in the plasma manager
* Fix hanging test case
* Send message from plasma client to get plasma store capacity.
* Release objects from plasma client if they are too large.
* Use doubly-linked list instead of ring buffer for plasma client release history.
* Address comments.
* Fix problem with slicing PlasmaBuffer objects.
* Fix crash in plasma manager during transfer.
* Formatting.
* Make plasma client cache larger and make caching test not throw exceptions on Travis.
* Switch to using redis modules for task table.
* Switch to using redis modules for the task table.
* Fix some tests.
* Fix naming and remove code duplication.
* Remove duplication in redis modules and add more cleanups.
* Address comments.
* Add function for driver to get address info from Redis.
* Use Redis address instead of Redis port.
* Configure Redis to run in unprotected mode.
* Add method for starting Ray processes on non-head node.
* Pass in correct node ip address to start_plasma_manager.
* Script for starting Ray processes.
* Handle the case where an object already exists in the store. Maybe this should also compare the object hashes.
* Have driver get info from Redis when start_ray_local=False.
* Fix.
* Script for killing ray processes.
* Catch some errors when the main_loop in a worker throws an exception.
* Allow redirecting stdout and stderr to /dev/null.
* Wrap start_ray.py in a shell script.
* More helpful error messages.
* Fixes.
* Wait for redis server to start up before configuring it.
* Allow seeding of deterministic object ID generation.
* Small change.
* passing plasma ip:port association with photon through redis to global scheduler
* Fix test.
* sanity-checking aux_address inside db_connect_extended
* clang format
* fix photon tests
* clang format photon tests
* Object reconstruction in Photon and C test cases for Photon
* Fix hanging test case on mac
* Remove unnecessary event from photon tests
* make photon_disconnect not leak file descriptors
* fix some of the memory errors
* Fix valgrind
* lint
* Address Robert's comments and add test case for object reconstruction suppression
* Remove OWNER
* change plasma object notifications to carry a struct of information
* factoring out object_info for general use by several Ray components
* fixing a bug in python test
* addressing comments
* handling Robert's comments
* clang format
* Fix valgrind.
* Add Python and Redis submodules, and remove old third-party modules
* Update VS projects (WARNING: references files that do not exist yet)
* Update code & add shims for APIs except AF_UNIX/{send,recv}msg()
* Minor style changes.
* Use sizeof(field) instead of sizeof(type) and other fixes.
* Fix formatting.
* Bug fix.
* Zero-initialize structs. There are many more instances of these that I haven't changed yet.
* Bug fix.
* Revert from atexit to signaling to fix valgrind tests.
* Address Philipp's comments.
* Initial scheduler commit
* global scheduler
* add global scheduler
* Implement global scheduler skeleton.
* Formatting.
* Allow local scheduler to be started without a connection to redis so that we can test it without a global scheduler.
* Fail if there are no local schedulers when the global scheduler receives a task.
* Initialize uninitialized value and formatting fix.
* Generalize local scheduler table to db client table.
* Remove code duplication in local scheduler and add flag for whether a task came from the global scheduler or not.
* Queue task specs in the local scheduler instead of tasks.
* Simple global scheduler tests, including valgrind.
* Factor out functions for starting processes.
* Fixes.
* Merge task table and task log
* Fix test in db tests
* Address Robert's comments and some better error checking
* Add a LOG_FATAL that exits the program
* Update worker code and services code to use plasma and the local scheduler.
* Cleanups.
* Fix bug in which threads were started before the worker mode was set. This caused remote functions to be defined on workers before the worker knew it was in WORKER_MODE.
* Fix bug in install-dependencies.sh.
* Lengthen timeout in failure_test.py.
* Cleanups.
* Cleanup services.start_ray_local.
* Clean up random name generation.
* Cleanups.
* Fix socket bind collisions in manager_tests
* bind manager sockets before connecting to the store
* fix memory leak in tests
* fix valgrind early process termination
* fix bind/listen/subscribe race condition
* fix photon
* fix other tests
* make it that all of common is tested
* fix clang-format
* fix
* Remove hiredis submodule.
* Squashed 'src/common/thirdparty/hiredis/' content from commit acd1966
git-subtree-dir: src/common/thirdparty/hiredis
git-subtree-split: acd1966bf7f5e1be74b426272635c672def78779
* Make Plasma tests pass.
* Make Photon tests pass.
* Compile and test with Travis.
* Deactive fetch test so that the tests pass.