* Add actor handle ID to the task spec
* Local scheduler dispatches actor tasks according to a task counter per handle
* Fix python test
* Allow passing actor handles into tasks. Not completely working yet. Also this is very messy.
* Fixes, should be roughly working now.
* Refactor actor handle wrapper
* Fix __init__ tests
* Terminate actor when the original handle goes out of scope
* TODO and a couple test cases
* Make tests for unsupported cases
* Fix Python mode tests
* Linting.
* Cache actor definitions that occur before ray.init() is called.
* Fix export actor class
* Deterministically compute actor handle ID
* Fix __getattribute__
* Fix string encoding for python3
* doc
* Add comment and assertion.
* Compile global scheduler with -Werror -Wall.
* Compile plasma manager with -Werror -Wall.
* Compile local scheduler with -Werror -Wall.
* Compile common code with -Werror -Wall.
* Signed/unsigned comparisons.
* More signed/unsigned fixes.
* More signed/unsigned fixes and added extern keyword.
* Fix linting.
* Don't check strict-aliasing because Python.h doesn't pass.
* Worker reports error in previous task; actor task counter is incremented after task is successful
* Refactor actor task execution
- Return new task counter in GetTaskRequest
- Update worker state for actor tasks inside of the actor method executor
* Manually invoked checkpoint method
* Scheduling for actor checkpoint methods
* Fix python bugs in checkpointing
* Return task success from worker to local scheduler instead of actor counter
* Kill local schedulers halfway through actor execution instead of waiting for all tasks to execute once
* Remove redundant actor tasks during dispatch, reconstruct missing dependencies for actor tasks
* Make executor for temporary actor methods
* doc
* Set default argument for whether the previous task was a success
* Refactor actor method call
* Simplify checkpoint task submission
* lint
* fix Philipp's comments
* Add missing line
* Make actor reconstruction tests run faster
* Unimportant whitespace.
* Unimportant whitespace.
* Update checkpoint method signature
* Documentation and handle exceptions during checkpoint save/resume
* Rename get_task message field to actor_checkpoint_failed
* Fix bug.
* Remove debugging check, redirect test output
* Retry passing a task to the global scheduler if it is not received.
* Call give_task_to_global_scheduler directly (same with local).
* Add timing statement to loop that calls redis_get_cached_db_client because it has been slow in the past.
* Fix linting.
* Refactoring to make manager vectors into std::vector.
* Fix linting.
* Fixes.
* Comment out local scheduler valgrind test.
* Fix free/delete error.
* More free -> delete errors
* One more free -> delete and also clean up callback state in plasma manager.
* Add set -x to run_valgrind scripts.
* Fix valgrind error in CreateLocalSchedulerInfoMessage.
* WIP: removing OL, OI, TT on client exit; no saving yet.
* ray_redis_module.cc: update header comment.
* Cleanup: just the removal.
* Reformat via yapf: use pep8 style instead of google.
* Checkpoint addressing comments (partially)
* Add 'b' marker before strings (py3 compat)
* Add MonitorTest.
* Use `isort` to sort imports.
* Remove some logging
* Fix flake8 noqa marker in runtest.py
* Try to separate tests out to monitor_test.py
* Rework cleanup algorithm: correct logic
* Extend tests to cover multi-shard cases
* Add some small comments and formatting changes.
* Local scheduler sends a null heartbeat to global scheduler to notify death
* Add whitespace.
* Speed up component failures test
* Free local scheduler state upon plasma manager disconnection
* Remove race between local scheduler disconnecting and global scheduler assigning a task
* Fix number of workers started in component failures test
* Fix race between global scheduler retrying a task assignment and monitor cleaning up task table. The global scheduler should only retry the task assignment if the local scheduler is still alive.
* Clean up task_table_update callback if failure
* Look up current local scheduler mapping when retrying actor task submission
* Log warning if no subscribers received a task table update
* Clean up database handle memory in local scheduler
* Pass -DPYTHON_EXECUTABLE into cmake for arrow and for ray.
* Add cython to setup.py install_requires.
* Revert custom code for finding python in cmake.
* Correctly find arrow on CentOS.
* In cmake, don't find PythonLibs, just find PYTHON_INCLUDE_DIRS.
* Fix typo.
* Do not use boost shared libraries when building arrow.
* Add six to the setup.py install_requires because it is needed by pyarrow.
* Don't link numbuf against boost_system and boost_filesystem.
* Compile boost when we are on Linux.
* Make numbuf find the correct boost libraries.
* Only use find_package Boost on Linux, suppress output when building boost.
* Changes to wheel building scripts, install cython in mac script.
* Compile flatbuffers ourselves on Linux and pass it in when compiling Arrow.
* Clean up build_flatbuffers.sh and build_boost.sh scripts a little.
* Install cython when building linux wheel.
* adding support for the user-interpretable label (UIR)
* more plumbing for num_uirs further upstream; set to infinity when specified on the command line
* pass default num_uirs for actors; update GlobalStateAPI
* support num_uirs in ray.init()
* local scheduler resource accounting: support num_uirs; prep for vectorized resource accounting
* global scheduler test updated
* Fix bug introduced by rebase.
* Rename UIR -> CustomResource and add test.
* Small changes and use constexpr instead of macros.
* Linting and some renaming.
* Reorder some code.
* Remove cpus_in_use and fix bug.
* Add another test and make a small change.
* Rephrase documentation about feature stability.
* Reconstruct actor state when local schedulers fail.
* Simplify construction of arguments to pass into default_worker.py from local scheduler.
* Remove deprecated ray.actor.
* Simplify actor reconstruction method.
* Fix linting.
* Small fixes.
* Rebase Ray on top of Plasma in Apache Arrow
* add thirdparty building scripts
* use rebased arrow
* fix
* fix build
* fix python visibility
* comment out C tests for now
* fix multithreading
* fix
* reduce logging
* fix plasma manager multithreading
* make sure old and new object IDs can coexist peacefully
* more rebasing
* update
* fixes
* fix
* install pyarrow
* install cython
* fix
* install newer cmake
* fix
* rebase on top of latest arrow
* get runtest.py to run locally (needed to comment out a test for that to work)
* work on plasma tests
* more fixes
* fix local scheduler tests
* fix global scheduler test
* more fixes
* fix python 3 bytes vs string
* fix manager tests valgrind
* fix documentation building
* fix linting
* fix c++ linting
* fix linting
* add tests back in
* Install without sudo.
* Set PKG_CONFIG_PATH in build.sh so that Ray can find plasma.
* Install pkg-config
* Link -lpthread, note that find_package(Threads) doesn't seem to work reliably.
* Uncomment testGPUIDs in runtest.py.
* Set PKG_CONFIG_PATH when building pyarrow.
* Pull apache/arrow and not pcmoritz/arrow.
* Fix installation in docker image.
* adapt to changes of the plasma api
* Fix installation of pyarrow module.
* Fix linting.
* Use correct python executable to build pyarrow.
* Replace a local scheduler ut_array with a std::vector.
* Replace vector of sizes in local scheduler with std::pair.
* Remove utarray include.
* Replace utarray with std::vector for reading local scheduler input messages.
* Remove more UT data structures.
* Remove UT includes.
* Fix linting.
* Include stdlib.h to find size_t.
* Remove includes of stdbool.h.
* Replace std::pair with TaskQueueEntry.
* Fix redis tests.
* Reinstate tests.
* added log_table function and a test
* fixed log_files and added task_profiles
* fixed formatting
* fixed linting errors
* fixes
* removed file
* more fixes
* hopefully fixed
* Small changes.
* Fix linting.
* Fix bug in log monitor.
* Small changes.
* Fix bug in travis.
* Including data_size and hash in the ResultTableReply.
* Included data_size and hash info in object_table.
* Fixed bugs in ray_redis_module.cc.
* Removing commented out code.
* Fixes
* Freed hash and data_size strings after using, and checked if they're null along with task_id and is_put.
* Changed it so that data_size is set correctly.
* Removed iostream import.
* Included a check to ensure that the Redis string to long long conversion was successful.
* Included separate data_size and hash null checks.
* Fixed bug.
* Made linting changes.
* Another linting error.
* Slight simplification.
* Log fatal error if plasma manager or local scheduler take too long to send heartbeat.
* Fix linting.
* Use int64_t for milliseconds since unix epoch.
* Implement sharding in the Ray core
* Single node Python modifications to do sharding
* Do the sharding in redis.cc
* Pipe num_redis_shards through start_ray.py and worker.py.
* Use multiple redis shards in multinode tests.
* first steps for sharding ray.global_state
* Fix problem in multinode docker test.
* fix runtest.py
* fix some tests
* fix redis shard startup
* fix redis sharding
* fix
* fix bug introduced by the map-iterator being consumed
* fix sharding bug
* shard event table
* update number of Redis clients to be 64K
* Fix object table tests by flushing shards in between unit tests
* Fix local scheduler tests
* Documentation
* Register shard locations in the primary shard
* Add plasma unit tests back to build
* lint
* lint and fix build
* Fix
* Address Robert's comments
* Refactor start_ray_processes to start Redis shard
* lint
* Fix global scheduler python tests
* Fix redis module test
* Fix plasma test
* Fix component failure test
* Fix local scheduler test
* Fix runtest.py
* Fix global scheduler test for python3
* Fix task_table_test_and_update bug, from actor task table submission race
* Fix jenkins tests.
* Retry Redis shard connections
* Fix test cases
* Convert database clients to DBClient struct
* Fix race condition when subscribing to db client table
* Remove unused lines, add APITest for sharded Ray
* Fix
* Fix memory leak
* Suppress ReconstructionTests output
* Suppress output for APITestSharded
* Reissue task table add/update commands if initial command does not publish to any subscribers.
* fix
* Fix linting.
* fix tests
* fix linting
* fix python test
* fix linting
* copy task specifications put into the actor task cache so it won't get overwritten when the scheduler receives the next task
* cleanup
* cleanup and fix
* linting
* fix jenkins test
* fix linting
* Clean up state when drivers exit.
* Remove unnecessary field in ActorMapEntry struct.
* Have monitor release GPU resources in Redis when driver exits.
* Enable multiple drivers in multi-node tests and test driver cleanup.
* Make redis GPU allocation a redis transaction and small cleanups.
* Fix multi-node test.
* Small cleanups.
* Make global scheduler take node_ip_address so it appears in the right place in the client table.
* Cleanups.
* Fix linting and cleanups in local scheduler.
* Fix removed_driver_test.
* Fix bug related to vector -> list.
* Fix linting.
* Cleanup.
* Fix multi node tests.
* Fix jenkins tests.
* Add another multi node test with many drivers.
* Fix linting.
* Make the actor creation notification a flatbuffer message.
* Revert "Make the actor creation notification a flatbuffer message."
This reverts commit af99099c8084dbf9177fb4e34c0c9b1a12c78f39.
* Add comment explaining flatbuffer problems.
* plasma manager perf: speed up wait with a wait request object map
* removing duplicate == operator in plasma store
* fix serialization test
* code cleanup
* minor cleanup
* factoring out uniqueid hash and equality operators into common
* plasma manager: c++ify the WaitRequest struct
* plasma manager: get rid of the initial object request malloc
* cleanup
* linting
* cleanups and fix compiler warnings
* compiler warnings and linting
* Ignore deleted clients when reading address info from Redis
* Remove self from db_client table when exiting cleanly
* Fix valgrind test
* Do not call plasma_perform_release when disconnecting
* Switch to using C++ lists for task queues
* Init and free methods for TaskQueueEntry
* Switch from utarray to c++ vector for TaskQueueEntry
* Get rid of some pointers
* Back to O(1) deletion from waiting_task_queue
* Fix comments
* Cut code
* Non const iterators
* Fix Alexey's comments
* Fix worker blocked bug
* tmp
* Push an error to the driver on ray.put for non-driver tasks
* Fix result table tests
* Fix test, logging
* Address comments
* Fix suppression bug
* Fix redis module test
* Edit error message
* Get values in chunks during reconstruction
* Test case for driver ray.put errors
* Error for evicting ray.put objects from the driver
* Fix tests
* Reduce verbosity
* Documentation