* Add actor handle ID to the task spec
* Local scheduler dispatches actor tasks according to a task counter per handle
* Fix python test
* Allow passing actor handles into tasks. Not completely working yet. Also this is very messy.
* Fixes, should be roughly working now.
* Refactor actor handle wrapper
* Fix __init__ tests
* Terminate actor when the original handle goes out of scope
* TODO and a couple test cases
* Make tests for unsupported cases
* Fix Python mode tests
* Linting.
* Cache actor definitions that occur before ray.init() is called.
* Fix export actor class
* Deterministically compute actor handle ID
* Fix __getattribute__
* Fix string encoding for python3
* doc
* Add comment and assertion.
* fix yaml bug
* add ext agent
* gpus
* update
* tuning
* docs
* Sun Oct 15 21:09:25 PDT 2017
* lint
* update
* Sun Oct 15 22:39:55 PDT 2017
* Sun Oct 15 22:40:17 PDT 2017
* Sun Oct 15 22:43:06 PDT 2017
* Sun Oct 15 22:46:06 PDT 2017
* Sun Oct 15 22:46:21 PDT 2017
* Sun Oct 15 22:48:11 PDT 2017
* Sun Oct 15 22:48:44 PDT 2017
* Sun Oct 15 22:49:23 PDT 2017
* Sun Oct 15 22:50:21 PDT 2017
* Sun Oct 15 22:53:00 PDT 2017
* Sun Oct 15 22:53:34 PDT 2017
* Sun Oct 15 22:54:33 PDT 2017
* Sun Oct 15 22:54:50 PDT 2017
* Sun Oct 15 22:55:20 PDT 2017
* Sun Oct 15 22:56:56 PDT 2017
* Sun Oct 15 22:59:03 PDT 2017
* fix
* Update tune_mnist_ray.py
* remove script trial
* fix
* reorder
* fix ex
* py2 support
* upd
* comments
* comments
* cleanup readme
* fix trial
* annotate
* Update rllib.rst
* Worker reports error in previous task; actor task counter is incremented after the task succeeds
* Refactor actor task execution
  - Return new task counter in GetTaskRequest
  - Update worker state for actor tasks inside of the actor method executor
* Manually invoked checkpoint method
* Scheduling for actor checkpoint methods
* Fix python bugs in checkpointing
* Return task success from worker to local scheduler instead of actor counter
* Kill local schedulers halfway through actor execution instead of waiting for all tasks to execute once
* Remove redundant actor tasks during dispatch, reconstruct missing dependencies for actor tasks
* Make executor for temporary actor methods
* doc
* Set default argument for whether the previous task was a success
* Refactor actor method call
* Simplify checkpoint task submission
* lint
* Address Philipp's comments
* Add missing line
* Make actor reconstruction tests run faster
* Unimportant whitespace.
* Unimportant whitespace.
* Update checkpoint method signature
* Documentation and handle exceptions during checkpoint save/resume
* Rename get_task message field to actor_checkpoint_failed
* Fix bug.
* Remove debugging check, redirect test output
* Release GPU resources as soon as an actor exits.
* Add a test.
* Store local_scheduler_id and driver_id in the worker object instead of the actor object.
* Fix bug in wait_for_pid_to_exit, add test for actor deletion.
* Fix actor garbage collection by breaking cyclic references
* Add test for calling actor method immediately after actor creation.
* Fix bug, must dispatch tasks when workers are killed.
* Fix python test
* Fix cyclic reference problem by creating ActorMethod objects on the fly.
* Try simply increasing the time allowed for many_drivers_test.py.
* WIP: remove OL (object location), OI (object info), and TT (task table) entries on client exit; no saving yet.
* ray_redis_module.cc: update header comment.
* Cleanup: just the removal.
* Reformat via yapf: use pep8 style instead of google.
* Checkpoint addressing comments (partially)
* Add 'b' marker before strings (py3 compat)
* Add MonitorTest.
* Use `isort` to sort imports.
* Remove some logging
* Fix flake8 noqa marker in runtest.py
* Try to separate tests out to monitor_test.py
* Rework cleanup algorithm: correct logic
* Extend tests to cover multi-shard cases
* Add some small comments and formatting changes.
* Local scheduler sends a null heartbeat to global scheduler to notify death
* Add whitespace.
* Speed up component failures test
* Free local scheduler state upon plasma manager disconnection
* Revert Python actor reconstruction
* Actor reconstruction using object lineage
* Add dummy arguments and return values for actor tasks
* Pin dummy outputs for actor tasks
* Skip checkpointing test for now
* TODOs
* minor edits
* Generate dummy object dependencies in Python, not C
* Fix linting.
* Move actor counter and dummy objects inside of the actor handle
* Refactor Worker._process_task, suppress exception propagation for
  sequential actor tasks
* Remove race between local scheduler disconnecting and global scheduler
  assigning a task
* Fix number of workers started in component failures test
* Fix race between global scheduler retrying a task assignment and monitor
  cleaning up task table. The global scheduler should only retry the task
  assignment if the local scheduler is still alive.
* Clean up task_table_update callback if failure
* Look up current local scheduler mapping when retrying actor task submission
* Log warning if no subscribers received a task table update
* Clean up database handle memory in local scheduler
* make information available for GAE
* buggy version of GAE estimator
* fix
* add more logging and reweight losses
* fix logging
* fix loss
* adapt advantage calculation
* update gae
* standardize returns
* don't normalize td lambda ret
* fix
* don't standardize advantages
* do standardization earlier
* different standardization
* initializer
* drop into the debugger
* fix tensorflow broadcasting bug
* vf clipping
* don't standardize tdlambdaret
* different standardization
* use huber loss for value function
* refactor -- first half
* it runs
* fix
* update
* documentation
* linting and tests
* fix linting
* naming
* fix
* linting
* fix
* remove prefix madness
* fixes
* fix
* add value function example
* fix linting
* remove newline
* add support for the user-interpretable label (UIR)
* more plumbing for num_uirs further upstream; set to infty when specified on cmd line
* pass default num_uirs for actors; update GlobalStateAPI
* support num_uirs in ray.init()
* local scheduler resource accounting: support num_uirs; prep for vectorized resource accounting
* global scheduler test updated
* Fix bug introduced by rebase.
* Rename UIR -> CustomResource and add test.
* Small changes and use constexpr instead of macros.
* Linting and some renaming.
* Reorder some code.
* Remove cpus_in_use and fix bug.
* Add another test and make a small change.
* Rephrase documentation about feature stability.
* Initial testing of checkpointing functions.
* Save checkpoints in Redis.
* Pipe checkpoint_interval through remote decorator.
* Add a test.
* Small cleanups.
* Submit dummy tasks when reconstructing tasks before the most recent tasks so that we don't end up reconstructing the arguments for those tasks.
* Remove old checkpoints to save space.
* Fix linting.
* Reconstruct actor state when local schedulers fail.
* Simplify construction of arguments to pass into default_worker.py from local scheduler.
* Remove deprecated ray.actor.
* Simplify actor reconstruction method.
* Fix linting.
* Small fixes.
* Rebase Ray on top of Plasma in Apache Arrow
* add thirdparty building scripts
* use rebased arrow
* fix
* fix build
* fix python visibility
* comment out C tests for now
* fix multithreading
* fix
* reduce logging
* fix plasma manager multithreading
* make sure old and new object IDs can coexist peacefully
* more rebasing
* update
* fixes
* fix
* install pyarrow
* install cython
* fix
* install newer cmake
* fix
* rebase on top of latest arrow
* get runtest.py to run locally (needed to comment out a test for that to work)
* work on plasma tests
* more fixes
* fix local scheduler tests
* fix global scheduler test
* more fixes
* fix python 3 bytes vs string
* fix manager tests valgrind
* fix documentation building
* fix linting
* fix c++ linting
* fix linting
* add tests back in
* Install without sudo.
* Set PKG_CONFIG_PATH in build.sh so that Ray can find plasma.
* Install pkg-config
* Link -lpthread, note that find_package(Threads) doesn't seem to work reliably.
* Uncomment testGPUIDs in runtest.py.
* Set PKG_CONFIG_PATH when building pyarrow.
* Pull apache/arrow and not pcmoritz/arrow.
* Fix installation in docker image.
* adapt to changes of the plasma api
* Fix installation of pyarrow module.
* Fix linting.
* Use correct python executable to build pyarrow.
* Make local scheduler start workers using the same version of Python that was used to start the local scheduler.
* Use current version of python to start new processes instead of hardcoded python executable.
* Fix linting.
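
Note on the last two Python-executable changes above: a minimal sketch of the idea, not the actual Ray code. The helper name `start_python_process` is hypothetical; the only real API relied on is `sys.executable`, which holds the path of the interpreter running the current process.

```python
import subprocess
import sys


def start_python_process(args):
    """Launch a child process with the interpreter running this process.

    Hypothetical helper for illustration only; not the actual Ray code.
    """
    # sys.executable is the path of the current interpreter, so the child
    # runs the same Python version instead of whatever "python" is on PATH.
    return subprocess.Popen([sys.executable] + list(args),
                            stdout=subprocess.PIPE)


# Usage: the child reports the same (major, minor) version as the parent.
proc = start_python_process(["-c", "import sys; print(sys.version_info[:2])"])
out, _ = proc.communicate()
child_version = out.decode().strip()
```

Passing the interpreter path explicitly avoids a mismatch when, for example, the parent was started with `python3` but a hardcoded `python` resolves to Python 2.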