* Implement Node class and move most of services.py into it.
* Wait for nodes as they are added to the cluster.
* Fix Redis authentication bug.
* Fix bug in client table ordering.
* Address comments.
* Kill raylet before plasma store in test.
* Minor
* Convert UniqueID::nil() to a constructor
* Cleanup actor handle pickling code
* Add new actor handles to the task spec
* Pass in new actor handles
* Add new handles to the actor registration
* Regression test for actor handle forking and GC
* lint and doc
* Handle pickled actor handles in the backend and some refactoring
* Add regression test for dummy object GC and pickled actor handles
* Check for duplicate actor tasks on submission
* Regression test for forking twice, fix failed named actor leak
* Fix bug for forking twice
* lint
* Revert "Fix bug for forking twice"
This reverts commit 3da85e59d401e53606c2e37ffbebcc8653ff27ac.
* Add new actor handles when task is assigned, not finished
* Remove comment
* remove UniqueID()
* Updates
* update
* fix
* fix java
* fixes
* fix
* Add a flag for whether an object has been created before
* Add regression test
* doc
* Share object directory between object and node managers
* Treat evicted actor tasks as failed
* minor
* Check return value
* Fix bug where object locations weren't getting updated on client death
* Fix mac build
* Use RayTaskError
* Broadcast actor death, clean up dummy objects
* Reduce logging and clean up state when failing a task
* lint
* Make actor failure test nicer, reduce node timeout
* Increase timeout to 10s
* Skip eviction reconstruction tests
* Add stress test for many actors to one
* Fix test by shortening it.
* lower number of processes in stress test
* Skip slow test
* Convert multi_node_test.py to pytest.
* Convert array_test.py to pytest.
* Convert failure_test.py to pytest.
* Convert microbenchmarks to pytest.
* Convert component_failures_test.py to pytest and some minor quotes changes.
* Convert tensorflow_test.py to pytest.
* Convert actor_test.py to pytest.
* Fix.
* Fix
* Limit number of concurrent workers started by hardware concurrency.
* Check if std:🧵:hardware_concurrency() returns 0.
* Pass in max concurrency from Python.
* Fix Java call to startRaylet.
* Fix typo
* Remove unnecessary cast.
* Fix linting.
* Cleanups on Java side.
* Comment back in actor test.
* Require maximum_startup_concurrency to be at least 1.
* Fix linting and test.
* Improve documentation.
* Fix typo.
## What do these changes do?
* distribute load and resource information on a heartbeat
* for each raylet, maintain total and available resource capacity as well as measure of current load
* this PR introduces a new notion of load, defined as a sum of all resource demand induced by queued ready tasks on the local raylet. This provides a heterogeneity-aware measure of load that supersedes legacy Ray's task count as a proxy for load.
* modify the scheduling policy to perform *capacity-based*, *load-aware*, *optimistically concurrent* resource allocation
* perform task spillover to the heartbeating node in response to a heartbeat, implementing heterogeneity-aware late-binding/work-stealing.
* Print warning when defining very large remote function or actor.
* Add weak test.
* Check that warnings appear in test.
* Make wait_for_errors actually fail in failure_test.py.
* Use constants for error types.
* Fix
* Add flake8 to Travis
* Add flake8-comprehensions
[flake8 plugin](https://github.com/adamchainz/flake8-comprehensions) that
checks for useless constructions.
* Use generators instead of lists where appropriate
A lot of the builtins can take in generators instead of lists.
This commit applies `flake8-comprehensions` to find them.
* Fix lint error
* Fix some string formatting
The rest can be fixed in another PR
* Fix compound literals syntax
This should probably be merged after #1963.
* dict() -> {}
* Use dict literal syntax
dict(...) -> {...}
* Rewrite nested dicts
* Fix hanging indent
* Add missing import
* Add missing quote
* fmt
* Add missing whitespace
* rm duplicate pip install
This is already installed in another file.
* Fix indent
* move `merge_dicts` into utils
* Bring up to date with `master`
* Add automatic syntax upgrade
* rm pyupgrade
In case users want to still use it on their own, the upgrade-syn.sh script was
left in the `.travis` dir.
* Run xray tests in travis.
* Comment out TaskTests.testSubmittingManyTasks.
* Comment out failing tests.
* Comment out hanging test.
* Linting
* Comment out failing test.
* Comment out failing test.
* Ignore test_dataframe.py for now.
* Comment out testDriverExitingQuickly.
* Make ActorHandles pickleable, also make proper ActorHandle and ActorClass classes.
* Fix bug.
* Fix actor test bug.
* Update __ray_terminate__ usage.
* Fix most linting, add documentation, and small cleanups.
* Handle forking and pickling differently for actor handles. Fix linting.
* Fixes for named actors via pickling.
* Generate actor handle IDs deterministically in the pickling case.
* Use set/dict literal syntax
Ran code through [pyupgrade](https://github.com/asottile/pyupgrade). This is
supported in every Python version 2.7+.
* Drop unnecessary string format specification
No need to specify 0,1.. if paramters are passed in order.
* Revert "Drop unnecessary string format specification"
This reverts commit efa5ec85d30ff69f34e5ed93e31343fea7647bcb.
* Undo changes to cloudpickle
Drop use of set literal until cloudpickle uses it.
* Reformat code with YAPF
We need to set up a git pre-push hook to automatically run this stuff.
* Treat actor creation like a regular task.
* Small cleanups.
* Change semantics of actor resource handling.
* Bug fix.
* Minor linting
* Bug fix
* Fix jenkins test.
* Fix actor tests
* Some cleanups
* Bug fix
* Fix bug.
* Remove cached actor tasks when a driver is removed.
* Add more info to taskspec in global state API.
* Fix cyclic import bug in tune.
* Fix
* Fix linting.
* Fix linting.
* Don't schedule any tasks (especially actor creaiton tasks) on local schedulers with 0 CPUs.
* Bug fix.
* Add test for 0 CPU case
* Fix linting
* Address comments.
* Fix typos and add comment.
* Add assertion and fix test.
* Expose calls to get and set the actor frontier
* Remove fields used for old checkpointing prototype, change actor_checkpoint_failed -> succeeded
* Prototype for actor checkpointing
* Filter out duplicate tasks on the local scheduler
* Clean up some of the Python checkpointing code
* More cleanups
* Documentation
* cleanup and fix unit test
* Allow remote checkpoint calls through actor handle
* Check whether object is local before reconstructing
* Enable checkpointing for distributed actor handles, refactor tests
* Fix local scheduler tests
* lint
* Address comments
* lint
* Skip tests that fail on new GCS
* style
* Don't put same object twice when setting the actor frontier
* Address Philipp's comments, cleaner fbs naming
* Add failing unit test for nondeterministic reconstruction
* Retry scheduling actor tasks if reassigned to local scheduler
* Update execution edges asynchronously upon dispatch for nondeterministic reconstruction
* Fix bug for updating checkpoint task execution dependencies
* Update comments for deterministic reconstruction
* cleanup
* Add (and skip) failing test case for nondeterministic reconstruction
* Suppress test output