* adding support for the user-interpretable label (UIR)
* more plumbing for num_uirs further upstream; set to infinity when specified on cmd line
* pass default num_uirs for actors; update GlobalStateAPI
* support num_uirs in ray.init()
* local scheduler resource accounting: support num_uirs; prep for vectorized resource accounting
* global scheduler test updated
* Fix bug introduced by rebase.
* Rename UIR -> CustomResource and add test.
* Small changes and use constexpr instead of macros.
* Linting and some renaming.
* Reorder some code.
* Remove cpus_in_use and fix bug.
* Add another test and make a small change.
* Rephrase documentation about feature stability.
* 4 space indentation for actor.py.
* 4 space indentation for worker.py.
* 4 space indentation for more files.
* 4 space indentation for some test files.
* Check indentation in Travis.
* 4 space indentation for some rl files.
* Fix failure test.
* Fix multi_node_test.
* 4 space indentation for more files.
* 4 space indentation for remaining files.
* Fixes.
* Updated task_profiles function to avoid future repetitive parsing.
* Fix indentation.
* Fixed according to comments.
* Included updated test for task_profiles function.
* Simplify test.
* Fix indentation.
* Fix.
* implement restarting workers after a certain number of task executions
* Clean up python code.
* Don't start new worker when an actor disconnects.
* Move wait_for_pid_to_exit to test_utils.py.
* Add test.
* Fix linting errors.
* Fix linting.
* Fix typo.
* Implement sharding in the Ray core
* Single node Python modifications to do sharding
* Do the sharding in redis.cc
* Pipe num_redis_shards through start_ray.py and worker.py.
* Use multiple redis shards in multinode tests.
* first steps for sharding ray.global_state
* Fix problem in multinode docker test.
* fix runtest.py
* fix some tests
* fix redis shard startup
* fix redis sharding
* fix
* fix bug introduced by the map-iterator being consumed
* fix sharding bug
* shard event table
* update number of Redis clients to be 64K
* Fix object table tests by flushing shards in between unit tests
* Fix local scheduler tests
* Documentation
* Register shard locations in the primary shard
* Add plasma unit tests back to build
* lint
* lint and fix build
* Fix
* Address Robert's comments
* Refactor start_ray_processes to start Redis shard
* lint
* Fix global scheduler python tests
* Fix redis module test
* Fix plasma test
* Fix component failure test
* Fix local scheduler test
* Fix runtest.py
* Fix global scheduler test for python3
* Fix task_table_test_and_update bug caused by actor task table submission race
* Fix jenkins tests.
* Retry Redis shard connections
* Fix test cases
* Convert database clients to DBClient struct
* Fix race condition when subscribing to db client table
* Remove unused lines, add APITest for sharded Ray
* Fix
* Fix memory leak
* Suppress ReconstructionTests output
* Suppress output for APITestSharded
* Reissue task table add/update commands if initial command does not publish to any subscribers.
* fix
* Fix linting.
* fix tests
* fix linting
* fix python test
* fix linting
* Perform ray.register_class under the hood.
* Fix bug.
* Release worker lock when waiting for imports to arrive in get.
* Remove calls to register_class from examples and tests.
* Clear serialization state between tests.
* Fix bug and add test for multiple custom classes with same name.
* Fix failure test.
* Fix linting and cleanups to python code.
* Fixes to documentation.
* Implement recursion depth for recursively registering classes.
* Fix linting.
* Push warning to user if waiting for class for too long.
* Fix typos.
* Don't export FunctionToRun if pickling the function fails.
* Don't broadcast class definition when pickling class.
* Change local scheduler bookkeeping to use GPU IDs.
* Update actor test.
* Add tests for actors and tasks simultaneously using GPUs.
* Add additional task GPU ID test.
* Fix linting.
* Make redis GPU assignment ignore GPU IDs.
* Small fix.
* Serialize lambdas with pickle by default.
* Serialize sets with pickle by default.
* Serialize types with pickle by default.
* Small update to documentation.
* Update tests.
* Clean up state when drivers exit.
* Remove unnecessary field in ActorMapEntry struct.
* Have monitor release GPU resources in Redis when driver exits.
* Enable multiple drivers in multi-node tests and test driver cleanup.
* Make redis GPU allocation a redis transaction and small cleanups.
* Fix multi-node test.
* Small cleanups.
* Make global scheduler take node_ip_address so it appears in the right place in the client table.
* Cleanups.
* Fix linting and cleanups in local scheduler.
* Fix removed_driver_test.
* Fix bug related to vector -> list.
* Fix linting.
* Cleanup.
* Fix multi node tests.
* Fix jenkins tests.
* Add another multi node test with many drivers.
* Fix linting.
* Make the actor creation notification a flatbuffer message.
* Revert "Make the actor creation notification a flatbuffer message."
This reverts commit af99099c8084dbf9177fb4e34c0c9b1a12c78f39.
* Add comment explaining flatbuffer problems.
* First pass at a policy to solve deadlock
* Address Robert's comments
* stress test
* unit test
* Fix test cases
* Fix test for python3
* add more logging
* White space.
* Remove import counter and export counter.
* Provide isolation between drivers for remote functions.
* Add test for driver function isolation.
* Hash source code into function ID to reduce likelihood of collisions.
* Fix failure test example.
* Replace assertTrue with assertIn to improve failure messages in tests.
* Fix failure test.
* Implement actor field for tasks
* Implement actor management in local scheduler.
* initial python frontend for actors
* import actors on worker
* IPython code completion and tests
* prepare creating actors through local schedulers
* add actor id to PyTask
* submit actor calls to local scheduler
* starting to integrate
* simple fix
* Fixes from rebasing.
* more work on python actors
* Improve local scheduler actor handlers.
* Pass actor ID to local scheduler when connecting a client.
* first working version of actors
* fixing actors
* fix creating two copies of the same actor
* fix actors
* remove sleep
* get rid of export synchronization
* update
* insert actor methods into the queue in the right order
* remove print statements
* make it compile again after rebase
* Minor updates.
* fix python actor ids
* Pass actor_id to start_worker.
* add test
* Minor changes.
* Update actor tests.
* Temporary plan for import counter.
* Temporarily fix import counters.
* Fix some tests.
* Fixes.
* Make actor creation non-blocking.
* Fix test?
* Fix actors on Python 2.
* fix rare case.
* Fix python 2 test.
* More tests.
* Small fixes.
* Linting.
* Revert tensorflow version to 0.12.0 temporarily.
* Small fix.
* Enhance inheritance test.
* Start and clean up workers from the local scheduler
Ability to kill workers in photon scheduler
Test for old method of starting workers
Common codepath for killing workers
Photon test case for starting and killing workers
fix build
Fix component failure test
Register a worker's pid as part of initial connection
Address comments and revert photon_connect
Set PATH during travis install
Fix
* Fix photon test case to accept clients on plasma manager fd
* attribute-based heterogeneity-awareness in global scheduler and photon
* minor post-rebase fix
* photon: enforce dynamic capacity constraint on task dispatch
* globalsched: cap the number of times we try to schedule a task in round robin
* propagating ability to specify resource capacity to ray.init
* adding resources to remote function export and fetch/register
* globalsched: remove unused functions; update cached photon resource capacity (until next photon heartbeat)
* Add some integration tests.
* globalsched: cleanup + factor out constraint checking
* lots of style
* task_spec_required_resource: global refactor
* clang format
* clang format + comment update in photon
* clang format photon comment
* valgrind
* reduce verbosity for Travis
* Add test for scheduler load balancing.
* addressing comments
* refactoring global scheduler algorithm
* Minor cleanups.
* Linting.
* Fix array_test.py and linting.
* valgrind fix for photon tests
* Attempt to fix stress tests.
* fix hashmap free
* fix hashmap free comment
* memset photon resource vectors to 0 in case they get used before the first heartbeat
* More whitespace changes.
* Undo whitespace error I introduced.
* Change plasma_get to take a timeout and an array of object IDs.
* Address comments.
* Bug fix related to computing object hashes.
* Add test.
* Fix file descriptor leak.
* Fix valgrind.
* Formatting.
* Remove call to plasma_contains from the plasma client. Use timeout internally in ray.get.
* small fixes
* Remove start_ray_local from ray.init and change default number of workers to 10.
* Remove alexnet example.
* Move array methods to experimental.
* Remove TRPO example.
* Remove old files.
* Compile plasma when we build numbuf.
* Address comments.
* Suppress Redis warnings and remove some global scheduler logging.
* Pass a counter into run_function_on_all_workers indicating how many workers have begun executing this function.
* Give more informative error message when we do not know how to serialize a class.
* Check that passing arguments to remote functions and getting them does not change their values.
* fix serialization bug
* fix tests for common module
* Formatting.
* Bug fix in init_pickle_module signature.
* Use pickle with HIGHEST_PROTOCOL.
* Initial scheduler commit
* global scheduler
* add global scheduler
* Implement global scheduler skeleton.
* Formatting.
* Allow local scheduler to be started without a connection to redis so that we can test it without a global scheduler.
* Fail if there are no local schedulers when the global scheduler receives a task.
* Initialize uninitialized value and formatting fix.
* Generalize local scheduler table to db client table.
* Remove code duplication in local scheduler and add flag for whether a task came from the global scheduler or not.
* Queue task specs in the local scheduler instead of tasks.
* Simple global scheduler tests, including valgrind.
* Factor out functions for starting processes.
* Fixes.
* Fix bug in which worker import counters were treated incorrectly.
* Fix bug in which cached functions-to-run were double counted as exports. This also runs the functions-to-run on the driver only after ray.init is called.
* Only define reusable variables locally after ray.init has been called.
* Remove flaky reference counting tests. It's not clear that these tests make sense.
* Make numbuf pip install verbose.
* Export cached reusable variables before cached remote functions.
* Fix bug causing the worker to hang sometimes. This happens when the worker is trying to run a task but has not yet processed enough imports to run it, so it repeatedly acquires and releases a lock while checking whether it has enough imports. Meanwhile, for some reason, the import thread is waiting to acquire the same lock and never does so (or takes a very long time to do so). Dropping the lock before sleeping makes it easier for other threads to acquire it.
* Acquire locks using 'with' statements.
* Fix possible test failure.
* Try to start Redis multiple times with different random ports if the original attempt failed.
* Fix test in which we redefine a remote function.
* Update worker code and services code to use plasma and the local scheduler.
* Cleanups.
* Fix bug in which threads were started before the worker mode was set. This caused remote functions to be defined on workers before the worker knew it was in WORKER_MODE.
* Fix bug in install-dependencies.sh.
* Lengthen timeout in failure_test.py.
* Cleanups.
* Clean up services.start_ray_local.
* Clean up random name generation.
* Cleanups.