* Factor out starting Ray processes.
* Detect flags through environment variables.
* Return ProcessInfo from start_ray_process.
* Print valgrind errors at exit.
* Test valgrind in travis.
* Some valgrind fixes.
* Undo raylet monitor change.
* Only test plasma store in valgrind.
* add marvil policy graph
* fix typo
* add offline optimizer and enable running marwil
* fix loss function
* add maintaining the moving average of advantage norm
* use sync replay optimizer for unifying
* remove offline optimizer and use sync replay optimizer
* format by yapf
* add imitation learning objective
* fix according to eric's review
* format by yapf
* revise
* add test data
* marwil
* Refactor code about ray.ObjectID.
* remove from_random and use nil_id instead of constructor
* remove id() in hash
* Lint and fix
* Change driver id to ObjectID
* Replace binary_to_hex(ObjectID.id()) to ObjectID.hex()
* Implement Node class and move most of services.py into it.
* Wait for nodes as they are added to the cluster.
* Fix Redis authentication bug.
* Fix bug in client table ordering.
* Address comments.
* Kill raylet before plasma store in test.
* Minor
## What do these changes do?
Adds 2 commands to the CLI that take in an autoscaler config:
1. Kill a random ray node in the cluster.
2. Get all the worker node IP addresses.
These commands are both for testing and are not recommended for normal use.
## Related issue number
Closes#3685.
* Convert UniqueID::nil() to a constructor
* Cleanup actor handle pickling code
* Add new actor handles to the task spec
* Pass in new actor handles
* Add new handles to the actor registration
* Regression test for actor handle forking and GC
* lint and doc
* Handle pickled actor handles in the backend and some refactoring
* Add regression test for dummy object GC and pickled actor handles
* Check for duplicate actor tasks on submission
* Regression test for forking twice, fix failed named actor leak
* Fix bug for forking twice
* lint
* Revert "Fix bug for forking twice"
This reverts commit 3da85e59d401e53606c2e37ffbebcc8653ff27ac.
* Add new actor handles when task is assigned, not finished
* Remove comment
* remove UniqueID()
* Updates
* update
* fix
* fix java
* fixes
* fix
1. Fix the problem of duplicated stored logs.
2. Save log whose level is higher than severity_threshold, not only with severity_threshold.
3. Fix a `log_dir` bug: storing logs in a wrong path.
* Separate out functionality for querying client table and improve cluster.wait_for_nodes() API.
* Linting
* Add back logging statements.
* info -> debug
## What do these changes do?
This option goes along with `min_workers`, and `max_workers`. When the
cluster is first brought up (or when it is refreshed with a subsequent
`ray up`) this number of nodes will be started.
It's a workaround for issues of scaling (see related issues) where it
can take a long time (or forever in the case where the head node has
`--num-cpus 0`) to scale up a cluster in response to increasing demand.
## Related issue number
Workaround for https://github.com/ray-project/ray/issues/3339 and https://github.com/ray-project/ray/issues/2106
* Push a warning to all users when large number of workers have been started.
* Add test.
* Fix bug.
* Give warning when worker starts instead of when worker registers.
* Fix
* Fix tests
* Fix warning text in pbt logger
* Allow nested mutations in pbt by recursing explore function
* Add test for nested pbt mutation
* Update pbt explore to only call custom explore on top level
* fix test