* Stream logs to driver by default.
* Fix from rebase
* Redirect raylet output independently of worker output.
* Fix.
* Create redis client with services.create_redis_client.
* Suppress Redis connection error at exit.
* Remove thread_safe_client from redis.
* Shutdown driver threads in ray.shutdown().
* Add warning for too many log messages.
* Only stop threads if worker is connected.
* Only stop threads if they exist.
* Remove unnecessary try/excepts.
* Fix
* Only add new logging handler once.
* Increase timeout.
* Fix tempfile test.
* Fix logging in cluster_utils.
* Revert "Increase timeout."
This reverts commit b3846b89040bcd8e583b2e18cb513cb040e71d95.
* Retry longer when connecting to plasma store from node manager and object manager.
* Close pubsub channels to avoid leaking file descriptors.
* Limit log monitor open files to 200.
* Increase plasma connect retries.
* Add comment.
Allows users of the HyperOptSearch suggestion algorithm to specify initial experiment values to run (typically already known good baseline parameters within the domain specified)
## What do these changes do?
* Improved --no-cuda handling
* Removed deprecated Variable usage
## Related issue number
Fixes#3873
<!-- Are there any issues opened that will be resolved by merging this change? -->
Breaks the node provider node getter into cached and non-cached versions.
Fixes#3930 by updating the node label finger print before updating labels.
Fixes#3935 by refreshing node cache if node ip is not found.
- NodeUpdater gets its' IP in parallel now (no longer in __init__)
- We use persistent connections in SSH (temp folder created only for ray; ControlMaster)
- hash_runtime_conf was performing a pointless hexlify step, wasting time on large files
- We use NodeUpdaterThreads and share the NodeProvider; NodeUpdaterProcess is removed
- AWSNodeProvider caches nodes more aggressively
- NodeProvider now has a shim batch terminate_nodes() call; AWSNodeProvider parallelises it; the autoscaler uses it
- AWSNodeProvider batches EC2 update_tags calls
- Logging changes throughout to provide standardised timing information for profiling
- Pulled out a few unnecessary is_running calls (NodeUpdater will loop waiting for SSH anyway)
## Related issue number
Issue #3599
In earlier PRs, PR#3585 and PR#3637, export_policy_model and export_policy_checkpoint were introduced for users to export TensorFlow model and checkpoint.
For Ray Tune users, these APIs are not accessible through YAML configurations.
In this pull request, export_formats option is provided to enable users to choose the desired export format.
* Factor out starting Ray processes.
* Detect flags through environment variables.
* Return ProcessInfo from start_ray_process.
* Print valgrind errors at exit.
* Test valgrind in travis.
* Some valgrind fixes.
* Undo raylet monitor change.
* Only test plasma store in valgrind.