## What do these changes do?
1. Fix the Jenkins test failure by add driver id to Actor GCS Key.
2. Move `object_manager_test.py` from Jenkins to Travis.
Otherwise, in the event of a remote raylet crashing, the connection might be held by boost asio forever, and the pending callbacks will never get invoked. See also #3586.
* Allowing multiple users to access the /tmp/ray file at the same time
Previous sequence that caused this issue:
* User A starts ray with `ray.init` when /tmp/ray does not exist
* User B starts ray with `ray.init` and /tmp/ray now exists
User B will get a permissions error
Checking the permissions, /tmp/ray is 700
I have identified a race condition in `try_to_create_directory`
* Multiple processes try to create /tmp/ray at the same time
* chmod is either silently erroring or working properly within the race condition
Resolution: Move chmod outside of the check for whether the directory exists or not.
* Adding try except for users who do not own the directory
## What do these changes do?
Previously we logged a warning if the PPO configuration would waste many samples. However, this didn't apply in the case of long episodes in `complete_episodes` batch mode, and also the amount of waste is up to 2x in common cases.
This pr:
- Estimates the number of sampling tasks needed to avoid over-sampling.
- Collects all sample results and never discards any. In principle this can degrade performance at large scale if certain machines are slower. Add a config flag to enable this legacy behavior.
## Related issue number
Closes: https://github.com/ray-project/ray/issues/3549
This PR provides a better error message when the generate_variants code
breaks. Also removes a comment about nesting dependencies.
This comes mainly as a hotfix solution for #3466. We should leave that issue open for future contribution 🙂
* mb impala
* fix
* paropt
* update
* cpu warn
* on cpu
* fix mb
* doc
* docs
* comment
* larger num
* early release
* remove grad clip
* only check loader count in multi gpu mode
* revert bad multigpu changes
* num sgd iter
* comment
* reuse optimizer
* add test
* par load test
* loosen test
* Update run_multi_node_tests.sh
* fix local mode
* Update agent.py
* Add a flag for whether an object has been created before
* Add regression test
* doc
* Share object directory between object and node managers
* Treat evicted actor tasks as failed
* minor
* Check return value
* Fix bug where object locations weren't getting updated on client death
* Fix mac build
* Use RayTaskError