Fix some code issues found by a code scanning tool:
**1. Macro compares unsigned to 0 (NO_EFFECT)**
CWE-570: An unsigned value can never be less than 0
This greater-than-or-equal-to-zero comparison of an unsigned value is always true. "this->create_buffer_state_[object_id].num_seals_remaining >= 0UL".
~/ray/src/ray/object_manager/object_buffer_pool.cc: ray::ObjectBufferPool::SealChunk(const ray::UniqueID &, unsigned long)
**2. Inferred misuse of enum (MIXED_ENUMS)**
CWE-398: An integer expression which was inferred to have an enum type is mixed with a different enum type
This case, "static_cast(ray::object_manager::protocol::MessageType::PushRequest)", implies the effective type of "message_type" is "ray::object_manager::protocol::MessageType".
~/ray/src/ray/object_manager/object_manager.cc: ray::ObjectManager::ProcessClientMessage(std::shared_ptr> &, long, const unsigned char *)
The object manager uses multiple threads to transfer objects between nodes, so the plasma client used in `object_buffer_pool_` needs to be protected by a lock. We have seen crashes caused by a missing lock in the `FreeObjects()` interface; this PR fixes that issue.
1) When using `PyObject_GetIter`, the caller must call `Py_DECREF` on the returned iterator to avoid a memory leak. With `PyList_GetItem`, `Py_DECREF` isn't needed because it returns a borrowed reference.
2) The `Py_BuildValue` call in `wait` doesn't need to increment the ref count.
## What do these changes do?
1. Fix the Jenkins test failure by adding the driver ID to the actor GCS key.
2. Move `object_manager_test.py` from Jenkins to Travis.
Otherwise, in the event of a remote raylet crashing, the connection might be held by boost asio forever, and the pending callbacks will never get invoked. See also #3586.
* Allow multiple users to access the /tmp/ray directory at the same time
Previous sequence that caused this issue:
* User A starts ray with `ray.init` when /tmp/ray does not exist
* User B starts ray with `ray.init` and /tmp/ray now exists
User B will get a permissions error
Checking the permissions, /tmp/ray is 700
I have identified a race condition in `try_to_create_directory`
* Multiple processes try to create /tmp/ray at the same time
* Within the race, chmod either fails silently or works properly
Resolution: Move chmod outside of the check for whether the directory exists or not.
* Add a try/except for users who do not own the directory
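
The resolution above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the `mode` parameter, its `0o777` default, and the choice to ignore only `PermissionError` are assumptions made for the example.

```python
import os
import tempfile


def try_to_create_directory(directory_path, mode=0o777):
    """Create directory_path if needed, then always attempt the chmod.

    Sketch only: `mode` and its default are assumptions for illustration.
    """
    # exist_ok=True makes concurrent creators race-safe: whichever process
    # loses the race simply finds the directory already present.
    os.makedirs(directory_path, exist_ok=True)

    # The chmod runs unconditionally, not only when this process created
    # the directory, so a creator that lost the race cannot leave it at 700.
    try:
        os.chmod(directory_path, mode)
    except PermissionError:
        # Users who do not own the directory cannot change its mode;
        # the owner is expected to have set it already.
        pass


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    path = os.path.join(base, "ray")
    try_to_create_directory(path)
    try_to_create_directory(path)  # second call is a no-op apart from chmod
    print(oct(os.stat(path).st_mode & 0o777))
```

Calling the function twice mirrors the User A / User B sequence when both run as the same user; with different users, the second call would hit the `PermissionError` branch instead.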
## What do these changes do?
Previously we logged a warning if the PPO configuration would waste many samples. However, the warning didn't cover long episodes in `complete_episodes` batch mode, and the amount of waste is up to 2x in common cases.
This PR:
- Estimates the number of sampling tasks needed to avoid over-sampling.
- Collects all sample results and never discards any. In principle this can degrade performance at large scale if certain machines are slower, so a config flag is added to re-enable the legacy behavior.
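
As a rough illustration of the first point, the number of sampling tasks can be estimated with a ceiling division. How the PR actually computes its estimate is not described here; the function name and both parameters below are hypothetical.

```python
def estimate_num_sample_tasks(train_batch_size, steps_per_task):
    """Smallest number of tasks whose combined steps cover the batch.

    Hypothetical helper: both arguments are assumed to be positive ints,
    with steps_per_task the expected number of steps one task returns.
    """
    if train_batch_size <= 0 or steps_per_task <= 0:
        raise ValueError("both arguments must be positive")
    # Integer ceiling division, avoiding float rounding on large values.
    return (train_batch_size + steps_per_task - 1) // steps_per_task


if __name__ == "__main__":
    # 4000 steps wanted, 300 steps per task: 14 tasks yield 4200 steps,
    # an overshoot of 200 instead of a fixed, larger number of tasks.
    print(estimate_num_sample_tasks(4000, 300))
```

Launching only this many tasks bounds the overshoot to less than one task's worth of steps, which is what makes keeping every result affordable.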
## Related issue number
Closes: https://github.com/ray-project/ray/issues/3549
This PR provides a better error message when the `generate_variants` code breaks. It also removes a comment about nesting dependencies.
This comes mainly as a hotfix solution for #3466. We should leave that issue open for future contribution 🙂
* mb impala
* fix
* paropt
* update
* cpu warn
* on cpu
* fix mb
* doc
* docs
* comment
* larger num
* early release
* remove grad clip
* only check loader count in multi gpu mode
* revert bad multigpu changes
* num sgd iter
* comment
* reuse optimizer
* add test
* par load test
* loosen test
* Update run_multi_node_tests.sh
* fix local mode
* Update agent.py