* removed cv2
* remove opencv
* increased number of default rollouts ARS
* put cv2 back in this branch
* moved cv2 back where it belongs in preprocessors
This change addresses issue #2809. Test #2797 has been enabled for raylet and now passes.
The following should happen when a driver exits (either gracefully or ungracefully).
#2797 should be enabled and pass.
Any actors created by the driver that are still running should be killed.
Any workers running tasks for the driver should be killed.
Any tasks for the driver in any node_manager queues should be removed.
Any future tasks received by a node manager for the driver should be ignored.
The driver death notification should only be received once.
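The cleanup steps listed above can be sketched as a small Python model. This is only an illustration, not the raylet's C++ code: `NodeManagerModel` and all of its fields and methods are hypothetical names. The model also tolerates a duplicate death notification, which is a defensive assumption; the description only says the notification should be received once.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManagerModel:
    """Toy model of per-node state. All names here are hypothetical."""

    actors: dict = field(default_factory=dict)        # actor_id -> driver_id
    workers: dict = field(default_factory=dict)       # worker_id -> driver_id of running task
    queued_tasks: list = field(default_factory=list)  # (task_id, driver_id) pairs
    dead_drivers: set = field(default_factory=set)

    def handle_driver_death(self, driver_id):
        """Clean up after a driver. Returns False for a repeated notification."""
        if driver_id in self.dead_drivers:
            return False
        self.dead_drivers.add(driver_id)
        # Kill actors created by the driver.
        self.actors = {a: d for a, d in self.actors.items() if d != driver_id}
        # Kill workers running tasks for the driver.
        self.workers = {w: d for w, d in self.workers.items() if d != driver_id}
        # Remove the driver's tasks from the queues.
        self.queued_tasks = [(t, d) for t, d in self.queued_tasks if d != driver_id]
        return True

    def submit_task(self, task_id, driver_id):
        """Queue a task unless its driver is already known to be dead."""
        if driver_id in self.dead_drivers:
            return False
        self.queued_tasks.append((task_id, driver_id))
        return True
```

State belonging to other drivers is left untouched, and tasks arriving later for the dead driver are dropped by `submit_task`.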
Before this change, the autoscaler `up` and related commands didn't print any info messages to the console. This was a regression from 0.5. @richardliaw @robertnishihara https://github.com/ray-project/ray/issues/2812
It's possible to configure PPO in a way that ends up discarding most of the samples (they are treated as "stragglers"). Add a warning when this happens, and raise an exception if the waste is particularly egregious.
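A rough sketch of such a check is shown below. The function name `check_sample_waste` and both threshold values are assumptions made for illustration; the description does not state the actual thresholds or where the check lives.

```python
import warnings

# Hypothetical thresholds; the actual values are not given in the description.
WARN_WASTE_RATIO = 1.5
ERROR_WASTE_RATIO = 5.0


def check_sample_waste(samples_collected, samples_used):
    """Warn or raise depending on how many collected samples are discarded.

    Returns the ratio of collected to used samples.
    """
    if samples_used <= 0:
        raise ValueError("samples_used must be positive")
    ratio = samples_collected / samples_used
    if ratio > ERROR_WASTE_RATIO:
        raise ValueError(
            "Collected {} samples but only {} are used; most samples are "
            "discarded as stragglers.".format(samples_collected, samples_used))
    if ratio > WARN_WASTE_RATIO:
        warnings.warn(
            "Collected {} samples but only {} are used.".format(
                samples_collected, samples_used))
    return ratio
```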
This makes sure we always update the local filter, and adds an option to synchronize the remote filters as well. In APEX_DDPG we previously didn't do either. The first is needed for checkpoint correctness, the second might help performance.
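The two behaviours can be illustrated with a toy filter. This is a sketch under assumptions: `CountFilter`, `synchronize`, and the `update_remote` flag are hypothetical names, and the filter here only tracks a count and a sum rather than real observation statistics.

```python
class CountFilter:
    """Toy filter tracking a running count and sum of observations.

    The buffer holds only the updates made since the last flush.
    """

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.buffer_count = 0
        self.buffer_total = 0.0

    def observe(self, x):
        self.count += 1
        self.total += x
        self.buffer_count += 1
        self.buffer_total += x

    def flush_buffer(self):
        """Return the buffered delta and reset the buffer."""
        delta = (self.buffer_count, self.buffer_total)
        self.buffer_count = 0
        self.buffer_total = 0.0
        return delta

    def apply_delta(self, delta):
        n, s = delta
        self.count += n
        self.total += s

    def copy_state_from(self, other):
        self.count = other.count
        self.total = other.total


def synchronize(local, remotes, update_remote=True):
    """Always fold remote deltas into the local filter; optionally push back."""
    for remote in remotes:
        local.apply_delta(remote.flush_buffer())
    if update_remote:
        for remote in remotes:
            remote.copy_state_from(local)
```

With `update_remote=False` the local filter is still kept current, which is the part a checkpoint depends on; pushing the merged state back to the remotes is the optional part.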
Previously `Ray.createActor` only supported creating an actor without any parameters. This PR adds support for creating an actor with parameters. Moreover, besides using a constructor, it's now also possible to create an actor with a factory method. For more usage, please refer to `ActorTest.java`.
The API module (`ray/java/api` dir) includes all public APIs provided by Ray. It should be the only module that normal Ray users need to face.
The purpose of this PR is to first improve the code quality of the API module. Subsequent PRs will improve other modules later. The changes in this PR include the following aspects:
1) Only keep interfaces in api module, to hide implementation details from users and fix circular dependencies among modules.
2) Document everything in the api module.
3) Improve naming.
4) Add more tests for API.
5) Also fix/improve related code in other modules.
6) Remove some unused code.
(Apologies for posting such a large PR. The Java worker code has lacked maintenance for a while, and there are a lot of code quality issues that need to be fixed. We plan to use a couple of large PRs to address them. After that, future changes will come in small PRs.)
* Add signal handlers to improve debuggability.
* Fix Linux compiling
* Fix Lint
* Change SIGILL case that happens in both Linux and macOS
* Add signal handler to main functions.
* Change handler name.
* Address comment
* Address comment.
* Fix Linux building failure
* Introduce RAII mechanism to SignalHandlers.
* Add InitShutdownWrapper to handle all RAII requirements
* Change util_test to signal_test
* Make sure shutdown is not nullptr.
* Using google::InstallFailureSignalHandler() instead of our own signal handler
* Refine code according to comment
* Fix valgrind test failure.
* remove Shutdown template
* consistency
* linting
Basically a re-implementation of #2281, with modifications of #2298 (A fix of #2334, for rebasing issues.).
[+] Implement sharding for gcs tables.
[+] Keep ClientTable and ErrorTable managed by the primary_shard. TaskTable is managed by the primary_shard for now, until a good hashing for tasks is implemented.
[+] Move AsyncGcsClient's initialization into Connect function.
[-] Move GetRedisShard and bool sharding from RedisContext's connect into AsyncGcsClient. This may make the interface cleaner.
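The routing rule described above might look roughly like the following. This is a sketch only: `pick_shard` is a hypothetical helper, CRC32 is an assumed hash (the description does not name one), and `ObjectTable` in the usage below is just an example of a table that is not pinned to the primary shard.

```python
import zlib

# Tables the description says stay on the primary shard.
PRIMARY_ONLY_TABLES = {"ClientTable", "ErrorTable", "TaskTable"}


def pick_shard(table_name, key, num_shards):
    """Return a data-shard index, or None to mean the primary shard.

    `key` is the entry's ID as bytes. CRC32 is an assumption made for
    this sketch, not necessarily the hash the real code uses.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    if table_name in PRIMARY_ONLY_TABLES:
        return None
    return zlib.crc32(key) % num_shards
```

For example, `pick_shard("ClientTable", b"abc", 4)` returns `None`, while `pick_shard("ObjectTable", b"abc", 4)` returns the same index in `range(4)` on every call.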
* Convert multi_node_test.py to pytest.
* Convert array_test.py to pytest.
* Convert failure_test.py to pytest.
* Convert microbenchmarks to pytest.
* Convert component_failures_test.py to pytest and some minor quotes changes.
* Convert tensorflow_test.py to pytest.
* Convert actor_test.py to pytest.
* Fix.
* Fix
* Add some imports that make it easier to build with Bazel
* Use "/tmp" paths for sockets in tests
* Move `asio_test` into `run_gcs_tests.sh` instead of starting and stopping Redis within the test fixture with a `system` call.
1) Renamed the native JNI methods and some parameters of JNI methods.
2) Fixed the native JNI methods' signatures using the `javah` tool.
3) Removed some useless native methods.
This removes the force_start argument from StartWorkerProcess in the worker pool so that no more than maximum_startup_concurrency workers are ever started concurrently. In particular, when the raylet starts up, it may start fewer than num_workers workers.
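The effect of the cap can be modelled in a few lines of Python. This is a sketch, not the raylet's C++ worker pool: `WorkerPoolModel`, `on_worker_registered`, and the counters are hypothetical names used only to show the behaviour.

```python
class WorkerPoolModel:
    """Toy model of a worker pool with a cap on concurrent startups."""

    def __init__(self, num_workers, maximum_startup_concurrency):
        if maximum_startup_concurrency < 1:
            raise ValueError("maximum_startup_concurrency must be at least 1")
        self.maximum_startup_concurrency = maximum_startup_concurrency
        self.starting = 0    # processes launched but not yet registered
        self.registered = 0  # processes that finished starting
        # At startup, try to launch num_workers; the cap may allow fewer.
        for _ in range(num_workers):
            self.start_worker_process()

    def start_worker_process(self):
        """Launch one worker unless the cap is reached. No force option."""
        if self.starting >= self.maximum_startup_concurrency:
            return False
        self.starting += 1
        return True

    def on_worker_registered(self):
        """A starting worker finished; this frees one startup slot."""
        if self.starting == 0:
            raise RuntimeError("no worker is currently starting")
        self.starting -= 1
        self.registered += 1
```

With `WorkerPoolModel(num_workers=8, maximum_startup_concurrency=2)`, only two workers are starting after construction, and another can be launched once one of them registers.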
* Added checkpoint_at_end option to fix #2740
* Added ability to checkpoint at the end of trials if the option is set to True
* checkpoint_at_end option added; consistent with Experiment and Trial runner
* checkpoint_at_end option mentioned in the tune usage guide
* Moved the redundant checkpoint criteria check out of the if-elif
* Added note that checkpoint_at_end is enabled only when checkpoint_freq is not 0
* Added test case for checkpoint_at_end
* Made checkpoint_at_end have an effect regardless of checkpoint_freq
* Removed comment from the test case
* Fixed the indentation
* Fixed pep8 E231
* Handled cases when trainable does not have _save implemented
* Constrained test case to a particular exp using the MockAgent
* Revert "Constrained test case to a particular exp using the MockAgent"
This reverts commit e965a9358ec7859b99a3aabb681286d6ba3c3906.
* Revert "Handled cases when trainable does not have _save implemented"
This reverts commit 0f5382f996ff0cbf3d054742db866c33494d173a.
* Simpler test case for checkpoint_at_end
* Prevented bools from losing their actual value
* Revert "Moved the redundant checkpoint criteria check out of the if-elif"
This reverts commit 783005122902240b0ee177e9e206e397356af9c5.
* Fix linting error.
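The resulting checkpoint decision can be sketched as below. `should_checkpoint` and its parameters are hypothetical, and the sketch assumes iterations are counted from 1; it only illustrates that `checkpoint_at_end` takes effect regardless of `checkpoint_freq`.

```python
def should_checkpoint(iteration, checkpoint_freq, checkpoint_at_end, is_done):
    """Decide whether to save a checkpoint after a training iteration.

    A finished trial is checkpointed when checkpoint_at_end is set, even
    if periodic checkpointing is disabled (checkpoint_freq == 0).
    """
    if is_done and checkpoint_at_end:
        return True
    return checkpoint_freq > 0 and iteration % checkpoint_freq == 0
```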
* Limit number of concurrent workers started by hardware concurrency.
* Check if std::thread::hardware_concurrency() returns 0.
* Pass in max concurrency from Python.
* Fix Java call to startRaylet.
* Fix typo
* Remove unnecessary cast.
* Fix linting.
* Cleanups on Java side.
* Comment back in actor test.
* Require maximum_startup_concurrency to be at least 1.
* Fix linting and test.
* Improve documentation.
* Fix typo.
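Since the maximum concurrency is passed in from Python, picking a default might look like the sketch below. `default_startup_concurrency` is a hypothetical helper, and falling back to 1 when the CPU count is unknown is an assumption; the commits only say that a result of 0 is checked for and that the value must be at least 1.

```python
import os


def default_startup_concurrency(detected_cpus=None):
    """Pick a startup concurrency of at least 1.

    os.cpu_count() can return None when the count is undetermined, much as
    std::thread::hardware_concurrency() can return 0. Falling back to 1 in
    that case is an assumption made for this sketch.
    """
    if detected_cpus is None:
        detected_cpus = os.cpu_count()
    return max(1, detected_cpus or 1)
```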