Adds a tmux flag that can be used to run experiments in the background. It cannot be used together with screen. This seems to be a useful feature; requests for it have come up from several different users.
* Added agent name & env id to default logdir prefix
* Revert "Added agent name & env id to default logdir prefix"
This reverts commit 07cfdf80d2537da3c67dd4f553c5f3e43671cc7d.
* Added default logger creator with informative prefix to Agent
* Updated import order & improved string concatenation
* Update agent.py
* Use CMake to build the Ray project; build.sh no longer needs to be run before CMake. Fix some misuse of CMake and improve build performance
* Support Boost as an external project instead of using the system Boost or the one built by build.sh
* Keep compatibility with build.sh, but remove the Boost and Arrow builds from it
* bugfix: Parquet bison version control; plasma_java library install problem
* bugfix: CMake, do not compile the Plasma Java client when it is not needed
* bugfix: the timeout mechanism in the component failures test was broken for the plasma manager failure case
* bugfix: Arrow uses lib64 on CentOS; Travis check-git-clang-format-output.sh does not support branches other than master
* Revert some fixes
* Set the Arrow Python executable; fix a format error in component_failures_test.py
* Clean the Arrow Python build directory
* Update CMake code style; restore support for CMake minimum version 3.4
A fix to a Tune example (`python/ray/tune/examples/pbt_tune_cifar10_with_keras.py`) in which the optimizer hyperparameters, learning rate and decay, were not being passed into the optimizer.
As a result, the optimizer used its default hyperparameter values regardless of the config.
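A minimal sketch of the bug and the fix, assuming the config carries keys named `"lr"` and `"decay"`. `SGD` below is a hypothetical stand-in dataclass, not the real Keras optimizer, so the example runs without Keras; the point is only that the sampled values must be forwarded to the optimizer's constructor.

```python
from dataclasses import dataclass


@dataclass
class SGD:
    """Hypothetical stand-in for a Keras optimizer; it only stores hyperparameters."""
    lr: float = 0.01
    decay: float = 0.0


def build_optimizer_buggy(config):
    # Bug: the config is ignored, so every trial trains with the defaults.
    return SGD()


def build_optimizer_fixed(config):
    # Fix: forward the sampled hyperparameters into the optimizer.
    return SGD(lr=config["lr"], decay=config["decay"])


config = {"lr": 0.1, "decay": 1e-4}
print(build_optimizer_buggy(config))  # SGD(lr=0.01, decay=0.0)
print(build_optimizer_fixed(config))  # SGD(lr=0.1, decay=0.0001)
```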
Add a new search algorithm (genetic), along with a base searcher framework that handles basic tasks such as logging, recording, and organizing results in our project.
Note that this is the initial commit. In the following days, we will add examples, unit tests, and other refinements.
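The core loop of a genetic search over a discrete hyperparameter space might look like the sketch below. All names (`genetic_search`, `crossover`, `mutate`, `num_elite`, `rate`) are hypothetical illustrations of the general technique, not the API of the searcher added in this commit.

```python
import random


def crossover(a, b, rng):
    # Uniform crossover: each hyperparameter is copied from one of the two parents.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in a}


def mutate(cfg, space, rate, rng):
    # With probability `rate`, resample each hyperparameter from its candidate values.
    return {k: (rng.choice(space[k]) if rng.random() < rate else v)
            for k, v in cfg.items()}


def genetic_search(space, fitness, pop_size=8, generations=10,
                   num_elite=2, rate=0.2, seed=0):
    rng = random.Random(seed)
    # Initial population: random configs drawn from the search space.
    population = [{k: rng.choice(v) for k, v in space.items()}
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: the fittest configs survive unchanged (elitism) and act as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[:num_elite]
        children = []
        while len(children) < pop_size - num_elite:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), space, rate, rng))
        population = parents + children
    return max(population, key=fitness)


space = {"lr": [0.001, 0.01, 0.1], "batch_size": [16, 32, 64]}


def fitness(cfg):
    # Toy objective that prefers lr=0.01 and batch_size=32.
    return -abs(cfg["lr"] - 0.01) - abs(cfg["batch_size"] - 32) / 100


best = genetic_search(space, fitness)
print(best)
```

Because the elite configs are carried over unchanged, the best fitness in the population never decreases from one generation to the next.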
When running in screen (or any other time it is hard to scroll up), printing "Suppressing previous error message" is not helpful, since the previous error has long since scrolled out of the scrollback buffer. It is better to just print the error again at the end.
Adds the ability for trainables to reset their configurations during experiments. In particular, these changes add the base functions to the trial_executor and trainable interfaces and provide a basic implementation in the PopulationBasedTraining scheduler.
Related issue number: #2741
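A minimal sketch of the idea, using hypothetical classes rather than Tune's actual `Trainable` and trial executor: a trainable reports whether it can adopt a new config in place, and a PBT-style exploit step falls back to restarting the trial when it cannot. Reusing the running trainable avoids the cost of tearing it down and re-creating it.

```python
class Trainable:
    """Hypothetical base class, not Tune's actual Trainable."""

    def __init__(self, config):
        self.config = dict(config)

    def reset_config(self, new_config):
        # Default: in-place reset is not supported; the caller must restart.
        return False


class ResettableTrainable(Trainable):
    def reset_config(self, new_config):
        # Adopt the new hyperparameters without tearing down the trainable.
        self.config = dict(new_config)
        return True


def exploit(trainable, new_config, restart):
    # PBT-style step: prefer an in-place reset, otherwise build a fresh trainable.
    if trainable.reset_config(new_config):
        return trainable
    return restart(new_config)


t = ResettableTrainable({"lr": 0.1})
same = exploit(t, {"lr": 0.05}, restart=ResettableTrainable)
print(same is t, same.config)  # True {'lr': 0.05}
```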
* removed cv2
* remove opencv
* increased the default number of rollouts for ARS
* put cv2 back in this branch
* moved cv2 back where it belongs in preprocessors
Before this change, the autoscaler `up` and related commands did not print any info messages to the console at all. This was a regression from 0.5. @richardliaw @robertnishihara https://github.com/ray-project/ray/issues/2812
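The fix itself is not shown here. As a hedged illustration of one common cause of this symptom in Python CLIs: `logging` only emits records at or above the logger's effective level, and the root logger defaults to `WARNING`, so `logger.info(...)` output is silently dropped unless the CLI configures a handler and an `INFO` level. The logger name below is hypothetical.

```python
import logging
import sys

# Hypothetical logger name for illustration.
logger = logging.getLogger("autoscaler_cli_sketch")


def setup_cli_logging(level=logging.INFO):
    # Attach a stdout handler and lower the level so INFO messages are shown.
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)


logger.info("dropped: the effective level is still WARNING")
setup_cli_logging()
logger.info("printed: the level is now INFO")
```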
It's possible to configure PPO in a way that ends up discarding most of the samples (they are treated as "stragglers"). Add a warning when this happens, and raise an exception if the waste is particularly egregious.
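A hedged sketch of such a check. The function name and thresholds (warn above 50% waste, fail above 90%) are made up for illustration; the actual thresholds and error type used in the PPO code may differ.

```python
import warnings


def check_sample_waste(num_collected, num_used, warn_frac=0.5, error_frac=0.9):
    """Hypothetical check: warn or fail when too many samples are discarded as stragglers."""
    if num_collected <= 0:
        return 0.0
    wasted = 1.0 - num_used / num_collected
    if wasted > error_frac:
        # Egregious waste: stop instead of silently training on a tiny fraction.
        raise ValueError(
            f"{wasted:.0%} of collected samples were discarded as stragglers")
    if wasted > warn_frac:
        warnings.warn(
            f"{wasted:.0%} of collected samples were discarded as stragglers")
    return wasted
```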
This makes sure we always update the local filter, and adds an option to synchronize the remote filters as well. Previously, APEX_DDPG did neither. The first is needed for checkpoint correctness; the second might help performance.
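A simplified, hypothetical sketch of the two behaviors: merging the remote filters' new samples into the local filter on every sync, and optionally pushing the merged statistics back to the remotes. It tracks only a scalar running mean; RLlib's real filters and their method names may differ.

```python
class RunningMeanFilter:
    """Hypothetical scalar running-mean filter; real observation filters track more statistics."""

    def __init__(self):
        self.n = 0            # total samples reflected in `mean`
        self.mean = 0.0
        self.buffer_n = 0     # samples seen since the last sync
        self.buffer_sum = 0.0

    def __call__(self, x):
        # Update the running mean and record the sample in the unsynced buffer.
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.buffer_n += 1
        self.buffer_sum += x
        return x - self.mean

    def apply_delta(self, other):
        # Fold another filter's unsynced samples into this filter's statistics.
        if other.buffer_n == 0:
            return
        total = self.n + other.buffer_n
        self.mean = (self.mean * self.n + other.buffer_sum) / total
        self.n = total

    def clear_buffer(self):
        self.buffer_n = 0
        self.buffer_sum = 0.0

    def sync_from(self, other):
        # Overwrite this filter's statistics with another filter's.
        self.n, self.mean = other.n, other.mean
        self.clear_buffer()


def synchronize(local, remotes, update_remotes=True):
    # Always update the local filter so checkpoints capture current statistics.
    for remote in remotes:
        local.apply_delta(remote)
        remote.clear_buffer()
    # Optionally push the merged statistics back out to the remote filters.
    if update_remotes:
        for remote in remotes:
            remote.sync_from(local)
```

Clearing each remote's buffer after merging keeps the same samples from being counted twice on the next sync.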
* Convert multi_node_test.py to pytest.
* Convert array_test.py to pytest.
* Convert failure_test.py to pytest.
* Convert microbenchmarks to pytest.
* Convert component_failures_test.py to pytest, with some minor quote changes.
* Convert tensorflow_test.py to pytest.
* Convert actor_test.py to pytest.
* Fix.
* Fix
* Added checkpoint_at_end option. Fixes #2740
* Added ability to checkpoint at the end of trials if the option is set to True
* Added checkpoint_at_end option, consistent with Experiment and TrialRunner
* checkpoint_at_end option mentioned in the tune usage guide
* Moved the redundant checkpoint criteria check out of the if-elif
* Added note that checkpoint_at_end is enabled only when checkpoint_freq is not 0
* Added test case for checkpoint_at_end
* Made checkpoint_at_end have an effect regardless of checkpoint_freq
* Removed comment from the test case
* Fixed the indentation
* Fixed pep8 E231
* Handled cases when trainable does not have _save implemented
* Constrained test case to a particular exp using the MockAgent
* Revert "Constrained test case to a particular exp using the MockAgent"
This reverts commit e965a9358ec7859b99a3aabb681286d6ba3c3906.
* Revert "Handled cases when trainable does not have _save implemented"
This reverts commit 0f5382f996ff0cbf3d054742db866c33494d173a.
* Simpler test case for checkpoint_at_end
* Preserved bools from losing their actual value
* Revert "Moved the redundant checkpoint criteria check out of the if-elif"
This reverts commit 783005122902240b0ee177e9e206e397356af9c5.
* Fix linting error.
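The final checkpoint criteria can be sketched as a small predicate. The function name is hypothetical; it only encodes the behavior described in the commits above: periodic checkpoints every `checkpoint_freq` iterations when that value is positive, plus a final checkpoint when `checkpoint_at_end` is set, regardless of `checkpoint_freq`.

```python
def should_checkpoint(iteration, done, checkpoint_freq=0, checkpoint_at_end=False):
    """Hypothetical helper mirroring the checkpoint criteria described above."""
    # Periodic checkpoints are disabled when checkpoint_freq is 0.
    periodic = checkpoint_freq > 0 and iteration % checkpoint_freq == 0
    # A final checkpoint is taken when the trial finishes, independent of checkpoint_freq.
    at_end = checkpoint_at_end and done
    # Coerce to a real bool so callers never see a leftover int or None.
    return bool(periodic or at_end)
```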
* Limit number of concurrent workers started by hardware concurrency.
* Check if std::thread::hardware_concurrency() returns 0.
* Pass in max concurrency from Python.
* Fix Java call to startRaylet.
* Fix typo
* Remove unnecessary cast.
* Fix linting.
* Cleanups on Java side.
* Comment back in actor test.
* Require maximum_startup_concurrency to be at least 1.
* Fix linting and test.
* Improve documentation.
* Fix typo.
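A hedged Python sketch of the startup throttling described above. The change itself touches the raylet's C++ code and its Python and Java callers; the function names here are hypothetical. `os.cpu_count()` returning `None` plays the role of `std::thread::hardware_concurrency()` returning 0.

```python
import os


def default_startup_concurrency():
    # os.cpu_count() returns None if the count is unknown, much like
    # std::thread::hardware_concurrency() returning 0; fall back to 1.
    return os.cpu_count() or 1


def validate_startup_concurrency(maximum_startup_concurrency):
    # A limit below 1 would mean no worker could ever start.
    if maximum_startup_concurrency < 1:
        raise ValueError("maximum_startup_concurrency must be at least 1")
    return maximum_startup_concurrency


def workers_to_start(num_requested, num_starting, maximum_startup_concurrency):
    # Start only as many new workers as the limit allows, given those already starting.
    return max(0, min(num_requested, maximum_startup_concurrency - num_starting))
```

For example, with a limit of 4 and 3 workers already starting, a request for 10 more starts only 1 now; the rest wait until earlier workers finish starting.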