Stephanie Wang
d2b6db3db1
Bump version from 0.6.2 to 0.6.3 ( #3972 )
2019-02-06 19:11:16 -08:00
Eric Liang
04fc145a44
[autoscaler] Autoscaler hangs forever on non-zero exit code command ( #3969 )
2019-02-06 17:25:24 -08:00
Robert Nishihara
fa4eb8313d
Suppress warning for serializing different unique ID types in Python. ( #3872 )
* Suppress warning for serializing different unique ID types in Python.
* Add _ID_TYPES variable.
2019-02-05 11:38:33 -08:00
vfdev
b2b8417790
[tune] Improve mnist_pytorch.py example ( #3894 )
## What do these changes do?
* Improved --no-cuda handling
* Removed deprecated Variable usage
## Related issue number
Fixes #3873
2019-02-04 17:59:54 -08:00
William Ma
f067223c4a
Allow Ray processes to be started inside of gdb and tmux. ( #3847 )
2019-02-04 15:23:39 -08:00
Wang Qing
e1c68a0881
Enable including Java worker for ray start command ( #3838 )
2019-02-04 16:23:43 +08:00
Eric Liang
7ef830bef1
[rllib] Add copy() in async samples optimizer to fix memory leak ( #3938 )
Fixes #3884.
2019-02-03 18:34:37 -08:00
Andrew Tan
8323419a6d
[tune] Add SigOpt Integration ( #3844 )
2019-02-03 18:23:57 -08:00
Kristian Hartikainen
85294fb503
[autoscaler] node caching changes ( #3937 )
Breaks the node provider node getter into cached and non-cached versions.
Fixes #3930 by updating the node label fingerprint before updating labels.
Fixes #3935 by refreshing the node cache if the node IP is not found.
2019-02-03 17:48:07 -08:00
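The split into cached and non-cached node getters, with a cache refresh on a miss, can be sketched as below. This is a minimal illustration under assumptions: the provider's node listing is modeled as a callable returning a dict of node IDs to node info, and `CachingNodeGetter`, `get_node`, and `get_cached_node` are hypothetical names, not the actual autoscaler API.

```python
class CachingNodeGetter:
    """Cached and non-cached node lookups (illustrative, hypothetical names)."""

    def __init__(self, list_nodes):
        # list_nodes: callable returning {node_id: node_info}; it stands in
        # for a (slow) cloud API call that lists live nodes.
        self._list_nodes = list_nodes
        self._cache = {}

    def get_node(self, node_id):
        # Non-cached path: always refresh the cache from the provider.
        self._cache = dict(self._list_nodes())
        return self._cache[node_id]

    def get_cached_node(self, node_id):
        # Cached path: serve from the cache and refresh only on a miss,
        # e.g. when a newly launched node is not in the cache yet.
        if node_id in self._cache:
            return self._cache[node_id]
        return self.get_node(node_id)
```

Repeated lookups of a known node then cost no API calls, while a lookup for an unknown node triggers exactly one refresh before giving up with a `KeyError`.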
James Casbon
976f018dab
[autoscaler] GCP: only call setIamPolicy if necessary ( #3782 )
2019-02-03 16:16:00 -08:00
James Casbon
b8cc176b4d
[autoscaler] Document gcp subnet config ( #3783 )
Adds info to the GCP example YAML on using shared subnets.
2019-02-03 16:14:44 -08:00
Si-Yuan
9295ab8f60
Various Python code cleanups. ( #3837 )
2019-02-03 10:16:24 -08:00
Michael Luo
1a015e420b
Optimal PPO Configs (10k reward in 1 hr) + PPO grad clipping implemented ( #3934 )
2019-02-02 22:10:58 -08:00
Richard Liaw
eab6dd72b5
[tune] logging fixes, better warnings, better cluster support ( #3906 )
2019-02-02 19:14:03 -08:00
Yuhong Guo
54cbb4396f
Prepare socket file when starting Ray ( #3925 )
2019-02-02 12:53:36 +08:00
Eric Liang
0f81bc9a33
[rllib] on_train_result results do not get logged ( #3865 )
2019-02-01 20:32:07 -08:00
Robert Nishihara
e0f82fd260
Fix building python 3.7 wheel by installing newer numpy. ( #3927 )
2019-02-01 18:06:48 -08:00
Daniel Edgecumbe
315edab085
[autoscaler] Speedups ( #3720 )
- NodeUpdater gets its IP in parallel now (no longer in __init__)
- We use persistent connections in SSH (temp folder created only for ray; ControlMaster)
- hash_runtime_conf was performing a pointless hexlify step, wasting time on large files
- We use NodeUpdaterThreads and share the NodeProvider; NodeUpdaterProcess is removed
- AWSNodeProvider caches nodes more aggressively
- NodeProvider now has a shim batch terminate_nodes() call; AWSNodeProvider parallelises it; the autoscaler uses it
- AWSNodeProvider batches EC2 update_tags calls
- Logging changes throughout to provide standardised timing information for profiling
- Pulled out a few unnecessary is_running calls (NodeUpdater will loop waiting for SSH anyway)
## Related issue number
Issue #3599
2019-02-01 02:46:32 -08:00
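The batch `terminate_nodes()` shim described above can be sketched roughly as follows. This is an illustration under assumptions: `NodeProvider` here is a stripped-down stand-in for the real interface, `ThreadedNodeProvider` is hypothetical, and the thread pool only shows one way a provider might parallelise the batch; the real AWSNodeProvider may implement it differently.

```python
from concurrent.futures import ThreadPoolExecutor


class NodeProvider:
    """Stripped-down stand-in for a node provider interface."""

    def terminate_node(self, node_id):
        raise NotImplementedError

    def terminate_nodes(self, node_ids):
        # Shim: providers without a batch API fall back to one call per node,
        # so callers such as the autoscaler can always use the batch method.
        for node_id in node_ids:
            self.terminate_node(node_id)


class ThreadedNodeProvider(NodeProvider):
    """Hypothetical provider that overrides the shim to run in parallel."""

    def __init__(self, max_workers=8):
        self.max_workers = max_workers
        self.terminated = []

    def terminate_node(self, node_id):
        # A real provider would call its cloud API here.
        self.terminated.append(node_id)

    def terminate_nodes(self, node_ids):
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            # list() waits for every call and surfaces any raised exception.
            list(pool.map(self.terminate_node, node_ids))
```

Keeping the serial loop in the base class means every provider supports the batch call, and only providers that benefit from batching need to override it.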
Daniel Edgecumbe
ff3c6af1d6
[autoscaler]: Remove assertion in info string ( #3916 )
Fixes #3903
2019-02-01 00:32:24 -08:00
Tianming Xu
1302fafc0b
[Tune] Add export_formats option to export policy graphs ( #3868 )
In earlier PRs, PR#3585 and PR#3637, export_policy_model and export_policy_checkpoint were introduced so that users could export TensorFlow models and checkpoints.
However, Ray Tune users cannot access these APIs through YAML configurations.
This pull request adds an export_formats option that lets users choose the desired export format.
2019-01-31 17:07:27 -08:00
Kristian Hartikainen
b9eed2e86c
[autoscaler] Move attach helper text under exec_cluster ( #3920 )
## What do these changes do?
Moves the attach command helper text from the CLI commands to the actual `exec_cluster` function.
2019-01-31 17:01:24 -08:00
Peter Schafhalter
62a0a7bdc7
[tune] Add BayesOpt ( #3864 )
Adds BayesOpt as a Tune suggestion algorithm.
2019-01-31 16:54:17 -08:00
Jimpachnet
d3551dd8df
[tune] Added possibility to execute infinite recovery retries for a trial ( #3901 )
Allows a trial to attempt recovery indefinitely by setting `max_failures` to a negative number.
2019-01-31 02:21:16 -08:00
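The retry rule described above amounts to a small decision function. This is a minimal sketch assuming the trial counts how many times it has failed so far; `should_recover` is a hypothetical helper, not Tune's actual code. A negative `max_failures` short-circuits the limit check, so recovery is always attempted.

```python
def should_recover(num_failures, max_failures):
    """Return True if a failed trial should be restarted from its checkpoint.

    Hypothetical helper: num_failures is the number of failures so far,
    and a negative max_failures means "retry without limit".
    """
    if max_failures < 0:
        # Infinite recovery retries.
        return True
    return num_failures < max_failures
```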
Richard Liaw
d128636bab
Ray Logging Configuration ( #3691 )
* fix logging for autoscaler
* module logging
* try this for logging
* yapf
* fix
* Initial logging setup
* memory
* ok
* remove basicconfig
* catch
* remove package logging
* print
* fix
* try_fix
* fix 1
* revert rllib
* logging level
* flake8
* fix
* fix
* Remove vestigial TODO
2019-01-30 21:01:12 -08:00
Robert Nishihara
d06d9fc5d7
Fix Python linting errors. ( #3905 )
2019-01-30 13:43:18 -08:00
Eric Liang
152375aa8a
[rllib] Add evaluation option to DQN agent ( #3835 )
* add eval
* interval
* multiagent minor fix
* Update rllib.rst
* Update ddpg.py
* Update qmix.py
2019-01-29 21:19:53 -08:00
Eric Liang
fb73cedf70
[rllib] Add examples page, add hierarchical training example, delete SC2 examples ( #3815 )
* wip
* lint
* wip
* up
* wip
* update examples
* wip
* remove carla
* update
* improve envspec
* link to custom
* Update rllib-env.rst
* update
* fix
* fn
* lint
* ds
* ssd games
* desc
* fix up docs
* fix
2019-01-29 21:06:09 -08:00
Bruno Morier
c9819a721d
Update tempfile_services.py ( #3896 )
Fixes an invalid reference to os.errno: errno was removed from the os module in Python 3.7. The fix simply replaces it with the already-imported errno module.
2019-01-29 19:33:02 -08:00
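The os.errno fix follows a common pattern. Below is a minimal sketch assuming the affected code creates a directory and tolerates it already existing; `try_to_create_directory` is used here as a hypothetical name and may not match the function in tempfile_services.py.

```python
import errno
import os
import tempfile


def try_to_create_directory(directory_path):
    """Create directory_path, ignoring the error if it already exists."""
    try:
        os.makedirs(directory_path)
    except OSError as e:
        # Compare against the top-level errno module; os.errno was removed
        # in Python 3.7, so os.errno.EEXIST raises AttributeError there.
        if e.errno != errno.EEXIST:
            raise


path = os.path.join(tempfile.mkdtemp(), "session")
try_to_create_directory(path)
try_to_create_directory(path)  # second call is a no-op, not an error
```

Any other OSError (for example, a parent path that is a regular file) is still re-raised.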
Eric Liang
c75038b945
[autoscaler] Updating a file in file mounts causes all worker nodes to get restarted
2019-01-27 17:41:37 -08:00
Stephanie Wang
ad9f1721d1
Fix object_manager_test.py::object_transfer_retry test ( #3863 )
2019-01-27 13:55:38 -08:00
Yuhong Guo
066fa8abf3
Fix monitor_test.py by waiting for monitor.py to start working ( #3840 )
* Wait for monitor.py to start working
* Check for None result in state.py
2019-01-25 18:07:15 +08:00
Philipp Moritz
20162ce159
Compile raylet cython bindings with bazel ( #3842 )
2019-01-25 00:57:31 -08:00
Si-Yuan
48139cf861
Migrate Python C extension to Cython ( #3541 )
2019-01-24 09:17:14 -08:00
Eric Liang
04ec47cbd4
[rllib] annotate public vs developer vs private APIs ( #3808 )
2019-01-23 21:27:26 -08:00
Wang Qing
816406ea3d
[Java] Fix setCurrentTask() in multithreading ( #3821 )
2019-01-23 20:45:30 +08:00
Robert Nishihara
0b1608a546
Factor out code for starting new processes and test plasma store in valgrind. ( #3824 )
* Factor out starting Ray processes.
* Detect flags through environment variables.
* Return ProcessInfo from start_ray_process.
* Print valgrind errors at exit.
* Test valgrind in travis.
* Some valgrind fixes.
* Undo raylet monitor change.
* Only test plasma store in valgrind.
2019-01-22 14:59:11 -08:00
Eric Liang
f0e6523323
[rllib] Don't call reset() unless necessary for multi-agent envs
2019-01-20 15:00:18 -08:00
Eric Liang
aad48ee5a5
[tune] Fully deprecate raw function literals in Tune ( #3788 )
Related: https://github.com/ray-project/ray/issues/3785
2019-01-19 17:09:36 -08:00
Michael Luo
16f7ca45e4
Appo ( #3779 )
* Deleted old fork, updated new ray and moved PPO-impala to APPO in ppo folder
* Deleted unnecessary vtrace.py file
* Update pong-impala.yaml
* Cleaned PPO Code
* Update pong-impala.yaml
* Update pong-impala.yaml
* wip
* new file
* refactor
* add vtrace off option
* revert
* support any space
* docs
* fix comment
* remove kl
* Update cartpole-appo-vtrace.yaml
2019-01-18 13:40:26 -08:00
Robert Nishihara
9af5a62e05
Give better error for old-style actor classes. ( #3793 )
2019-01-17 19:05:04 -08:00
Richard Liaw
0537508106
Bump strings for 0.6.2 ( #3801 )
2019-01-17 19:03:27 -08:00
Jones Wong
319c1340cb
[rllib] Develop MARWIL ( #3635 )
* add marwil policy graph
* fix typo
* add offline optimizer and enable running marwil
* fix loss function
* add maintaining the moving average of advantage norm
* use sync replay optimizer for unifying
* remove offline optimizer and use sync replay optimizer
* format by yapf
* add imitation learning objective
* fix according to Eric's review
* format by yapf
* revise
* add test data
* marwil
2019-01-16 19:00:43 -08:00
Richard Liaw
75ac016e2b
Bump version ( #3787 )
2019-01-16 11:40:54 -08:00
Richard Liaw
fa99fda2b4
Application Stress Tests ( #3612 )
2019-01-16 02:05:16 -08:00
Richard Liaw
c28e6d41f5
[tune] Avoid overwriting checkpoint file ( #3781 )
2019-01-16 02:03:16 -08:00
Eric Liang
401e656b95
[rllib] Sync filters at end of iteration not start; hierarchical docs ( #3769 )
2019-01-15 16:25:25 -08:00
Richard Liaw
3918934dfd
[tune] Cross-Node Recovery ( #3725 )
Augments trial restore to also check if the runner is at the same
location. If not, the checkpoint files are pushed to the new location.
2019-01-15 10:37:28 -08:00
Si-Yuan
a5df8e3532
minor fix ( #3770 )
2019-01-14 13:52:51 -08:00
Robert Nishihara
19908c01b8
Use environment markers to only install faulthandler in Python < 3.3. ( #3764 )
2019-01-14 15:55:59 +08:00
Eugene Vinitsky
a5d1f03515
[rllib] fix for rollout of lstm policies ( #3643 )
* fix for lstm policies
* added call to local evaluator
* Update python/ray/rllib/rollout.py
Co-Authored-By: eugenevinitsky <eugenevinitsky@users.noreply.github.com>
* Update rollout.py
* Update rollout.py
2019-01-13 15:54:23 -08:00