Philipp Moritz
3bb65677dc
Use one memory mapped file for plasma ( #3871 )
2019-02-06 23:53:05 -08:00
Stephanie Wang
d2b6db3db1
Bump version from 0.6.2 to 0.6.3 ( #3972 )
2019-02-06 19:11:16 -08:00
Eric Liang
04fc145a44
[autoscaler] Autoscaler hangs forever on non-zero exit code command ( #3969 )
2019-02-06 17:25:24 -08:00
Stephanie Wang
49e9bec988
Fix raylet bug in driver cleanup ( #3962 )
...
* Fix task dependency manager cleanup on driver exit
* Add regression test
* Better check, update header
2019-02-06 11:19:10 -08:00
Stephanie Wang
244fd473f4
Only mark tasks as forwarded if they are in the lineage cache ( #3958 )
2019-02-05 23:01:38 -08:00
Alex LaGrassa
b0fe5af7c8
[doc] Update example-parameter-server.rst ( #3773 )
2019-02-05 22:00:54 -08:00
Robert Nishihara
fa4eb8313d
Suppress warning for serializing different unique ID types in Python. ( #3872 )
...
* Suppress warning for serializing different unique ID types in Python.
* Add _ID_TYPES variable.
2019-02-05 11:38:33 -08:00
vfdev
b2b8417790
[tune] Improve mnist_pytorch.py example ( #3894 )
...
## What do these changes do?
* Improved --no-cuda handling
* Removed deprecated Variable usage
## Related issue number
Fixes #3873
2019-02-04 17:59:54 -08:00
Eric Liang
5fb813ff39
Don't check fail on missing lineage cache entry ( #3861 )
2019-02-04 17:45:41 -08:00
William Ma
f067223c4a
Allow Ray processes to be started inside of gdb and tmux. ( #3847 )
2019-02-04 15:23:39 -08:00
Yuhong Guo
add8ae7063
Add bazel build for JNI code ( #3918 )
...
* Add bazel build for JNI code
* clean
* Add plasma client JNI build process
* refine
* clean linux part
* Add Java Library
* Remove java library
* Generate dylib after build using genrule
2019-02-04 13:03:46 -08:00
Wang Qing
e1c68a0881
Enable including Java worker for ray start command ( #3838 )
2019-02-04 16:23:43 +08:00
Eric Liang
7ef830bef1
[rllib] Add copy() in async samples optimizer to fix memory leak ( #3938 )
...
Fixes #3884 .
2019-02-03 18:34:37 -08:00
Andrew Tan
8323419a6d
[tune] Add SigOpt Integration ( #3844 )
2019-02-03 18:23:57 -08:00
Kristian Hartikainen
85294fb503
[autoscaler] node caching changes ( #3937 )
...
Breaks the node provider node getter into cached and non-cached versions.
Fixes #3930 by updating the node label fingerprint before updating labels.
Fixes #3935 by refreshing node cache if node ip is not found.
2019-02-03 17:48:07 -08:00
James Casbon
976f018dab
[autoscaler] GCP: only call setIamPolicy if necessary ( #3782 )
2019-02-03 16:16:00 -08:00
James Casbon
b8cc176b4d
[autoscaler] Document gcp subnet config ( #3783 )
...
Adds info to the gcp example yaml on using shared subnets.
2019-02-03 16:14:44 -08:00
Si-Yuan
9295ab8f60
Various Python code cleanups. ( #3837 )
2019-02-03 10:16:24 -08:00
Devin Petersohn
a1bcd2a4f5
Update Modin to 0.3.0 ( #3936 )
2019-02-02 23:06:16 -08:00
Michael Luo
1a015e420b
Optimal PPO Configs (10k reward in 1 hr) + PPO grad clipping implemented ( #3934 )
2019-02-02 22:10:58 -08:00
Richard Liaw
eab6dd72b5
[tune] logging fixes, better warnings, better cluster support ( #3906 )
2019-02-02 19:14:03 -08:00
Luke
002531b199
Enable LZ4 compression in pyarrow build ( #3931 )
2019-02-02 14:38:02 -08:00
Yuhong Guo
54cbb4396f
Prepare socket file when start ray ( #3925 )
2019-02-02 12:53:36 +08:00
Eric Liang
0f81bc9a33
[rllib] on_train_result results do not get logged ( #3865 )
2019-02-01 20:32:07 -08:00
Robert Nishihara
e0f82fd260
Fix building python 3.7 wheel by installing newer numpy. ( #3927 )
2019-02-01 18:06:48 -08:00
Daniel Edgecumbe
315edab085
[autoscaler] Speedups ( #3720 )
...
- NodeUpdater gets its IP in parallel now (no longer in __init__)
- We use persistent connections in SSH (temp folder created only for ray; ControlMaster)
- hash_runtime_conf was performing a pointless hexlify step, wasting time on large files
- We use NodeUpdaterThreads and share the NodeProvider; NodeUpdaterProcess is removed
- AWSNodeProvider caches nodes more aggressively
- NodeProvider now has a shim batch terminate_nodes() call; AWSNodeProvider parallelises it; the autoscaler uses it
- AWSNodeProvider batches EC2 update_tags calls
- Logging changes throughout to provide standardised timing information for profiling
- Pulled out a few unnecessary is_running calls (NodeUpdater will loop waiting for SSH anyway)
## Related issue number
Issue #3599
2019-02-01 02:46:32 -08:00
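The persistent SSH connections mentioned in the commit above rely on OpenSSH connection multiplexing. The following is only a minimal sketch of how such options could be assembled, not the autoscaler's actual code: the helper name `ssh_control_options`, its parameters, and the default timeout are hypothetical, while `ControlMaster`, `ControlPath`, and `ControlPersist` are standard OpenSSH client options.

```python
import os


def ssh_control_options(control_dir, persist_seconds=60):
    """Build ssh arguments that reuse one master connection per host.

    Hypothetical helper for illustration; the directory and timeout
    are assumptions, not values taken from the commit.
    """
    # %C is expanded by ssh to a hash of the connection parameters,
    # which keeps the control socket path short and unique per host.
    control_path = os.path.join(control_dir, "%C")
    return [
        "-o", "ControlMaster=auto",
        "-o", "ControlPath={}".format(control_path),
        "-o", "ControlPersist={}s".format(persist_seconds),
    ]
```

These arguments would be placed before the host in an `ssh` command line, so that later commands to the same node skip the connection handshake.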
Daniel Edgecumbe
ff3c6af1d6
[autoscaler]: Remove assertion in info string ( #3916 )
...
Fixes #3903
2019-02-01 00:32:24 -08:00
Tianming Xu
1302fafc0b
[Tune] Add export_formats option to export policy graphs ( #3868 )
...
In earlier PRs, PR#3585 and PR#3637, export_policy_model and export_policy_checkpoint were introduced so that users could export TensorFlow models and checkpoints.
For Ray Tune users, these APIs are not accessible through YAML configurations.
This pull request adds an export_formats option that lets users choose the desired export format.
2019-01-31 17:07:27 -08:00
Kristian Hartikainen
b9eed2e86c
[autoscaler] Move attach helper text under exec_cluster ( #3920 )
...
## What do these changes do?
Moves the attach command helper from cli commands to the actual `exec_cluster` function.
2019-01-31 17:01:24 -08:00
Peter Schafhalter
62a0a7bdc7
[tune] Add BayesOpt ( #3864 )
...
Adds BayesOpt as a Tune suggestion algorithm.
2019-01-31 16:54:17 -08:00
Jimpachnet
d3551dd8df
[tune] Added possibility to execute infinite recovery retries for a trial ( #3901 )
...
Allows a trial to retry recovery indefinitely by setting _max_failures_ to a negative number.
2019-01-31 02:21:16 -08:00
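The negative-value convention described in the commit above can be illustrated with a small check. This is a sketch under assumptions, not Tune's implementation: the function `should_recover` and the exact comparison against the failure count are hypothetical, and only the rule that a negative `max_failures` means unlimited retries comes from the commit message.

```python
def should_recover(num_failures, max_failures):
    """Return True if a trial that has failed num_failures times may retry.

    Hypothetical helper: a negative max_failures means retry forever.
    """
    if max_failures < 0:
        # Negative limit: never give up on recovery.
        return True
    # Non-negative limit: retry only while under the limit.
    return num_failures < max_failures
```

With this convention, `should_recover(100, -1)` allows another attempt, while `should_recover(3, 3)` does not.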
Philipp Moritz
beb75193da
Fix linting on master ( #3913 )
2019-01-31 01:28:45 -08:00
Richard Liaw
d128636bab
Ray Logging Configuration ( #3691 )
...
* fix logging for autoscaler
* module logging
* try this for logging
* yapf
* fix
* Initial logging setup
* memory
* ok
* remove basicconfig
* catch
* remove package logging
* print
* fix
* try_fix
* fix 1
* revert rllib
* logging level
* flake8
* fix
* fix
* Remove vestigial TODO
2019-01-30 21:01:12 -08:00
Richard Liaw
5f145041ef
Update Release Docs ( #3693 )
2019-01-30 19:37:48 -08:00
Robert Nishihara
93214891b0
Small improvement to kubernetes config files. ( #3875 )
2019-01-30 18:00:20 -08:00
Rong Ou
8f6bd6cece
change kubernetes examples to use Deployment ( #3909 )
2019-01-30 17:50:37 -08:00
Robert Nishihara
d06d9fc5d7
Fix Python linting errors. ( #3905 )
2019-01-30 13:43:18 -08:00
Kai Yang
02766adeca
Limit maximum starting workers per language ( #3852 )
2019-01-29 21:43:12 -08:00
Eric Liang
152375aa8a
[rllib] Add evaluation option to DQN agent ( #3835 )
...
* add eval
* interval
* multiagent minor fix
* Update rllib.rst
* Update ddpg.py
* Update qmix.py
2019-01-29 21:19:53 -08:00
Yuhong Guo
c45b91dcca
Make redis module safe without crashing by removing RAY_CHECK ( #3855 )
2019-01-29 21:06:31 -08:00
Eric Liang
fb73cedf70
[rllib] Add examples page, add hierarchical training example, delete SC2 examples ( #3815 )
...
* wip
* lint
* wip
* up
* wip
* update examples
* wip
* remove carla
* update
* improve envspec
* link to custom
* Update rllib-env.rst
* update
* fix
* fn
* lint
* ds
* ssd games
* desc
* fix up docs
* fix
2019-01-29 21:06:09 -08:00
Bruno Morier
c9819a721d
Update tempfile_services.py ( #3896 )
...
Fix an invalid reference to os.errno. The errno attribute was removed from os in Python 3.7. The fix simply replaces it with the already imported errno module.
2019-01-29 19:33:02 -08:00
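The kind of change described in the commit above can be sketched as follows. This is an illustrative example, not the contents of tempfile_services.py: the function `make_dir_if_missing` is hypothetical, and it only shows the general pattern of comparing against the standalone `errno` module instead of `os.errno`.

```python
import errno
import os


def make_dir_if_missing(path):
    """Create a directory, tolerating the case where it already exists.

    Hypothetical helper for illustration only.
    """
    try:
        os.makedirs(path)
    except OSError as e:
        # Use the errno module directly; os.errno is unavailable
        # in Python 3.7 and later.
        if e.errno != errno.EEXIST:
            raise
```

Calling the function twice on the same path succeeds both times, because the second `OSError` carries `errno.EEXIST` and is swallowed.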
Robert Nishihara
2887dac427
Use Redis version 5.0.3. ( #3886 )
2019-01-29 19:19:05 -08:00
Philipp Moritz
0aadf11c10
Fix compilation on macOS by adding virtual destructors ( #3878 )
2019-01-28 13:22:52 -08:00
Philipp Moritz
f7415b37c5
Build Ray with Bazel ( #3867 )
2019-01-27 18:32:04 -08:00
Eric Liang
c75038b945
[autoscaler] Updating a file in file mounts causes all worker nodes to get restarted
2019-01-27 17:41:37 -08:00
Stephanie Wang
ad9f1721d1
Fix object_manager_test.py::object_transfer_retry test ( #3863 )
2019-01-27 13:55:38 -08:00
Stephanie Wang
eddd60e14e
Improve backend debug logging, refactor scheduling queues ( #3819 )
2019-01-26 16:15:48 +08:00
Yuhong Guo
066fa8abf3
Fix monitor_test.py by waiting for monitor.py to start working ( #3840 )
...
* Wait for monitor.py to start working
* Checkout None result in state.py
2019-01-25 18:07:15 +08:00
Philipp Moritz
20162ce159
Compile raylet cython bindings with bazel ( #3842 )
2019-01-25 00:57:31 -08:00