Commit graph

2523 commits

Author SHA1 Message Date
Philipp Moritz
3bb65677dc Use one memory mapped file for plasma (#3871) 2019-02-06 23:53:05 -08:00
Stephanie Wang
d2b6db3db1 Bump version from 0.6.2 to 0.6.3 (#3972) 2019-02-06 19:11:16 -08:00
Eric Liang
04fc145a44 [autoscaler] Autoscaler hangs forever on non-zero exit code command (#3969) 2019-02-06 17:25:24 -08:00
Stephanie Wang
49e9bec988 Fix raylet bug in driver cleanup (#3962)
* Fix task dependency manager cleanup on driver exit

* Add regression test

* Better check, update header
2019-02-06 11:19:10 -08:00
Stephanie Wang
244fd473f4 Only mark tasks as forwarded if they are in the lineage cache (#3958) 2019-02-05 23:01:38 -08:00
Alex LaGrassa
b0fe5af7c8 [doc] Update example-parameter-server.rst (#3773) 2019-02-05 22:00:54 -08:00
Robert Nishihara
fa4eb8313d Suppress warning for serializing different unique ID types in Python. (#3872)
* Suppress warning for serializing different unique ID types in Python.

* Add _ID_TYPES variable.
2019-02-05 11:38:33 -08:00
vfdev
b2b8417790 [tune] Improve mnist_pytorch.py example (#3894)
## What do these changes do?

* Improved --no-cuda handling
* Removed deprecated Variable usage


## Related issue number

Fixes #3873 
2019-02-04 17:59:54 -08:00
Eric Liang
5fb813ff39 Don't check fail on missing lineage cache entry (#3861) 2019-02-04 17:45:41 -08:00
William Ma
f067223c4a Allow Ray processes to be started inside of gdb and tmux. (#3847) 2019-02-04 15:23:39 -08:00
Yuhong Guo
add8ae7063 Add bazel build for JNI code (#3918)
* Add bazel build for JNI code

* clean

* Add plasma client JNI build process

* refine

* clean linux part

* Add Java Library

* Remove java library

* Generate dylib after build using genrule
2019-02-04 13:03:46 -08:00
Wang Qing
e1c68a0881 Enable including Java worker for ray start command (#3838) 2019-02-04 16:23:43 +08:00
Eric Liang
7ef830bef1 [rllib] Add copy() in async samples optimizer to fix memory leak (#3938)
Fixes #3884.
2019-02-03 18:34:37 -08:00
Andrew Tan
8323419a6d [tune] Add SigOpt Integration (#3844) 2019-02-03 18:23:57 -08:00
Kristian Hartikainen
85294fb503 [autoscaler] node caching changes (#3937)
Breaks the node provider node getter into cached and non-cached versions.

Fixes #3930 by updating the node label fingerprint before updating labels.
Fixes #3935 by refreshing the node cache if the node IP is not found.
2019-02-03 17:48:07 -08:00
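
The cached and non-cached getter split described in the commit above can be sketched as follows. This is a minimal illustration, not the autoscaler's actual code: the class `CachingNodeProvider`, its method names, and the `fetch_nodes` callable are all hypothetical. The lookup tries the cache first and refreshes it only when the IP is missing.

```python
class CachingNodeProvider:
    """Hypothetical provider that maps node IDs to IPs with a local cache."""

    def __init__(self, fetch_nodes):
        # fetch_nodes is an assumed callable returning {node_id: ip}.
        self._fetch_nodes = fetch_nodes
        self._cache = {}

    def get_nodes(self):
        """Non-cached getter: always query the backend and refresh the cache."""
        self._cache = dict(self._fetch_nodes())
        return self._cache

    def get_cached_nodes(self):
        """Cached getter: return the last known view without a backend call."""
        return self._cache

    def node_id_for_ip(self, ip):
        """Look up a node by IP, refreshing the cache once on a miss."""

        def find(nodes):
            for node_id, node_ip in nodes.items():
                if node_ip == ip:
                    return node_id
            return None

        node_id = find(self.get_cached_nodes())
        if node_id is None:
            # The cache may be stale, so retry against fresh data.
            node_id = find(self.get_nodes())
        return node_id


backend = {"node-a": "10.0.0.1"}
provider = CachingNodeProvider(lambda: backend)
first = provider.node_id_for_ip("10.0.0.1")

# A node added after the cache was filled is still found via the refresh.
backend["node-b"] = "10.0.0.2"
second = provider.node_id_for_ip("10.0.0.2")
missing = provider.node_id_for_ip("10.0.0.9")
```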
James Casbon
976f018dab [autoscaler] GCP: only call setIamPolicy if necessary (#3782) 2019-02-03 16:16:00 -08:00
James Casbon
b8cc176b4d [autoscaler] Document gcp subnet config (#3783)
Adds info to the gcp example yaml on using shared subnets.
2019-02-03 16:14:44 -08:00
Si-Yuan
9295ab8f60 Various Python code cleanups. (#3837) 2019-02-03 10:16:24 -08:00
Devin Petersohn
a1bcd2a4f5 Update Modin to 0.3.0 (#3936) 2019-02-02 23:06:16 -08:00
Michael Luo
1a015e420b Optimal PPO Configs (10k reward in 1 hr) + PPO grad clipping implemented (#3934) 2019-02-02 22:10:58 -08:00
Richard Liaw
eab6dd72b5 [tune] logging fixes, better warnings, better cluster support (#3906) 2019-02-02 19:14:03 -08:00
Luke
002531b199 Enable LZ4 compression in pyarrow build (#3931)
2019-02-02 14:38:02 -08:00
Yuhong Guo
54cbb4396f Prepare socket file when start ray (#3925) 2019-02-02 12:53:36 +08:00
Eric Liang
0f81bc9a33 [rllib] on_train_result results do not get logged (#3865) 2019-02-01 20:32:07 -08:00
Robert Nishihara
e0f82fd260 Fix building python 3.7 wheel by installing newer numpy. (#3927) 2019-02-01 18:06:48 -08:00
Daniel Edgecumbe
315edab085 [autoscaler] Speedups (#3720)
- NodeUpdater gets its IP in parallel now (no longer in __init__)
- We use persistent connections in SSH (temp folder created only for ray; ControlMaster)
- hash_runtime_conf was performing a pointless hexlify step, wasting time on large files
- We use NodeUpdaterThreads and share the NodeProvider; NodeUpdaterProcess is removed
- AWSNodeProvider caches nodes more aggressively
- NodeProvider now has a shim batch terminate_nodes() call; AWSNodeProvider parallelises it; the autoscaler uses it
- AWSNodeProvider batches EC2 update_tags calls
- Logging changes throughout to provide standardised timing information for profiling
- Pulled out a few unnecessary is_running calls (NodeUpdater will loop waiting for SSH anyway)

## Related issue number
Issue #3599
2019-02-01 02:46:32 -08:00
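
The batch `terminate_nodes()` shim mentioned in the commit above might look roughly like this. It is a sketch under stated assumptions: the base class here is a simplified stand-in, `ParallelNodeProvider` and its `terminated` list are hypothetical, and a thread pool is only one possible way to parallelise the calls.

```python
from concurrent.futures import ThreadPoolExecutor


class NodeProvider:
    """Simplified stand-in for a node provider base class (hypothetical)."""

    def terminate_node(self, node_id):
        raise NotImplementedError

    def terminate_nodes(self, node_ids):
        # Shim: providers without a batch API fall back to one call per node.
        for node_id in node_ids:
            self.terminate_node(node_id)


class ParallelNodeProvider(NodeProvider):
    """Hypothetical provider that overrides the shim with parallel calls."""

    def __init__(self, max_workers=8):
        self.terminated = []
        self._max_workers = max_workers

    def terminate_node(self, node_id):
        # A real provider would call its cloud API here.
        self.terminated.append(node_id)

    def terminate_nodes(self, node_ids):
        with ThreadPoolExecutor(max_workers=self._max_workers) as pool:
            # list() waits for completion and re-raises any worker exception.
            list(pool.map(self.terminate_node, node_ids))


provider = ParallelNodeProvider()
provider.terminate_nodes(["node-1", "node-2", "node-3"])
```

Callers only ever use `terminate_nodes()`, so a provider can adopt a faster batch path without any change on the calling side.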
Daniel Edgecumbe
ff3c6af1d6 [autoscaler]: Remove assertion in info string (#3916)
Fixes #3903
2019-02-01 00:32:24 -08:00
Tianming Xu
1302fafc0b [Tune] Add export_formats option to export policy graphs (#3868)
The earlier PRs #3585 and #3637 introduced export_policy_model and export_policy_checkpoint, which let users export a TensorFlow model and checkpoint.

For Ray Tune users, these APIs are not accessible through YAML configurations.

This pull request adds an export_formats option that lets users choose the desired export format.
2019-01-31 17:07:27 -08:00
Kristian Hartikainen
b9eed2e86c [autoscaler] Move attach helper text under exec_cluster (#3920)
## What do these changes do?
Moves the attach command helper from cli commands to the actual `exec_cluster` function.
2019-01-31 17:01:24 -08:00
Peter Schafhalter
62a0a7bdc7 [tune] Add BayesOpt (#3864)
Adds BayesOpt as a Tune suggestion algorithm.
2019-01-31 16:54:17 -08:00
Jimpachnet
d3551dd8df [tune] Added possibility to execute infinite recovery retries for a trial (#3901)
Allows a trial to retry recovery indefinitely by setting _max_failures_ to a negative number.
2019-01-31 02:21:16 -08:00
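
The negative-value convention described in the commit above can be sketched as follows. The helpers `should_recover`, `run_with_recovery`, and `make_flaky` are hypothetical and not part of Tune, and the exact boundary used for non-negative values is an assumption.

```python
def should_recover(num_failures, max_failures):
    """Return True if a failed trial should attempt another recovery.

    Hypothetical helper: a negative max_failures means "retry forever".
    """
    if max_failures < 0:
        return True
    # Assumed boundary: tolerate up to max_failures failures.
    return num_failures <= max_failures


def run_with_recovery(step, max_failures):
    """Call step() until it succeeds or the failure budget is exhausted."""
    num_failures = 0
    while True:
        try:
            return step()
        except RuntimeError:
            num_failures += 1
            if not should_recover(num_failures, max_failures):
                raise


def make_flaky(failures_before_success):
    """Build a step that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def step():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise RuntimeError("simulated trial failure")
        return "done"

    return step


# With a negative budget the trial keeps recovering until it succeeds.
result = run_with_recovery(make_flaky(5), max_failures=-1)
```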
Philipp Moritz
beb75193da Fix linting on master (#3913) 2019-01-31 01:28:45 -08:00
Richard Liaw
d128636bab Ray Logging Configuration (#3691)
* fix logging for autoscaler

* module logging

* try this for logging

* yapf

* fix

* Initial logging setup

* memory

* ok

* remove basicconfig

* catch

* remove package logging

* print

* fix

* try_fix

* fix 1

* revert rllib

* logging level

* flake8

* fix

* fix

* Remove vestigial TODO
2019-01-30 21:01:12 -08:00
Richard Liaw
5f145041ef Update Release Docs (#3693) 2019-01-30 19:37:48 -08:00
Robert Nishihara
93214891b0 Small improvement to kubernetes config files. (#3875) 2019-01-30 18:00:20 -08:00
Rong Ou
8f6bd6cece change kubernetes examples to use Deployment (#3909) 2019-01-30 17:50:37 -08:00
Robert Nishihara
d06d9fc5d7 Fix Python linting errors. (#3905) 2019-01-30 13:43:18 -08:00
Kai Yang
02766adeca Limit maximum starting workers per language (#3852) 2019-01-29 21:43:12 -08:00
Eric Liang
152375aa8a [rllib] Add evaluation option to DQN agent (#3835)
* add eval

* interval

* multiagent minor fix

* Update rllib.rst

* Update ddpg.py

* Update qmix.py
2019-01-29 21:19:53 -08:00
Yuhong Guo
c45b91dcca Make redis module safe without crashing by removing RAY_CHECK (#3855) 2019-01-29 21:06:31 -08:00
Eric Liang
fb73cedf70 [rllib] Add examples page, add hierarchical training example, delete SC2 examples (#3815)
* wip

* lint

* wip

* up

* wip

* update examples

* wip

* remove carla

* update

* improve envspec

* link to custom

* Update rllib-env.rst

* update

* fix

* fn

* lint

* ds

* ssd games

* desc

* fix up docs

* fix
2019-01-29 21:06:09 -08:00
Bruno Morier
c9819a721d Update tempfile_services.py (#3896)
Fix an invalid reference to os.errno. errno has been removed from os in Python 3.7. The fix only replaces it with the already imported errno.
2019-01-29 19:33:02 -08:00
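
The pattern behind the fix above can be illustrated in a few lines. This is a minimal sketch, assuming a hypothetical helper `make_dir_if_missing` that is not taken from tempfile_services.py. It uses the standalone `errno` module instead of `os.errno`, which is unavailable on Python 3.7 and later.

```python
import errno
import os
import tempfile


def make_dir_if_missing(path):
    """Create `path`, ignoring the error raised when it already exists.

    Hypothetical helper for illustration only.
    """
    try:
        os.makedirs(path)
    except OSError as e:
        # Use the top-level errno module; os.errno raises AttributeError
        # on Python 3.7 and later.
        if e.errno != errno.EEXIST:
            raise
    return path


base = tempfile.mkdtemp()
target = os.path.join(base, "ray_tmp")
make_dir_if_missing(target)
make_dir_if_missing(target)  # The second call finds the directory and returns.
```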
Robert Nishihara
2887dac427 Use Redis version 5.0.3. (#3886) 2019-01-29 19:19:05 -08:00
Philipp Moritz
0aadf11c10 Fix compilation on macOS by adding virtual destructors (#3878) 2019-01-28 13:22:52 -08:00
Philipp Moritz
f7415b37c5 Build Ray with Bazel (#3867) 2019-01-27 18:32:04 -08:00
Eric Liang
c75038b945 [autoscaler] Updating a file in file mounts causes all worker nodes to get restarted 2019-01-27 17:41:37 -08:00
Stephanie Wang
ad9f1721d1 Fix object_manager_test.py::object_transfer_retry test (#3863) 2019-01-27 13:55:38 -08:00
Stephanie Wang
eddd60e14e Improve backend debug logging, refactor scheduling queues (#3819) 2019-01-26 16:15:48 +08:00
Yuhong Guo
066fa8abf3 Fix monitor_test.py by waiting for monitor.py to start working (#3840)
* Wait for monitor.py to start working

* Checkout None result in state.py
2019-01-25 18:07:15 +08:00
Philipp Moritz
20162ce159 Compile raylet cython bindings with bazel (#3842) 2019-01-25 00:57:31 -08:00