mirror of https://github.com/vale981/ray synced 2025-11-06 09:21:38 -05:00
Commit graph

13,388 commits

Author SHA1 Message Date
Philipp Moritz
ab809bd927 update ray version to 0.7.0dev () 2019-02-10 19:56:42 -08:00
Eric Liang
8e9f2c923f
[autoscaler] Use RLock in addition to FileLock 2019-02-10 19:16:43 -08:00
Yuhong Guo
5fb1efd60d Fix CI test failures () 2019-02-11 11:01:14 +08:00
bjg2
e703b9f49d [wingman -> rllib] Improved stats changes in AsyncSamplesOptimizer ()
* added stats changes to optimizer

* changes timers

* fix python 2 compat

* improved optimizer throughput stats

* Update async_samples_optimizer.py

* fix python2 compat
2019-02-10 01:25:22 -08:00
Yuhong Guo
3a66d47a3a
Remove RAY_CHECK from JNI code ()
* Remove RAY_CHECK in JNI

* Try to add mvn test to test the exception.

* Refine

* Address comments
2019-02-09 18:10:22 +08:00
bibabolynn
728031a972 [java] when put an object in plasma store, ignore "object already exists" exception ()
* distinct plasma client exception

* Update ObjectStoreProxy.java

* Update and rename PlasmaArrowTest.java to PlasmaStoreTest.java

* store put

* Use testng to replace junit to fix test failure
2019-02-09 18:03:17 +08:00
Eric Liang
29322c7389
[rllib] Replay buffer for IMPALA should default to 0 slots. ()
* disable replay

* make lq configurable

* leak test

* Update run_multi_node_tests.sh
2019-02-08 10:03:11 -08:00
Robert Nishihara
6a32b410bb Update versions from 0.6.2 -> 0.6.3 in the documentation. () 2019-02-07 20:57:37 -08:00
Robert Nishihara
ef527f84ab Stream logs to driver by default. ()
* Stream logs to driver by default.

* Fix from rebase

* Redirect raylet output independently of worker output.

* Fix.

* Create redis client with services.create_redis_client.

* Suppress Redis connection error at exit.

* Remove thread_safe_client from redis.

* Shutdown driver threads in ray.shutdown().

* Add warning for too many log messages.

* Only stop threads if worker is connected.

* Only stop threads if they exist.

* Remove unnecessary try/excepts.

* Fix

* Only add new logging handler once.

* Increase timeout.

* Fix tempfile test.

* Fix logging in cluster_utils.

* Revert "Increase timeout."

This reverts commit b3846b89040bcd8e583b2e18cb513cb040e71d95.

* Retry longer when connecting to plasma store from node manager and object manager.

* Close pubsub channels to avoid leaking file descriptors.

* Limit log monitor open files to 200.

* Increase plasma connect retries.

* Add comment.
2019-02-07 19:53:50 -08:00
Philipp Moritz
0aa74fb1fd Update cloudpickle to 0.8.0.dev0 () 2019-02-07 15:24:06 -08:00
Eric Liang
ae4bc7d6e8
[revert] [rllib] Add copy() in async samples optimizer 2019-02-07 14:14:39 -08:00
markgoodhead
5ce670cb36 [tune] Add Initial Parameter Suggestion for HyperOpt ()
Allows users of the HyperOptSearch suggestion algorithm to specify initial experiment values to run (typically already known good baseline parameters within the domain specified)
2019-02-07 10:57:51 -08:00
Ion
f987572795 Inline objects ()
* added store_client_ to object_manager and node_manager

* half through...

* all code in, and compiling! Nothing tested though...

* something is working ;-)

* added a few more comments

* now, add only one entry to the GCS for inlined objects

* more comments

* remove a spurious todo

* some comment updates

* add test

* added support for meta data for inline objects

* avoid some copies

* Initialize plasma client in tests

* Better comments. Enable configuring inline_object_max_size_bytes.

* Update src/ray/object_manager/object_manager.cc

Co-Authored-By: istoica <istoica@cs.berkeley.edu>

* Update src/ray/raylet/node_manager.cc

Co-Authored-By: istoica <istoica@cs.berkeley.edu>

* Update src/ray/raylet/node_manager.cc

Co-Authored-By: istoica <istoica@cs.berkeley.edu>

* fixed comments

* fixed various typos in comments

* updated comments in object_manager.h and object_manager.cc

* addressed all comments...hopefully ;-)

* Only add eviction entries for objects that are not inlined

* fixed a bunch of comments

* Fix test

* Fix object transfer dump test

* lint

* Comments

* Fix test?

* Fix test?

* lint

* fix build

* Fix build

* lint

* Use const ref

* Fixes, don't let object manager hang

* Increase object transfer retry time for travis?

* Fix test

* Fix test?

* Add internal config to java, fix PlasmaFreeTest
2019-02-07 10:32:39 -08:00
Richard Liaw
5db1afef07
[tune] Support Custom Resources ()
Support arbitrary resource declarations in Tune.

Fixes https://github.com/ray-project/ray/issues/2875
2019-02-07 00:29:19 -08:00
Robert Nishihara
a654152f9c Pin gym version in Python 2 tests. () 2019-02-06 23:56:14 -08:00
Philipp Moritz
3bb65677dc Use one memory mapped file for plasma () 2019-02-06 23:53:05 -08:00
Stephanie Wang
d2b6db3db1
Bump version from 0.6.2 to 0.6.3 () ray-0.6.3 2019-02-06 19:11:16 -08:00
Eric Liang
04fc145a44 [autoscaler] Autoscaler hangs forever on non-zero exit code command () 2019-02-06 17:25:24 -08:00
Stephanie Wang
49e9bec988
Fix raylet bug in driver cleanup ()
* Fix task dependency manager cleanup on driver exit

* Add regression test

* Better check, update header
2019-02-06 11:19:10 -08:00
Stephanie Wang
244fd473f4
Only mark tasks as forwarded if they are in the lineage cache () 2019-02-05 23:01:38 -08:00
Alex LaGrassa
b0fe5af7c8 [doc] Update example-parameter-server.rst () 2019-02-05 22:00:54 -08:00
Robert Nishihara
fa4eb8313d Suppress warning for serializing different unique ID types in Python. ()
* Suppress warning for serializing different unique ID types in Python.

* Add _ID_TYPES variable.
2019-02-05 11:38:33 -08:00
vfdev
b2b8417790 [tune] Improve mnist_pytorch.py example ()
## What do these changes do?

* Improved --no-cuda handling
* Removed deprecated Variable usage


## Related issue number

Fixes  
2019-02-04 17:59:54 -08:00
Eric Liang
5fb813ff39
Don't check fail on missing lineage cache entry () 2019-02-04 17:45:41 -08:00
William Ma
f067223c4a Allow Ray processes to be started inside of gdb and tmux. () 2019-02-04 15:23:39 -08:00
Yuhong Guo
add8ae7063 Add bazel build for JNI code ()
* Add bazel build for JNI code

* clean

* Add plasma client JNI build process

* refine

* clean linux part

* Add Java Library

* Remove java library

* Generate dylib after build using genrule
2019-02-04 13:03:46 -08:00
Wang Qing
e1c68a0881 Enable including Java worker for ray start command () 2019-02-04 16:23:43 +08:00
Eric Liang
7ef830bef1 [rllib] Add copy() in async samples optimizer to fix memory leak ()
Fixes .
2019-02-03 18:34:37 -08:00
Andrew Tan
8323419a6d [tune] Add SigOpt Integration () 2019-02-03 18:23:57 -08:00
Kristian Hartikainen
85294fb503 [autoscaler] node caching changes ()
Breaks the node provider node getter into cached and non-cached versions.

Fixes  by updating the node label finger print before updating labels.
Fixes  by refreshing node cache if node ip is not found.
2019-02-03 17:48:07 -08:00
James Casbon
976f018dab [autoscaler] GCP: only call setIamPolicy if necessary () 2019-02-03 16:16:00 -08:00
James Casbon
b8cc176b4d [autoscaler] Document gcp subnet config ()
Adds info to the gcp example yaml on using shared subnets.
2019-02-03 16:14:44 -08:00
Si-Yuan
9295ab8f60 Various Python code cleanups. () 2019-02-03 10:16:24 -08:00
Devin Petersohn
a1bcd2a4f5 Update Modin to 0.3.0 () 2019-02-02 23:06:16 -08:00
Michael Luo
1a015e420b Optimal PPO Configs (10k reward in 1 hr) + PPO grad clipping implemented () 2019-02-02 22:10:58 -08:00
Richard Liaw
eab6dd72b5
[tune] logging fixes, better warnings, better cluster support () 2019-02-02 19:14:03 -08:00
Luke
002531b199 Enable LZ4 compression in pyarrow build ()
Enable LZ4 compression in pyarrow build
2019-02-02 14:38:02 -08:00
Yuhong Guo
54cbb4396f Prepare socket file when start ray () 2019-02-02 12:53:36 +08:00
Eric Liang
0f81bc9a33 [rllib] on_train_result results do not get logged () 2019-02-01 20:32:07 -08:00
Robert Nishihara
e0f82fd260 Fix building python 3.7 wheel by installing newer numpy. () 2019-02-01 18:06:48 -08:00
Daniel Edgecumbe
315edab085 [autoscaler] Speedups ()
- NodeUpdater gets its IP in parallel now (no longer in __init__)
- We use persistent connections in SSH (temp folder created only for ray; ControlMaster)
- hash_runtime_conf was performing a pointless hexlify step, wasting time on large files
- We use NodeUpdaterThreads and share the NodeProvider; NodeUpdaterProcess is removed
- AWSNodeProvider caches nodes more aggressively
- NodeProvider now has a shim batch terminate_nodes() call; AWSNodeProvider parallelises it; the autoscaler uses it
- AWSNodeProvider batches EC2 update_tags calls
- Logging changes throughout to provide standardised timing information for profiling
- Pulled out a few unnecessary is_running calls (NodeUpdater will loop waiting for SSH anyway)

## Related issue number
Issue 
2019-02-01 02:46:32 -08:00
Daniel Edgecumbe
ff3c6af1d6 [autoscaler]: Remove assertion in info string ()
Fixes 
2019-02-01 00:32:24 -08:00
Tianming Xu
1302fafc0b [Tune] Add export_formats option to export policy graphs ()
In earlier PRs, PR#3585 and PR#3637, export_policy_model and export_policy_checkpoint were introduced so that users can export TensorFlow models and checkpoints.

For Ray Tune users, these APIs are not accessible through YAML configurations.

This pull request adds an export_formats option that lets users choose the desired export format.
2019-01-31 17:07:27 -08:00
Kristian Hartikainen
b9eed2e86c [autoscaler] Move attach helper text under exec_cluster ()
## What do these changes do?
Moves the attach command helper from cli commands to the actual `exec_cluster` function.
2019-01-31 17:01:24 -08:00
Peter Schafhalter
62a0a7bdc7 [tune] Add BayesOpt ()
Adds BayesOpt as a Tune suggestion algorithm.
2019-01-31 16:54:17 -08:00
Jimpachnet
d3551dd8df [tune] Added possibility to execute infinite recovery retries for a trial ()
Allows a trial to retry recovery indefinitely by setting _max_failures_ to a negative number.
2019-01-31 02:21:16 -08:00
Philipp Moritz
beb75193da Fix linting on master () 2019-01-31 01:28:45 -08:00
Richard Liaw
d128636bab Ray Logging Configuration ()
* fix logging for autoscaler

* module logging

* try this for logging

* yapf

* fix

* Initial logging setup

* memory

* ok

* remove basicconfig

* catch

* remove package logging

* print

* fix

* try_fix

* fix 1

* revert rllib

* logging level

* flake8

* fix

* fix

* Remove vestigial TODO
2019-01-30 21:01:12 -08:00
Richard Liaw
5f145041ef Update Release Docs () 2019-01-30 19:37:48 -08:00
Robert Nishihara
93214891b0 Small improvement to kubernetes config files. () 2019-01-30 18:00:20 -08:00