mirror of https://github.com/vale981/ray synced 2025-03-16 08:06:38 -04:00
Commit graph

5656 commits

Author SHA1 Message Date
Stephanie Wang
4347ab644e
Use Redis lists in the GCS instead of zset ()
* Convert zset to list

* Remove object evictions map from the object directory, yay

* comments

* Fix tests
2019-02-13 10:32:57 -08:00
bjg2
0e37ac6d1d [wingman -> rllib] Remote and entangled environments ()
* added all our environment changes

* fixed merge request comments and remote env

* fixed remote check

* moved remote_worker_envs to correct config section

* lint

* auto wrap impl

* fix

* fixed the tests
2019-02-13 10:08:26 -08:00
Philipp Moritz
b3f72e8a75 Add regression tests for dataclass serialization () 2019-02-13 09:07:03 -08:00
Hao Chen
f31a79f3f7
Implement actor checkpointing ()
* Implement Actor checkpointing

* docs

* fix

* fix

* fix

* move restore-from-checkpoint to HandleActorStateTransition

* Revert "move restore-from-checkpoint to HandleActorStateTransition"

This reverts commit 9aa4447c1e3e321f42a1d895d72f17098b72de12.

* resubmit waiting tasks when actor frontier restored

* add doc about num_actor_checkpoints_to_keep=1

* add num_actor_checkpoints_to_keep to Cython

* add checkpoint_expired api

* check if actor class is abstract

* change checkpoint_ids to long string

* implement java

* Refactor to delay actor creation publish until checkpoint is resumed

* debug, lint

* Erase from checkpoints to restore if task fails

* fix lint

* update comments

* avoid duplicated actor notification log

* fix unintended change

* add actor_id to checkpoint_expired

* small java updates

* make checkpoint info per actor

* lint

* Remove logging

* Remove old actor checkpointing Python code, move new checkpointing code to FunctionActionManager

* Replace old actor checkpointing tests

* Fix test and lint

* address comments

* consolidate kill_actor

* Remove __ray_checkpoint__

* fix non-ascii char

* Loosen test checks

* fix java

* fix sphinx-build
2019-02-13 19:39:02 +08:00
Andrew Tan
57dcd3033e [tune] Trial reporter fix ()
Fixes .
2019-02-13 01:03:54 -08:00
Wang Qing
3a7fb182cc Change the num of parallel jobs when building 2019-02-13 00:33:05 -08:00
William Ma
e1a479b137 Add teardown_module to test_queue.py () 2019-02-12 22:43:09 -08:00
Si-Yuan
21472b890a Integrate "tempfile_service" into "ray.node.Node" () 2019-02-12 17:34:04 -08:00
Adi Zimmerman
dac1969647 [tune] Add Nevergrad to Tune () 2019-02-12 11:00:04 -08:00
Wang Qing
c523bc04ad Enable redis password in Java worker ()
* Support Java redis password

* Fix

* Refine

* Fix lint.
2019-02-12 13:11:25 +08:00
Adi Zimmerman
9797028a91 [tune] Add scikit-optimize to Tune () 2019-02-11 17:06:02 -08:00
Eric Liang
8df772867c
[rllib] rename compute_apply to learn_on_batch 2019-02-11 15:22:15 -08:00
Eric Liang
c4182463f6
[rllib] Add helper to iterate over envs in a vectorized environment ()
* add foreach env func

* fix

* add test
2019-02-11 10:40:47 -08:00
Daniel Edgecumbe
a70ae1687b .gitignore: Add Vim swap files () 2019-02-11 10:27:10 -08:00
Ion
3c32343c63 Ray signal () 2019-02-11 10:14:48 -08:00
ebrevdo
52dfde1cbb Update flatbuffer bazel rule to work with flatbuffer master branch. () 2019-02-11 10:00:06 -08:00
Zhijun Fu
7097ba393b protect raylet against bad messages ()
* protect raylet against bad messages

* address comments

* linting and regression test
2019-02-12 00:39:38 +08:00
Wang Qing
bc438ca73b [Java] Refine Java config item ()
* Refine

* Address comment.
2019-02-11 23:55:40 +08:00
Philipp Moritz
ab809bd927 update ray version to 0.7.0dev () 2019-02-10 19:56:42 -08:00
Eric Liang
8e9f2c923f
[autoscaler] Use RLock in addition to FileLock 2019-02-10 19:16:43 -08:00
Yuhong Guo
5fb1efd60d Fix CI test failures () 2019-02-11 11:01:14 +08:00
bjg2
e703b9f49d [wingman -> rllib] Improved stats changes in AsyncSamplesOptimizer ()
* added stats changes to optimizer

* changes timers

* fix python 2 compat

* improved optimizer throughput stats

* Update async_samples_optimizer.py

* fix python2 compat
2019-02-10 01:25:22 -08:00
Yuhong Guo
3a66d47a3a
Remove RAY_CHECK from JNI code ()
* Remove RAY_CHECK in JNI

* Try to add mvn test to test the exception.

* Refine

* Address comments
2019-02-09 18:10:22 +08:00
bibabolynn
728031a972 [java] when put an object in plasma store, ignore "object alreay exists" exception ()
* distinct plasma client exception

* Update ObjectStoreProxy.java

* Update and rename PlasmaArrowTest.java to PlasmaStoreTest.java

* store put

* Use testng to replace junit to fix test failure
2019-02-09 18:03:17 +08:00
Eric Liang
29322c7389
[rllib] Replay buffer for IMPALA should default to 0 slots. ()
* disable replay

* make lq configurable

* leak test

* Update run_multi_node_tests.sh
2019-02-08 10:03:11 -08:00
Robert Nishihara
6a32b410bb Update versions from 0.6.2 -> 0.6.3 in the documentation. () 2019-02-07 20:57:37 -08:00
Robert Nishihara
ef527f84ab Stream logs to driver by default. ()
* Stream logs to driver by default.

* Fix from rebase

* Redirect raylet output independently of worker output.

* Fix.

* Create redis client with services.create_redis_client.

* Suppress Redis connection error at exit.

* Remove thread_safe_client from redis.

* Shutdown driver threads in ray.shutdown().

* Add warning for too many log messages.

* Only stop threads if worker is connected.

* Only stop threads if they exist.

* Remove unnecessary try/excepts.

* Fix

* Only add new logging handler once.

* Increase timeout.

* Fix tempfile test.

* Fix logging in cluster_utils.

* Revert "Increase timeout."

This reverts commit b3846b89040bcd8e583b2e18cb513cb040e71d95.

* Retry longer when connecting to plasma store from node manager and object manager.

* Close pubsub channels to avoid leaking file descriptors.

* Limit log monitor open files to 200.

* Increase plasma connect retries.

* Add comment.
2019-02-07 19:53:50 -08:00
Philipp Moritz
0aa74fb1fd Update cloudpickle to 0.8.0.dev0 () 2019-02-07 15:24:06 -08:00
Eric Liang
ae4bc7d6e8
[revert] [rllib] Add copy() in async samples optimizer 2019-02-07 14:14:39 -08:00
markgoodhead
5ce670cb36 [tune] Add Initial Parameter Suggestion for HyperOpt ()
Allows users of the HyperOptSearch suggestion algorithm to specify initial experiment values to run (typically already known good baseline parameters within the domain specified)
2019-02-07 10:57:51 -08:00
Ion
f987572795 Inline objects ()
* added store_client_ to object_manager and node_manager

* half through...

* all code in, and compiling! Nothing tested though...

* something is working ;-)

* added a few more comments

* now, add only one entry to the in GCS for inlined objects

* more comments

* remove a spurious todo

* some comment updates

* add test

* added support for meta data for inline objects

* avoid some copies

* Initialize plasma client in tests

* Better comments. Enable configuring nline_object_max_size_bytes.

* Update src/ray/object_manager/object_manager.cc

Co-Authored-By: istoica <istoica@cs.berkeley.edu>

* Update src/ray/raylet/node_manager.cc

Co-Authored-By: istoica <istoica@cs.berkeley.edu>

* Update src/ray/raylet/node_manager.cc

Co-Authored-By: istoica <istoica@cs.berkeley.edu>

* fiexed comments

* fixed various typos in comments

* updated comments in object_manager.h and object_manager.cc

* addressed all comments...hopefully ;-)

* Only add eviction entries for objects that are not inlined

* fixed a bunch of comments

* Fix test

* Fix object transfer dump test

* lint

* Comments

* Fix test?

* Fix test?

* lint

* fix build

* Fix build

* lint

* Use const ref

* Fixes, don't let object manager hang

* Increase object transfer retry time for travis?

* Fix test

* Fix test?

* Add internal config to java, fix PlasmaFreeTest
2019-02-07 10:32:39 -08:00
Richard Liaw
5db1afef07
[tune] Support Custom Resources ()
Support arbitrary resource declarations in Tune.

Fixes https://github.com/ray-project/ray/issues/2875
2019-02-07 00:29:19 -08:00
Robert Nishihara
a654152f9c Pin gym version in Python 2 tests. () 2019-02-06 23:56:14 -08:00
Philipp Moritz
3bb65677dc Use one memory mapped file for plasma () 2019-02-06 23:53:05 -08:00
Stephanie Wang
d2b6db3db1
Bump version from 0.6.2 to 0.6.3 () 2019-02-06 19:11:16 -08:00
Eric Liang
04fc145a44 [autoscaler] Autoscaler hangs forever on non-zero exit code command () 2019-02-06 17:25:24 -08:00
Stephanie Wang
49e9bec988
Fix raylet bug in driver cleanup ()
* Fix task dependency manager cleanup on driver exit

* Add regression test

* Better check, update header
2019-02-06 11:19:10 -08:00
Stephanie Wang
244fd473f4
Only mark tasks as forwarded if they are in the lineage cache () 2019-02-05 23:01:38 -08:00
Alex LaGrassa
b0fe5af7c8 [doc] Update example-parameter-server.rst () 2019-02-05 22:00:54 -08:00
Robert Nishihara
fa4eb8313d Suppress warning for serializing different unique ID types in Python. ()
* Suppress warning for serializing different unique ID types in Python.

* Add _ID_TYPES variable.
2019-02-05 11:38:33 -08:00
vfdev
b2b8417790 [tune] Improve mnist_pytorch.py example ()
## What do these changes do?

* Improved --no-cuda handling
* Removed deprecated Variable usage


## Related issue number

Fixes  
2019-02-04 17:59:54 -08:00
Eric Liang
5fb813ff39
Don't check fail on missing lineage cache entry () 2019-02-04 17:45:41 -08:00
William Ma
f067223c4a Allow Ray processes to be started inside of gdb and tmux. () 2019-02-04 15:23:39 -08:00
Yuhong Guo
add8ae7063 Add bazel build for JNI code ()
* Add bazel build for JNI code

* clean

* Add plasma client JNI build process

* refine

* clean linux part

* Add Java Library

* Remove java library

* Generate dylib after build using genrule
2019-02-04 13:03:46 -08:00
Wang Qing
e1c68a0881 Enable including Java worker for ray start command () 2019-02-04 16:23:43 +08:00
Eric Liang
7ef830bef1 [rllib] Add copy() in async samples optimizer to fix memory leak ()
Fixes .
2019-02-03 18:34:37 -08:00
Andrew Tan
8323419a6d [tune] Add SigOpt Integration () 2019-02-03 18:23:57 -08:00
Kristian Hartikainen
85294fb503 [autoscaler] node caching changes ()
Breaks the node provider node getter into cached and non-cached versions.

Fixes  by updating the node label finger print before updating labels.
Fixes  by refreshing node cache if node ip is not found.
2019-02-03 17:48:07 -08:00
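
The cached / non-cached split described in commit 85294fb503 above can be illustrated with a minimal sketch. This is not the autoscaler's actual code: the class `CachingNodeProvider`, its method names, and the `list_nodes` callable are all hypothetical, chosen only to show the pattern of serving lookups from a cache and refreshing it once when a node IP is not found.

```python
from typing import Callable, Dict, Optional


class CachingNodeProvider:
    """Hypothetical provider for illustration; not Ray's NodeProvider API."""

    def __init__(self, list_nodes: Callable[[], Dict[str, str]]):
        # list_nodes stands in for a (slow) cloud API call that returns
        # a mapping of node id -> node IP. It is an assumption of this sketch.
        self._list_nodes = list_nodes
        self._cache: Dict[str, str] = {}

    def non_cached_nodes(self) -> Dict[str, str]:
        """Always query the backend, and refresh the cache as a side effect."""
        self._cache = dict(self._list_nodes())
        return dict(self._cache)

    def cached_nodes(self) -> Dict[str, str]:
        """Return the last known view without touching the backend."""
        return dict(self._cache)

    def node_id_for_ip(self, ip: str) -> Optional[str]:
        """Look up a node by IP, refreshing the cache once on a miss."""
        for attempt in range(2):
            for node_id, node_ip in self._cache.items():
                if node_ip == ip:
                    return node_id
            if attempt == 0:
                # Cache miss: the node may be new, so refresh and retry once.
                self.non_cached_nodes()
        return None
```

In this sketch a lookup for a known IP never hits the backend, while a lookup for an unknown IP costs exactly one refresh before giving up.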
James Casbon
976f018dab [autoscaler] GCP: only call setIamPolicy if necessary () 2019-02-03 16:16:00 -08:00
James Casbon
b8cc176b4d [autoscaler] Document gcp subnet config ()
Adds info to the gcp example yaml on using shared subnets.
2019-02-03 16:14:44 -08:00