Commit graph

2523 commits

Author SHA1 Message Date
William Ma
4be3d0c5d3 Update shipped modin to 0.3.1 (#4058) 2019-02-15 15:49:38 -08:00
Robert Nishihara
2d07df7f3f Replace '__main__' with "__main__". (#4055) 2019-02-15 13:32:43 -08:00
Robert Nishihara
5f71751891 API cleanups. Remove worker argument. Remove some deprecated arguments. (#4025)
* Remove worker argument from API methods.

* Remove deprecated arguments and deprecate redirect_output and redirect_worker_output.

* Fix
2019-02-15 10:49:16 -08:00
Hao Chen
042ad84573 Simplify Cython ID types and fix bug of ActorCheckpointID (#4045) 2019-02-15 20:15:16 +08:00
Richard Liaw
bb7c4ce9c4 [tune] Improve error message when Ray crashes (#3795) 2019-02-15 01:04:17 -08:00
Richard Liaw
7cf62a10cd [tune] Fix TF checkpointing example (#4043)
Closes #3912, closes #3963.
2019-02-15 00:30:27 -08:00
Stephanie Wang
3684e5bc0d Fix memory leak in Redis by using auto memory management (#4054)
* Table appends should always succeed

* Use Redis auto memory management

* Remove unneeded namespace
2019-02-14 19:51:18 -08:00
Eric Liang
0c0bd4d41c [rllib] Use model.value_function() in MARWIL (#4036)
* fix marwil

* add ph

* fix
2019-02-14 19:35:21 -08:00
William Ma
8ee53297b1 Add documentation on how to use debug tools (#4000) 2019-02-14 13:50:21 -08:00
Philipp Moritz
077ffd99bf Bump version from 0.6.3 to 0.7.0.dev0 in docs and .yaml (#4042) 2019-02-14 12:08:48 -08:00
Yuhong Guo
4b0db437ee Linting Bazel scripts (#4032)
* Use buildifier as bazel script linter

* Checkout golang version in travis

* Using golang-1.8-go in travis

* Add golang apt-repository

* Fix the bazel lint failure example.

* Address comment
2019-02-14 22:16:19 +08:00
Philipp Moritz
810cc17062 Fix LRU eviction of client notification datastructure (#4021)
* convert notification_key map to C++ datastructure

* fix crash and add debug string

* clean notification map up (this was a bug before)

* remove checks

* add jenkins test

* linting

* fixes

* properly erase

* clean up

* linting

* Update test_wait_hanging.py

* Update run_multi_node_tests.sh

* increase redis_max_memory

* fix dat jenkins

* update

* Update run_multi_node_tests.sh
2019-02-13 22:20:27 -08:00
Stephanie Wang
fd5b58a827 Increase timeout for object manager valgrind tests (#4027)
* Avoid second copy of data for inlined objects

* Increase Wait timeout for valgrind tests

* Run object manager tests with and without inlined objects

* Fix test
2019-02-13 18:29:03 -08:00
Wang Qing
1fb56a4316 Remove deprecated module (#4038) 2019-02-14 10:04:09 +08:00
Si-Yuan
2de31eb489 minor fix (#4040) 2019-02-13 17:22:45 -08:00
Eric Liang
2dccf383dd [rllib] Basic infrastructure for off-policy estimation (IS, WIS) (#3941) 2019-02-13 16:25:05 -08:00
Kristian Hartikainen
729d0b2825 [autoscaler] docker run options (#3921)
Adds support for docker options, allowing for use of nvidia-docker.

Closes #2657.
2019-02-13 12:26:28 -08:00
Stephanie Wang
4347ab644e Use Redis lists in the GCS instead of zset (#4023)
* Convert zset to list

* Remove object evictions map from the object directory, yay

* comments

* Fix tests
2019-02-13 10:32:57 -08:00
bjg2
0e37ac6d1d [wingman -> rllib] Remote and entangled environments (#3968)
* added all our environment changes

* fixed merge request comments and remote env

* fixed remote check

* moved remote_worker_envs to correct config section

* lint

* auto wrap impl

* fix

* fixed the tests
2019-02-13 10:08:26 -08:00
Philipp Moritz
b3f72e8a75 Add regression tests for dataclass serialization (#3984) 2019-02-13 09:07:03 -08:00
Hao Chen
f31a79f3f7 Implement actor checkpointing (#3839)
* Implement Actor checkpointing

* docs

* fix

* fix

* fix

* move restore-from-checkpoint to HandleActorStateTransition

* Revert "move restore-from-checkpoint to HandleActorStateTransition"

This reverts commit 9aa4447c1e3e321f42a1d895d72f17098b72de12.

* resubmit waiting tasks when actor frontier restored

* add doc about num_actor_checkpoints_to_keep=1

* add num_actor_checkpoints_to_keep to Cython

* add checkpoint_expired api

* check if actor class is abstract

* change checkpoint_ids to long string

* implement java

* Refactor to delay actor creation publish until checkpoint is resumed

* debug, lint

* Erase from checkpoints to restore if task fails

* fix lint

* update comments

* avoid duplicated actor notification log

* fix unintended change

* add actor_id to checkpoint_expired

* small java updates

* make checkpoint info per actor

* lint

* Remove logging

* Remove old actor checkpointing Python code, move new checkpointing code to FunctionActorManager

* Replace old actor checkpointing tests

* Fix test and lint

* address comments

* consolidate kill_actor

* Remove __ray_checkpoint__

* fix non-ascii char

* Loosen test checks

* fix java

* fix sphinx-build
2019-02-13 19:39:02 +08:00
Andrew Tan
57dcd3033e [tune] Trial reporter fix (#3951)
Fixes #3949.
2019-02-13 01:03:54 -08:00
Wang Qing
3a7fb182cc Change the num of parallel jobs when building 2019-02-13 00:33:05 -08:00
William Ma
e1a479b137 Add teardown_module to test_queue.py (#4012) 2019-02-12 22:43:09 -08:00
Si-Yuan
21472b890a Integrate "tempfile_service" into "ray.node.Node" (#3953) 2019-02-12 17:34:04 -08:00
Adi Zimmerman
dac1969647 [tune] Add Nevergrad to Tune (#3985) 2019-02-12 11:00:04 -08:00
Wang Qing
c523bc04ad Enable redis password in Java worker (#3943)
* Support Java redis password

* Fix

* Refine

* Fix lint.
2019-02-12 13:11:25 +08:00
Adi Zimmerman
9797028a91 [tune] Add scikit-optimize to Tune (#3924) 2019-02-11 17:06:02 -08:00
Eric Liang
8df772867c [rllib] rename compute_apply to learn_on_batch 2019-02-11 15:22:15 -08:00
Eric Liang
c4182463f6 [rllib] Add helper to iterate over envs in a vectorized environment (#4001)
* add foreach env func

* fix

* add test
2019-02-11 10:40:47 -08:00
Daniel Edgecumbe
a70ae1687b .gitignore: Add Vim swap files (#4016) 2019-02-11 10:27:10 -08:00
Ion
3c32343c63 Ray signal (#3624) 2019-02-11 10:14:48 -08:00
ebrevdo
52dfde1cbb Update flatbuffer bazel rule to work with flatbuffer master branch. (#4008) 2019-02-11 10:00:06 -08:00
Zhijun Fu
7097ba393b protect raylet against bad messages (#4003)
* protect raylet against bad messages

* address comments

* linting and regression test
2019-02-12 00:39:38 +08:00
Wang Qing
bc438ca73b [Java] Refine Java config item (#4014)
* Refine

* Address comment.
2019-02-11 23:55:40 +08:00
Philipp Moritz
ab809bd927 update ray version to 0.7.0dev (#3995) 2019-02-10 19:56:42 -08:00
Eric Liang
8e9f2c923f [autoscaler] Use RLock in addition to FileLock 2019-02-10 19:16:43 -08:00
Yuhong Guo
5fb1efd60d Fix CI test failures (#4007) 2019-02-11 11:01:14 +08:00
bjg2
e703b9f49d [wingman -> rllib] Improved stats changes in AsyncSamplesOptimizer (#3966)
* added stats changes to optimizer

* changes timers

* fix python 2 compat

* improved optimizer throughput stats

* Update async_samples_optimizer.py

* fix python2 compat
2019-02-10 01:25:22 -08:00
Yuhong Guo
3a66d47a3a Remove RAY_CHECK from JNI code (#3978)
* Remove RAY_CHECK in JNI

* Try to add mvn test to test the exception.

* Refine

* Address comments
2019-02-09 18:10:22 +08:00
bibabolynn
728031a972 [java] when putting an object in the plasma store, ignore "object already exists" exception (#3687)
* distinct plasma client exception

* Update ObjectStoreProxy.java

* Update and rename PlasmaArrowTest.java to PlasmaStoreTest.java

* store put

* Use testng to replace junit to fix test failure
2019-02-09 18:03:17 +08:00
Eric Liang
29322c7389 [rllib] Replay buffer for IMPALA should default to 0 slots. (#3971)
* disable replay

* make lq configurable

* leak test

* Update run_multi_node_tests.sh
2019-02-08 10:03:11 -08:00
Robert Nishihara
6a32b410bb Update versions from 0.6.2 -> 0.6.3 in the documentation. (#3981) 2019-02-07 20:57:37 -08:00
Robert Nishihara
ef527f84ab Stream logs to driver by default. (#3892)
* Stream logs to driver by default.

* Fix from rebase

* Redirect raylet output independently of worker output.

* Fix.

* Create redis client with services.create_redis_client.

* Suppress Redis connection error at exit.

* Remove thread_safe_client from redis.

* Shutdown driver threads in ray.shutdown().

* Add warning for too many log messages.

* Only stop threads if worker is connected.

* Only stop threads if they exist.

* Remove unnecessary try/excepts.

* Fix

* Only add new logging handler once.

* Increase timeout.

* Fix tempfile test.

* Fix logging in cluster_utils.

* Revert "Increase timeout."

This reverts commit b3846b89040bcd8e583b2e18cb513cb040e71d95.

* Retry longer when connecting to plasma store from node manager and object manager.

* Close pubsub channels to avoid leaking file descriptors.

* Limit log monitor open files to 200.

* Increase plasma connect retries.

* Add comment.
2019-02-07 19:53:50 -08:00
Philipp Moritz
0aa74fb1fd Update cloudpickle to 0.8.0.dev0 (#3964) 2019-02-07 15:24:06 -08:00
Eric Liang
ae4bc7d6e8 [revert] [rllib] Add copy() in async samples optimizer 2019-02-07 14:14:39 -08:00
markgoodhead
5ce670cb36 [tune] Add Initial Parameter Suggestion for HyperOpt (#3944)
Allows users of the HyperOptSearch suggestion algorithm to specify initial experiment values to run (typically already known good baseline parameters within the domain specified)
2019-02-07 10:57:51 -08:00
Ion
f987572795 Inline objects (#3756)
* added store_client_ to object_manager and node_manager

* half through...

* all code in, and compiling! Nothing tested though...

* something is working ;-)

* added a few more comments

* now, add only one entry in the GCS for inlined objects

* more comments

* remove a spurious todo

* some comment updates

* add test

* added support for meta data for inline objects

* avoid some copies

* Initialize plasma client in tests

* Better comments. Enable configuring inline_object_max_size_bytes.

* Update src/ray/object_manager/object_manager.cc

Co-Authored-By: istoica <istoica@cs.berkeley.edu>

* Update src/ray/raylet/node_manager.cc

Co-Authored-By: istoica <istoica@cs.berkeley.edu>

* Update src/ray/raylet/node_manager.cc

Co-Authored-By: istoica <istoica@cs.berkeley.edu>

* fixed comments

* fixed various typos in comments

* updated comments in object_manager.h and object_manager.cc

* addressed all comments...hopefully ;-)

* Only add eviction entries for objects that are not inlined

* fixed a bunch of comments

* Fix test

* Fix object transfer dump test

* lint

* Comments

* Fix test?

* Fix test?

* lint

* fix build

* Fix build

* lint

* Use const ref

* Fixes, don't let object manager hang

* Increase object transfer retry time for travis?

* Fix test

* Fix test?

* Add internal config to java, fix PlasmaFreeTest
2019-02-07 10:32:39 -08:00
Richard Liaw
5db1afef07 [tune] Support Custom Resources (#2979)
Support arbitrary resource declarations in Tune.

Fixes https://github.com/ray-project/ray/issues/2875
2019-02-07 00:29:19 -08:00
Robert Nishihara
a654152f9c Pin gym version in Python 2 tests. (#3973) 2019-02-06 23:56:14 -08:00