Commit graph

2419 commits

Author SHA1 Message Date
Robert Nishihara
01e18b47f4 Direct people to stackoverflow for questions about usage. (#3830)
* Direct people to stackoverflow for questions about usage.

* Improve wording
2019-01-23 13:30:02 -08:00
Wang Qing
dcb744518e Implement actor dummy object gc in java (#3822)
* Add dummy object gc in java

* Fix

* Address comments.

* Refine

* Address comments.
2019-01-23 11:56:25 -08:00
Wang Qing
816406ea3d [Java] Fix setCurrentTask() in multi threading (#3821) 2019-01-23 20:45:30 +08:00
Robert Nishihara
0b1608a546 Factor out code for starting new processes and test plasma store in valgrind. (#3824)
* Factor out starting Ray processes.

* Detect flags through environment variables.

* Return ProcessInfo from start_ray_process.

* Print valgrind errors at exit.

* Test valgrind in travis.

* Some valgrind fixes.

* Undo raylet monitor change.

* Only test plasma store in valgrind.
2019-01-22 14:59:11 -08:00
Eric Liang
f0e6523323 [rllib] Don't call reset() unless necessary for multi-agent envs 2019-01-20 15:00:18 -08:00
Philipp Moritz
0dad4e6a25 Build Raylet with Bazel (#3806) 2019-01-20 12:16:47 -08:00
Eric Liang
aad48ee5a5 [tune] Fully deprecate raw function literals in Tune (#3788)
Related: https://github.com/ray-project/ray/issues/3785
2019-01-19 17:09:36 -08:00
Michael Luo
16f7ca45e4 Appo (#3779)
* Deleted old fork, updated new ray and moved PPO-impala to APPO in ppo folder

* Deleted unnecessary vtrace.py file

* Update pong-impala.yaml

* Cleaned PPO Code

* Update pong-impala.yaml

* Update pong-impala.yaml

* wip

* new file

* refactor

* add vtrace off option

* revert

* support any space

* docs

* fix comment

* remove kl

* Update cartpole-appo-vtrace.yaml
2019-01-18 13:40:26 -08:00
Philipp Moritz
931e6a2fc3 Fix compilation error on ARM. (#3800) 2019-01-18 00:25:16 -08:00
Robert Nishihara
9af5a62e05 Give better error for old-style actor classes. (#3793) 2019-01-17 19:05:04 -08:00
Richard Liaw
0537508106 Bump strings for 0.6.2 (#3801) 2019-01-17 19:03:27 -08:00
Si-Yuan
16a3b99d8d Get rid of Arrow test utils (#3734)
* convert code to proper C++

* revert changes to "id.h" because #3765 has been merged.

* revert changes to Python bindings because they will be removed in #3541

* remove dependencies of Arrow logging

* revert changes to Arrow logging

* lint
2019-01-17 18:35:41 -08:00
Jones Wong
319c1340cb [rllib] Develop MARWIL (#3635)
*  add marwil policy graph

*  fix typo

*  add offline optimizer and enable running marwil

*  fix loss function

*  add maintaining the moving average of advantage norm

*  use sync replay optimizer for unifying

*  remove offline optimizer and use sync replay optimizer

*  format by yapf

*  add imitation learning objective

*  fix according to Eric's review

*  format by yapf

* revise

* add test data

* marwil
2019-01-16 19:00:43 -08:00
Hao Chen
d1840bc7a9 Simplify RayConfig (#3714) 2019-01-16 16:43:26 -08:00
Richard Liaw
75ac016e2b Bump version (#3787) 2019-01-16 11:40:54 -08:00
Richard Liaw
fa99fda2b4 Application Stress Tests (#3612) 2019-01-16 02:05:16 -08:00
Richard Liaw
c28e6d41f5 [tune] Avoid overwriting checkpoint file (#3781) 2019-01-16 02:03:16 -08:00
ggdupont
a237b4a6a1 [Java] Fix missing jaxb package on JDK 11 (#3738) 2019-01-16 14:15:00 +08:00
Philipp Moritz
3b39066c15 Fix pandas 0.22 incompatibility by upgrading Arrow (#3786) 2019-01-15 21:17:32 -08:00
Eric Liang
401e656b95 [rllib] Sync filters at end of iteration not start; hierarchical docs (#3769) 2019-01-15 16:25:25 -08:00
Richard Liaw
3918934dfd [tune] Cross-Node Recovery (#3725)
Augments trial restore to also check whether the runner is at the same
location. If not, the checkpoint files are pushed to the new location.
2019-01-15 10:37:28 -08:00
Si-Yuan
a5df8e3532 minor fix (#3770) 2019-01-14 13:52:51 -08:00
Tianming Xu
0b8008f41c remove RAY_CHECK around wait_state.remaining.erase (#3745) 2019-01-14 10:32:31 -08:00
Philipp Moritz
02bdaf221d Update arrow to include https://github.com/apache/arrow/pull/3392 (#3765)
* update arrow to include https://github.com/apache/arrow/pull/3392

* add appropriate includes

* update
2019-01-14 19:20:26 +08:00
Wang Qing
3cf59855af [Java] Replace junit with testNG (#3768) 2019-01-14 17:49:17 +08:00
Robert Nishihara
19908c01b8 Use environment markers to only install faulthandler in Python < 3.3. (#3764) 2019-01-14 15:55:59 +08:00
Hao Chen
1bb20badec [Java] Fix bug when actor creation task fails (#3740)
* [Java] Fix bug when actor creation task fails

* remove imports
2019-01-14 11:09:15 +08:00
Robert Nishihara
27c20a41a9 Update stress tests. (#3614)
Starts clusters for testing and has a fallback to kill the cluster if the command fails.

The results are then printed at the end of the test.
2019-01-13 17:08:51 -08:00
Eugene Vinitsky
a5d1f03515 [rllib] fix for rollout of lstm policies (#3643)
* fix for lstm policies

* added call to local evaluator

* Update python/ray/rllib/rollout.py

Co-Authored-By: eugenevinitsky <eugenevinitsky@users.noreply.github.com>

* Update rollout.py

* Update rollout.py
2019-01-13 15:54:23 -08:00
Philipp Moritz
00e9f8d870 Fix pyarrow version (#3760) 2019-01-13 14:28:23 -08:00
jhpenger
3adffe6a4e [docs] Add example showing how to use Ray on Kubernetes. (#3126)
Closes #1353.
2019-01-13 13:56:47 -08:00
Wang Qing
8674606e26 Support auto-generating Java files from flatbuffers (#3749)
* auto gen flatbuffers for Java

* Add auto_gen_tool.py

* Refine

* Add a comment

* address comments.

* Address comments.

* Addressed

* Refine

* Address comments

* Fix typo

* Add exception

* Address comments.

* Refine

* Fix lint

* Fix

* Fix lint and address comment.

* Fix lint error
2019-01-13 11:39:23 -08:00
Yuhong Guo
d2cf8561f2 Refactor code about ray.ObjectID. (#3674)
* Refactor code about ray.ObjectID.

* remove from_random and use nil_id instead of constructor

* remove id() in hash

* Lint and fix

* Change driver id to ObjectID

* Replace binary_to_hex(ObjectID.id()) with ObjectID.hex()
2019-01-13 01:47:29 -08:00
Eric Liang
c4b058739b Remove redundant error message (#3761) 2019-01-12 22:22:41 -08:00
Richard Liaw
bdeeacc70f [autoscaler] RecoverUnhealthyWorker mitigation (#3699)
Increases the number of retries for RecoverUnhealthyWorkers

Closes #3435.
2019-01-12 14:06:53 -08:00
Robert Nishihara
1480f309c3 [doc] Replace runtest.py with mini_test.py in documentation. (#3750)
Rename `xray_test.py` to `mini_test.py` and use that in the documentation. Right now we suggest that people run `runtest.py`, but that often doesn't succeed and takes too long.
2019-01-12 14:05:28 -08:00
James Casbon
528bb3afd9 gcp allow manual network configuration (#3748) 2019-01-12 14:02:20 -08:00
Robert Nishihara
fbea1ece2e Clear new actor handle list after submitting task. (#3755) 2019-01-12 23:25:40 +08:00
Wang Qing
0a556dc0b5 Refine redis client (#3758) 2019-01-12 23:01:48 +08:00
Wang Qing
a0cf8ee5a8 Refine Java worker code (#3735) 2019-01-12 22:45:33 +08:00
Robert Nishihara
8723d6b061 Define a Node class to manage Ray processes. (#3733)
* Implement Node class and move most of services.py into it.

* Wait for nodes as they are added to the cluster.

* Fix Redis authentication bug.

* Fix bug in client table ordering.

* Address comments.

* Kill raylet before plasma store in test.

* Minor
2019-01-11 22:30:38 -08:00
Wang Qing
fa2bfa6d76 Fix some small code quality issues. (#3719) 2019-01-11 15:24:49 +08:00
Stephanie Wang
cc5ecd71c5 [autoscaler] Add kill and get IP commands to CLI for testing (#3731)
## What do these changes do?

Adds 2 commands to the CLI that take in an autoscaler config:
1. Kill a random ray node in the cluster.
2. Get all the worker node IP addresses.

These commands are both for testing and are not recommended for normal use.

## Related issue number
Closes #3685.
2019-01-10 22:06:57 -08:00
Richard Liaw
574f0b73bc [tune] Fix Trial Serialization (#3743) 2019-01-10 19:26:10 -08:00
Hao Chen
597abb24ea Refine multi-threading support (#3672)
* [Python] refine multi-threading support

fix

* [java] refine multithreading code

fix java

* format
2019-01-10 13:58:11 -08:00
Eric Liang
71243203a4 [rllib] Fix KeyError: 'kl' in multiagent ppo training 2019-01-09 19:33:07 -08:00
Hao Chen
6fc3fc4120 Cap task lease timeout (#3707) 2019-01-09 17:19:48 -08:00
Richard Liaw
edb7aaf7c7 [tune] Better Serialization for Server (#3708)
* Add cloudpickle for serialization

* Fix tests
2019-01-09 11:55:32 -08:00
Stephanie Wang
04f31db54d Actor dummy object garbage collection (#3593)
* Convert UniqueID::nil() to a constructor

* Cleanup actor handle pickling code

* Add new actor handles to the task spec

* Pass in new actor handles

* Add new handles to the actor registration

* Regression test for actor handle forking and GC

* lint and doc

* Handle pickled actor handles in the backend and some refactoring

* Add regression test for dummy object GC and pickled actor handles

* Check for duplicate actor tasks on submission

* Regression test for forking twice, fix failed named actor leak

* Fix bug for forking twice

* lint

* Revert "Fix bug for forking twice"

This reverts commit 3da85e59d401e53606c2e37ffbebcc8653ff27ac.

* Add new actor handles when task is assigned, not finished

* Remove comment

* remove UniqueID()

* Updates

* update

* fix

* fix java

* fixes

* fix
2019-01-09 10:37:11 -08:00
Wenting Shen
3027dde303 Fix some storage problems of RayLog (#3595)
1. Fix the problem of duplicated stored logs.
2. Save logs whose level is higher than or equal to severity_threshold, not only those exactly at severity_threshold.
3. Fix a `log_dir` bug: logs were stored in the wrong path.
2019-01-09 13:54:21 +08:00
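Item 2 of the RayLog fix (#3595) changes the storage filter from "level equals the threshold" to "level is at or above the threshold". The minimal sketch below contrasts the two comparisons. It is not Ray's implementation: the `Severity` enum, its glog-style numeric ordering, and the helpers `kept_before_fix` / `kept_after_fix` are hypothetical names chosen for illustration, and the log messages are made up.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Assumed glog-style ordering: a larger value means a more severe record.
    DEBUG = -1
    INFO = 0
    WARNING = 1
    ERROR = 2
    FATAL = 3


def kept_before_fix(records, severity_threshold):
    # Hypothetical model of the old behaviour: only records exactly at the threshold survive.
    return [(lvl, msg) for lvl, msg in records if lvl == severity_threshold]


def kept_after_fix(records, severity_threshold):
    # Hypothetical model of the fixed behaviour: every record at or above the threshold is stored.
    return [(lvl, msg) for lvl, msg in records if lvl >= severity_threshold]


if __name__ == "__main__":
    records = [
        (Severity.DEBUG, "debug detail"),
        (Severity.INFO, "task scheduled"),
        (Severity.WARNING, "lease timeout"),
        (Severity.ERROR, "worker died"),
    ]
    print([m for _, m in kept_before_fix(records, Severity.WARNING)])  # prints ['lease timeout']
    print([m for _, m in kept_after_fix(records, Severity.WARNING)])   # prints ['lease timeout', 'worker died']
```

With a WARNING threshold, the old comparison silently drops the ERROR record, which is the more important one to keep; the `>=` comparison stores both.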