Commit graph

1827 commits

Author SHA1 Message Date
Robert Nishihara
800f7cc77d Make actor handles work in Python mode. (#2283)
* Make actor handles work in local mode.

* Add test for actor handles in local mode.
2018-06-20 23:02:41 -07:00
Robert Nishihara
ff2217251f [xray] Add error table and push error messages to driver through node manager. (#2256)
* Fix documentation indentation.

* Add error table to GCS and push error messages through node manager.

* Add type to error data.

* Linting

* Fix failure_test bug.

* Linting.

* Enable one more test.

* Attempt to fix doc building.

* Restructuring

* Fixes

* More fixes.

* Move current_time_ms function into util.h.
2018-06-20 21:29:28 -07:00
Kunal Gosar
6bf48f47bc addressing comments (#2210) 2018-06-20 16:24:37 -07:00
Robert Nishihara
18ee044f03 Re-enable some actor tests. (#2276) 2018-06-20 14:42:35 -07:00
Zongheng Yang
8190ff1fd0 Experimental: enable automatic GCS flushing with configurable policy. (#2266)
* build_credis.sh: use an up-to-date credis commit.

* build_credis.sh: leveldb is updated, so update build cmds for it

* WIP: make monitor.py issue flush; switch gcs client to use credis

* Experimental: enable automatic GCS flushing with configurable policy.

* Fix linux compilation error

* Fix leveldb build

* Use optimized build for credis

* Address comments

* Attempt to fix tests
2018-06-20 14:40:57 -07:00
Melih Elibol
60bc3a014f [xray] Sets good object manager defaults. (#2255)
* better object manager defaults. added max for number of chunks.

* change source of cores.
2018-06-20 14:10:57 -07:00
Richard Liaw
4acb77a5c3
[tune] Update Trainable doc to expose interface (#2272) 2018-06-20 13:40:45 -07:00
Eric Liang
e5724a9cfe
[rllib] Add a simple REST policy server and client example (#2232)
* wip

* cls

* re

* wip

* wip

* a3c working

* torch support

* pg works

* lint

* rm v2

* consumer id

* clean up pg

* clean up more

* fix python 2.7

* tf session management

* docs

* dqn wip

* fix compile

* dqn

* apex runs

* up

* impotrs

* ddpg

* quotes

* fix tests

* fix last r

* fix tests

* lint

* pass checkpoint restore

* kwar

* nits

* policy graph

* fix yapf

* com

* class

* pyt

* vectorization

* update

* test cpe

* unit test

* fix ddpg2

* changes

* wip

* args

* faster test

* common

* fix

* add alg option

* batch mode and policy serving

* multi serving test

* todo

* wip

* serving test

* doc async env

* num envs

* comments

* thread

* remove init hook

* update

* policy serve

* spaces

* checkpoint

* no train

* fix ppo

* comments1

* fix

* updates

* add jenkins tests

* fix

* fix pytorch

* fix

* fixes

* fix a3c policy

* fix squeeze

* fix trunc on apex

* fix squeezing for real

* update

* remove horizon test for now

* fix race condition

* update

* com

* updat

* add test

* Update run_multi_node_tests.sh

* use curl

* curl

* kill

* Update run_multi_node_tests.sh

* Update run_multi_node_tests.sh

* fix import

* update
2018-06-20 13:22:39 -07:00
Richard Liaw
418cd6804a
[asv] Pushing to s3 (#2246) 2018-06-20 10:43:44 -07:00
Eric Liang
30f7c08ca7
[rllib] Remove need to pass around registry (#2250)
* remove registry

* fix

* too many _

* fix

* cloudpickle

* Update registry.py

* yapf

* fix test

* fix kv check
2018-06-19 22:47:00 -07:00
Adam Gleave
30684446a6 Support multiple availability zones in AWS (fix #2177) (#2254)
* AWS: support multiple availability zones (fix #2177)

* Bugfix: [] rather than ()

* Test config

* Test config tweaks

* Remove test config

* Formatting fixes

* Update YAML config
2018-06-19 20:22:07 -07:00
Eric Liang
46cc51ce0c
[rllib] Add squash_to_range model option (#2239)
* sigmoid

* squash

* squash true

* git push

* Update catalog.py
2018-06-19 19:47:26 -07:00
Yuhong Guo
51744459f3 Mitigate randomly building failure: adding gen_local_scheduler_fbs to raylet lib. (#2271) 2018-06-19 15:29:57 -07:00
Victor Sun
b372b7103e [rllib] Refactor Multi-GPU for PPO (#1646) 2018-06-18 20:49:35 -07:00
Eric Liang
7dee2c6735
[rllib] Envs for vectorized execution, async execution, and policy serving (#2170)
## What do these changes do?

**Vectorized envs**: Users can either implement `VectorEnv`, or alternatively set `num_envs=N` to auto-vectorize gym envs (this vectorizes just the action computation part).

```
# CartPole-v0 on single core with 64x64 MLP:

# vector_width=1:
Actions per second 2720.1284458322966

# vector_width=8:
Actions per second 13773.035334888269

# vector_width=64:
Actions per second 37903.20472563333
```

**Async envs**: The more general form of `VectorEnv` is `AsyncVectorEnv`, which allows agents to execute out of lockstep. We use this as an adapter to support `ServingEnv`. Since we can convert any other form of env to `AsyncVectorEnv`, utils.sampler has been rewritten to run against this interface.

**Policy serving**: This provides an env which is not stepped. Rather, the env executes in its own thread, querying the policy for actions via `self.get_action(obs)`, and reporting results via `self.log_returns(rewards)`. We also support logging of off-policy actions via `self.log_action(obs, action)`. This is a more convenient API for some use cases, and also provides parallelizable support for policy serving (for example, if you start a HTTP server in the env) and ingest of offline logs (if the env reads from serving logs).

Any of these types of envs can be passed to RLlib agents. RLlib handles conversions internally in CommonPolicyEvaluator, for example:
 ```
        gym.Env => rllib.VectorEnv => rllib.AsyncVectorEnv
        rllib.ServingEnv => rllib.AsyncVectorEnv
```
2018-06-18 11:55:32 -07:00
Kunal Gosar
8560993b46 [Dataframe] Change pandas and ray.dataframe imports (#1942)
* fixing zero length partitions

* fixing bugs to fully handle zero len parts

* resolve comments

* renaming imports
2018-06-15 16:17:16 -07:00
mylinyuzhi
fa0ade2bc5 [Java] Replace binary rewrite with Remote Lambda Cache (SerdeLambda) (#2245)
* <feature> : serde lambda

* <feature>:fixed CR

with issue #2245

* <feature>: fixed CR
2018-06-13 12:58:07 -07:00
Philipp Moritz
62de86ff7a fix redis module build dependencies (#2247) 2018-06-13 10:18:09 -07:00
Hao Chen
8efd0f7b1b [xray] support multi-workers per process (#2244)
* support multi-workers per process

Signed-off-by: Hao Chen <chenh1024@gmail.com>

* use RayConfig

Signed-off-by: Hao Chen <chenh1024@gmail.com>

* fix

Signed-off-by: Hao Chen <chenh1024@gmail.com>

* fix

* remove clear

* address comments

* fix lint

* fix bug

* make WorkerPool and WorkerPoolMock more consistent
2018-06-13 10:14:05 -07:00
songqing
78a48fa1e0 Fix build error when building Ray for Java later than Python (#2241) 2018-06-12 21:11:30 -07:00
Eric Liang
be178ae031 [autoscaler] GCP docs (#2235) 2018-06-12 12:40:12 -07:00
Eric Liang
7fcaad264a
[autoscaler] Translate to/from AWS 'Name' tag (#2219)
* fix tag

* fix
2018-06-11 12:10:10 -07:00
Alok Singh
d47d6a6b7a [rllib] Use correct method name (#2226) 2018-06-11 09:53:31 -07:00
Devin Petersohn
b886ceca47 [DataFrame] Implement __array_wrap__ (#2218)
* Implement __array_wrap__

* Removing unnecessary test
2018-06-11 08:56:43 -07:00
Robert Nishihara
61139e1509 Enable fractional resources and resource IDs for xray. (#2187)
* Implement GPU IDs and fractional resources.

* Add documentation and python exceptions.

* Fix signed/unsigned comparison.

* Fix linting.

* Fixes from rebase.

* Re-enable tests that use ray.wait.

* Don't kill the raylet if an infeasible task is submitted.

* Ignore tests that require better load balancing.

* Linting

* Ignore array test.

* Ignore stress test reconstructions tests.

* Don't kill node manager if remote node manager disconnects.

* Ignore more stress tests.

* Naming changes

* Remove outdated todo

* Small fix

* Re-enable test.

* Linting

* Fix resource bookkeeping for blocked tasks.

* Fix linting

* Fix Java client.

* Ignore test

* Ignore put error tests
2018-06-10 15:31:43 -07:00
Richard Liaw
f19decb848
[docs] Update RLlib install to not include Tensorflow (#2178) 2018-06-10 10:29:12 -07:00
Philipp Moritz
4ec5bea03b [xray] Implement fetch (#2195) 2018-06-09 23:36:27 -07:00
Robert Nishihara
125fe1c09c Print warning when defining very large remote function or actor. (#2179)
* Print warning when defining very large remote function or actor.

* Add weak test.

* Check that warnings appear in test.

* Make wait_for_errors actually fail in failure_test.py.

* Use constants for error types.

* Fix
2018-06-09 19:59:15 -07:00
andrewztan
1475600c81 [rllib] Merge DDPG and DDPG2 implementations (#2202)
* removed ddpg2

* removed ddpg2 from codebase

* added tests used in ddpg vs ddpg2 comparison

* added notes about training timesteps to yaml files

* removed ddpg2 yaml files

* removed unnecessary configs from yaml files

* removed unnecessary configs from yaml files

* moved pendulum, mountaincarcontinuous, and halfcheetah tests to tuned_examples

* moved pendulum, mountaincarcontinuous, and halfcheetah tests to tuned_examples

* added more configuration details to yaml files

* removed random starts from halfcheetah
2018-06-09 16:46:23 -07:00
Yujie Liu
3b5e700fd7 [JavaWorker] Java code lint check and binding to CI (#2225)
* add java code lint check and fix the java code lint error

* add java doc lint check and fix the java doc lint error

* add java code and doc lint to the CI
2018-06-09 16:26:54 -07:00
Robert Nishihara
5789a247f9 [xray] Do not redirect worker output to files by default. (#2220) 2018-06-09 15:00:42 -07:00
Eric Liang
71eb558eb0 [rllib] Refactor rllib to have a common sample collection pathway (#2149) 2018-06-09 00:21:35 -07:00
Stephanie Wang
cb5e6e6d68 Add dependency between copy_ray and python extensions (#2221) 2018-06-08 20:41:54 -07:00
Eric Liang
32b9a4d3f1
Fix yapf excludes, print diff in --all mode (#2211)
* fix

* travis
2018-06-08 02:25:55 -07:00
Eric Liang
8da558f5b7 [autoscaler] Should use internal IP for ssh (#2209) 2018-06-08 01:08:59 -07:00
Eric Liang
31046f7e06 Autoscaler Python 2 queue fix (#2205) 2018-06-07 18:43:07 -07:00
Eric Liang
100d8c207f [xray] [autoscaler] Fix autoscaler / raylet integration (#2143) 2018-06-07 15:43:20 -07:00
Yuhong Guo
0a34bea0b0 Use scoped enums in C++ and flatbuffers. (#2194)
* Enable --scoped-enums in flatbuffer compiler.

* Change enum to c++11 style (enum class).

* Resolve conflicts.

* Solve building failure when RAY_USE_NEW_GCS=on and remove ERROR_INDEX suffix.

* Merge with master and fix CI failure.
2018-06-07 01:01:21 -07:00
Hao Chen
f0907a6ee9 Optimize lineage eviction efficiency (#2196)
* Java in vscode.

* Optimize lineage eviction

* minor fix

* fix ut

* fix comment and lint

* format

* format

* remove unneeded code
2018-06-07 00:35:15 -07:00
Philipp Moritz
343f29801b [xray] Fix compilation on mac (#2199) 2018-06-06 22:33:46 -07:00
Melih Elibol
7246ff80a4
[xray] Implements ray.wait (#2162)
Implements ray.wait for xray. Fixes #1128.
2018-06-06 16:56:44 -07:00
Devin Petersohn
c8c0349511 [DataFrame] Temporarily changing the requirement until our pandas compat is updated (#2197)
* Temporarily changing the requirement until our pandas compat is updated
for 0.23

* Fix lint
2018-06-06 12:01:43 -07:00
Yuhong Guo
5b0df0eca2 Change surefire version to 2.21.0 to fix test failure on Java10. (#2198) 2018-06-06 10:39:20 -07:00
Alok Singh
42a9233e1d Improve yapf speed and document its usage (#2160)
* Allow yapf to lint individual files

* Add tip for using yapf

* Update doc

* Update script to autoformat changed py files

The new default is for the script to only updated changed files to encourage
using it as a pre-push hook. Travis still checks all since it's not that big an
increase to runtime.

* Exclude formatting thirdparty/autogen py files

* Symlink .travis -> scripts

Hidden directories may get glossed over otherwise.

* .travis -> scripts in docs

They are symlinks to the same thing, but `scripts` is more dev-friendly, while
`.travis` is really only for Travis CI.

* Document different yapf format functions

Most devs will only need `format_changed`, and this is run by default.
`format_changed` should be fast enough in most cases to work as a pre-commit
hook.

* Speed up yapf by only formatting changed files

* Update docs

1. Mention how yapf can be used a pre-commit hook
2. rm `bash`, script is executable

* Update yapf.sh

* Update development.rst

* Update yapf.sh

* Use bash arrays for correct argument splitting

Playing fast and loose with whitespace in bash is a terrible idea.

* Only format non-excluded by default

* Check changes against master

Normally, the remote is called `origin`, but naming it explicit

* Adding missing directory to `format_all`

* Cleanup YAPF code

Remove unused function and move around code to make clearer and adding lines
give cleaner diffs.

* Ensure correct files are autoformatted

* Fix cmd line arg splitting

Each arg has to be in its own set of quotes.

* Diff against mergebase

TIL there's a clean syntax for doing that, but it's too clever to belong in a
shell script.

We use `mapfile -t` to ensure no problems down the line with weird filenames.
2018-06-05 20:22:11 -07:00
Adam Gleave
6ef3b255ea Launch nodes in separate threads (#2183)
Modifies the autoscaler to run launch_new_nodes in a separate thread, keeping track of the number of pending requests.
2018-06-05 20:19:31 -07:00
Richard Liaw
13d4e0db95 Add Docker Support for ASV (#2184)
* added new instructions and script

* initialize ray only once

* use ray-project/asv master
2018-06-05 15:55:35 -07:00
Simon Mo
a139a5df8c [DataFrame] Implement Memoizer (#2157)
* Implement Memoizer

* Add LRUCache

* Add comments
2018-06-05 07:18:12 -07:00
songqing
451cdb43f6 Fix redefinition of flatbuffer types (#2189) 2018-06-05 00:08:05 -07:00
Devin Petersohn
b56c8ed8dc [DataFrame] Fix equals and make it more efficient (#2186)
* Fixing equals

* Adding test fix

* Working on fix for equals and drop

* Fix equals and fix tests to use ray.dataframe.equals

* Addressing comments
2018-06-04 13:10:06 -07:00
Peter Schafhalter
a5d888e49b [DataFrames] More dtypes optimizations (#2124)
* Pass dtypes for some DataFrame constructors

* More optimizations with dtypes_cache

* Optimizations
2018-06-04 10:50:13 -07:00