Commit graph

644 commits

Author SHA1 Message Date
Yuhong Guo
1d51e57b6e Fix Plasma starting failure when specify the memory in float value. (#2337) 2018-07-04 13:35:51 -07:00
Robert Nishihara
1ede458519 Stop building wheels for Python 3.3 on Linux. (#2342)
* Stop building wheels for Python 3.3 on Linux.

* Fix test.
2018-07-04 12:22:33 -07:00
Zongheng Yang
ba28dddf6f Make xray object table credis-managed and hence flushable. (#2338)
* monitor.py: issue flushes to data shard

* ResultTableAdd & ObjectTableAdd: add credis-managed versions

* Fix return codes

* Credis-manage xray object table & associated ray.table_append cmd

* Fix incorrect return code from TableAppend_DoWrite()

* Revert "ResultTableAdd & ObjectTableAdd: add credis-managed versions"

This reverts commit 628c2ea190df4c861dda0c284fab7ca6faa1ea24.

* Address comments

* Lint: fix indent

* Address comment
2018-07-03 17:32:44 -07:00
Richard Liaw
178346fa16
Printing messages to stderr (#2312)
Move core python code onto logging module.

Addressing #1884.
2018-07-02 16:10:57 -07:00
Richard Liaw
f0ed1c1674
[rllib] Add more regression tests and autogenerate (#2324) 2018-07-02 08:20:53 -07:00
Eric Liang
8aa56c12e6
[rllib] Document "v2" APIs (#2316)
* re

* wip

* wip

* a3c working

* torch support

* pg works

* lint

* rm v2

* consumer id

* clean up pg

* clean up more

* fix python 2.7

* tf session management

* docs

* dqn wip

* fix compile

* dqn

* apex runs

* up

* impotrs

* ddpg

* quotes

* fix tests

* fix last r

* fix tests

* lint

* pass checkpoint restore

* kwar

* nits

* policy graph

* fix yapf

* com

* class

* pyt

* vectorization

* update

* test cpe

* unit test

* fix ddpg2

* changes

* wip

* args

* faster test

* common

* fix

* add alg option

* batch mode and policy serving

* multi serving test

* todo

* wip

* serving test

* doc async env

* num envs

* comments

* thread

* remove init hook

* update

* fix ppo

* comments1

* fix

* updates

* add jenkins tests

* fix

* fix pytorch

* fix

* fixes

* fix a3c policy

* fix squeeze

* fix trunc on apex

* fix squeezing for real

* update

* remove horizon test for now

* multiagent wip

* update

* fix race condition

* fix ma

* t

* doc

* st

* wip

* example

* wip

* working

* cartpole

* wip

* batch wip

* fix bug

* make other_batches None default

* working

* debug

* nit

* warn

* comments

* fix ppo

* fix obs filter

* update

* wip

* tf

* update

* fix

* cleanup

* cleanup

* spacing

* model

* fix

* dqn

* fix ddpg

* doc

* keep names

* update

* fix

* com

* docs

* clarify model outputs

* Update torch_policy_graph.py

* fix obs filter

* pass thru worker index

* fix

* rename

* vlad torch comments

* fix log action

* debug name

* fix lstm

* remove unused ddpg net

* remove conv net

* revert lstm

* wip

* wip

* cast

* wip

* works

* fix a3c

* works

* lstm util test

* doc

* clean up

* update

* fix lstm check

* move to end

* fix sphinx

* fix cmd

* remove bad doc

* envs

* vec

* doc prep

* models

* rl

* alg

* up

* clarify

* copy

* async sa

* fix

* comments

* fix a3c conf

* tune lstm

* fix reshape

* fix

* back to 16

* tuned a3c update

* update

* tuned

* optional

* merge

* wip

* fix up

* move pg class

* rename env

* wip

* update

* tip

* alg

* readme

* fix catalog

* readme

* doc

* context

* remove prep

* comma

* add env

* link to paper

* paper

* update

* rnn

* update

* wip

* clean up ev creation

* fix

* fix

* fix

* fix lint

* up

* no comma

* ma

* Update run_multi_node_tests.sh

* fix

* sphinx is stupid

* sphinx is stupid

* clarify torch graph

* no horizon

* fix config

* sb

* Update test_optimizers.py
2018-07-01 00:05:08 -07:00
Philipp Moritz
762bdf646e [xray] Put GCS data into the redis data shard (#2298) 2018-06-30 15:42:10 -10:00
Richard Liaw
d75b39f6df
[tune] Return error trials(#2292) 2018-06-28 20:23:38 -07:00
Hao Chen
20c0ecb522 Reuse code of checking large pickles (#2291) 2018-06-28 16:51:23 -10:00
Sergey Kolesnikov
cd63804768 [rllib] Different Activation Support (#2311) 2018-06-28 18:41:04 -07:00
Richard Liaw
3cc27d2840
[rllib][asv] Support ASV for RLlib (#2304) 2018-06-28 17:20:09 -07:00
Richard Liaw
92ab7e56ec [rllib] Fix PPO regression 2018-06-28 16:00:53 -07:00
Adam Gleave
89460b8d11 autoscaler: count head node, don't kill below target (fixes #2317) (#2320)
Specifically, subtracts 1 from the target number of workers, taking into
account that the head node has some computational resources.

Do not kill an idle node if it would drop us below the target number of
nodes (in which case we just immediately relaunch).
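
A minimal sketch of the two rules described above, assuming both counts include the head node. The function names `target_workers` and `can_terminate_idle` are hypothetical and are not taken from the autoscaler code:

```python
def target_workers(target_nodes: int) -> int:
    """Workers to launch when the head node counts toward the target.

    Assumption: `target_nodes` includes the head node, so one fewer
    worker is needed. The result is clamped at zero.
    """
    return max(0, target_nodes - 1)


def can_terminate_idle(num_nodes: int, target_nodes: int) -> bool:
    """Allow killing one idle node only if the cluster stays at or above target.

    Assumption: `num_nodes` is the current node count, head included.
    """
    return num_nodes - 1 >= target_nodes
```

With a target of 4 nodes this launches 3 workers, and an idle node is kept when the cluster is exactly at target, which avoids the kill-then-relaunch cycle.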
2018-06-28 15:33:51 -07:00
Richard Liaw
b4dff9f933
[rllib] PPO onto new RLlib APIs (#2270) 2018-06-28 09:49:08 -07:00
Eric Liang
b197c0c404
[rllib] General RNN support (#2299)
* wip

* cls

* re

* wip

* wip

* a3c working

* torch support

* pg works

* lint

* rm v2

* consumer id

* clean up pg

* clean up more

* fix python 2.7

* tf session management

* docs

* dqn wip

* fix compile

* dqn

* apex runs

* up

* impotrs

* ddpg

* quotes

* fix tests

* fix last r

* fix tests

* lint

* pass checkpoint restore

* kwar

* nits

* policy graph

* fix yapf

* com

* class

* pyt

* vectorization

* update

* test cpe

* unit test

* fix ddpg2

* changes

* wip

* args

* faster test

* common

* fix

* add alg option

* batch mode and policy serving

* multi serving test

* todo

* wip

* serving test

* doc async env

* num envs

* comments

* thread

* remove init hook

* update

* fix ppo

* comments1

* fix

* updates

* add jenkins tests

* fix

* fix pytorch

* fix

* fixes

* fix a3c policy

* fix squeeze

* fix trunc on apex

* fix squeezing for real

* update

* remove horizon test for now

* multiagent wip

* update

* fix race condition

* fix ma

* t

* doc

* st

* wip

* example

* wip

* working

* cartpole

* wip

* batch wip

* fix bug

* make other_batches None default

* working

* debug

* nit

* warn

* comments

* fix ppo

* fix obs filter

* update

* wip

* tf

* update

* fix

* cleanup

* cleanup

* spacing

* model

* fix

* dqn

* fix ddpg

* doc

* keep names

* update

* fix

* com

* docs

* clarify model outputs

* Update torch_policy_graph.py

* fix obs filter

* pass thru worker index

* fix

* rename

* vlad torch comments

* fix log action

* debug name

* fix lstm

* remove unused ddpg net

* remove conv net

* revert lstm

* wip

* wip

* cast

* wip

* works

* fix a3c

* works

* lstm util test

* doc

* clean up

* update

* fix lstm check

* move to end

* fix sphinx

* fix cmd

* remove bad doc

* clarify

* copy

* async sa

* fix

* comments

* fix a3c conf

* tune lstm

* fix reshape

* fix

* back to 16

* tuned a3c update

* update

* tuned

* optional

* fix catalog

* remove prep
2018-06-27 22:51:04 -07:00
Richard Liaw
d3f81d5aad
[rllib] Add stats for A3C (#2315)
* add stats for a3c again

* fix multigpu too
2018-06-27 22:41:34 -07:00
Eric Liang
737f3e3cf2
[tune] Fix registering trainable twice (#2293)
* register twice

* isolate

* Update registry.py

* Update registry.py
2018-06-27 16:29:39 -07:00
Eric Liang
44f5f0520b
[rllib] Rename optimizers for clarity (#2303)
* rename

* fix

* update

* mgpu

* Update a3c.py

* Update bc.py

* Update a3c.py

* Update test_optimizers.py

* Update a3c.py
2018-06-27 02:30:15 -07:00
Richard Liaw
e657497225
[xray] Fix tune tests (#2305)
* fix xray tests

* yapf

* unleash tests
2018-06-26 23:56:23 -07:00
Eric Liang
1251abf0d1
[rllib] Modularize Torch and TF policy graphs (#2294)
* wip

* cls

* re

* wip

* wip

* a3c working

* torch support

* pg works

* lint

* rm v2

* consumer id

* clean up pg

* clean up more

* fix python 2.7

* tf session management

* docs

* dqn wip

* fix compile

* dqn

* apex runs

* up

* impotrs

* ddpg

* quotes

* fix tests

* fix last r

* fix tests

* lint

* pass checkpoint restore

* kwar

* nits

* policy graph

* fix yapf

* com

* class

* pyt

* vectorization

* update

* test cpe

* unit test

* fix ddpg2

* changes

* wip

* args

* faster test

* common

* fix

* add alg option

* batch mode and policy serving

* multi serving test

* todo

* wip

* serving test

* doc async env

* num envs

* comments

* thread

* remove init hook

* update

* fix ppo

* comments1

* fix

* updates

* add jenkins tests

* fix

* fix pytorch

* fix

* fixes

* fix a3c policy

* fix squeeze

* fix trunc on apex

* fix squeezing for real

* update

* remove horizon test for now

* multiagent wip

* update

* fix race condition

* fix ma

* t

* doc

* st

* wip

* example

* wip

* working

* cartpole

* wip

* batch wip

* fix bug

* make other_batches None default

* working

* debug

* nit

* warn

* comments

* fix ppo

* fix obs filter

* update

* wip

* tf

* update

* fix

* cleanup

* cleanup

* spacing

* model

* fix

* dqn

* fix ddpg

* doc

* keep names

* update

* fix

* com

* docs

* clarify model outputs

* Update torch_policy_graph.py

* fix obs filter

* pass thru worker index

* fix

* rename

* vlad torch comments

* fix log action

* debug name

* fix lstm

* remove unused ddpg net

* remove conv net

* revert lstm

* cast

* clean up

* fix lstm check

* move to end

* fix sphinx

* fix cmd

* remove bad doc

* clarify

* copy

* async sa

* fix
2018-06-26 13:17:15 -07:00
Eric Liang
a9a26b7560
[rllib] Part 2 of multiagent support (#2286)
* wip

* cls

* re

* wip

* wip

* a3c working

* torch support

* pg works

* lint

* rm v2

* consumer id

* clean up pg

* clean up more

* fix python 2.7

* tf session management

* docs

* dqn wip

* fix compile

* dqn

* apex runs

* up

* impotrs

* ddpg

* quotes

* fix tests

* fix last r

* fix tests

* lint

* pass checkpoint restore

* kwar

* nits

* policy graph

* fix yapf

* com

* class

* pyt

* vectorization

* update

* test cpe

* unit test

* fix ddpg2

* changes

* wip

* args

* faster test

* common

* fix

* add alg option

* batch mode and policy serving

* multi serving test

* todo

* wip

* serving test

* doc async env

* num envs

* comments

* thread

* remove init hook

* update

* fix ppo

* comments1

* fix

* updates

* add jenkins tests

* fix

* fix pytorch

* fix

* fixes

* fix a3c policy

* fix squeeze

* fix trunc on apex

* fix squeezing for real

* update

* remove horizon test for now

* multiagent wip

* update

* fix race condition

* fix ma

* t

* doc

* st

* wip

* example

* wip

* working

* cartpole

* wip

* batch wip

* fix bug

* make other_batches None default

* working

* debug

* nit

* warn

* comments

* fix ppo

* fix obs filter

* update

* fix obs filter

* pass thru worker index

* fix

* fix log action

* debug name

* fix sphinx
2018-06-25 22:33:57 -07:00
Sergey Kolesnikov
739ddfa229 Fix APEX update target (#2300)
* apex hotfix

small hotfix for Apex work

* Also patch the dqn version
2018-06-25 13:05:27 -07:00
Eric Liang
0b6112b726
[rllib] Part 1 of multiagent support: make sampler path support multiagent envs (#2268)
This refactors the RLlib sampler to support multi-agent environments. The main changes were:

* AsyncVectorEnv now produces dicts of env_id -> agent_id -> value rather than env_id -> value. This lets it model both vectorized and multi-agent envs (or both).
* The sampler class operates over the above nested dict structure for all envs. Single agent envs just return a dict with one agent_id=single_agent.
* When sample() is called on a policy evaluator, in the single agent case we return a SampleBatch, otherwise we return a MultiAgentBatch (which is a list of sample batches per policy).

Left for another PR:

* Exposing multi-agent in the public interfaces.
* Optimizations such as evaluating multiple policies in one TF run.
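
The nested env_id -> agent_id -> value layout described above might look like the following sketch. The helper names and the literal string `"single_agent"` are assumptions for illustration, not the RLlib implementation:

```python
from typing import Any, Dict, List, Tuple

# Assumed literal; the message above only names the id `single_agent`.
SINGLE_AGENT_ID = "single_agent"


def wrap_single_agent(per_env: Dict[int, Any]) -> Dict[int, Dict[str, Any]]:
    """Lift env_id -> value into env_id -> agent_id -> value."""
    return {env_id: {SINGLE_AGENT_ID: value} for env_id, value in per_env.items()}


def flatten(nested: Dict[int, Dict[str, Any]]) -> List[Tuple[int, str, Any]]:
    """Walk the nested dict the same way for single- and multi-agent envs."""
    return [
        (env_id, agent_id, value)
        for env_id, agents in nested.items()
        for agent_id, value in agents.items()
    ]


single = wrap_single_agent({0: "obs_a", 1: "obs_b"})
multi = {0: {"agent_1": "obs_x", "agent_2": "obs_y"}}
```

Because both cases share one shape, code that consumes `flatten(single)` and `flatten(multi)` needs no special case for the single-agent path.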
2018-06-23 18:32:16 -07:00
Eric Liang
9c3bab5c42
[tune] Support all serializable objects in config (#2287)
* wip

* order

* lint
2018-06-23 16:13:46 -07:00
Simon Mo
b108419628 Cast locator with index type (#2274) 2018-06-21 08:28:22 -07:00
Kunal Gosar
aa5daa1b82 fixing zero length partitions (#2237)
fixing bugs to fully handle zero len parts

resolve comments

renaming imports

Add getattr to groupby

testing
2018-06-21 08:25:32 -07:00
Robert Nishihara
800f7cc77d Make actor handles work in Python mode. (#2283)
* Make actor handles work in local mode.

* Add test for actor handles in local mode.
2018-06-20 23:02:41 -07:00
Robert Nishihara
ff2217251f [xray] Add error table and push error messages to driver through node manager. (#2256)
* Fix documentation indentation.

* Add error table to GCS and push error messages through node manager.

* Add type to error data.

* Linting

* Fix failure_test bug.

* Linting.

* Enable one more test.

* Attempt to fix doc building.

* Restructuring

* Fixes

* More fixes.

* Move current_time_ms function into util.h.
2018-06-20 21:29:28 -07:00
Kunal Gosar
6bf48f47bc addressing comments (#2210) 2018-06-20 16:24:37 -07:00
Zongheng Yang
8190ff1fd0 Experimental: enable automatic GCS flushing with configurable policy. (#2266)
* build_credis.sh: use an up-to-date credis commit.

* build_credis.sh: leveldb is updated, so update build cmds for it

* WIP: make monitor.py issue flush; switch gcs client to use credis

* Experimental: enable automatic GCS flushing with configurable policy.

* Fix linux compilation error

* Fix leveldb build

* Use optimized build for credis

* Address comments

* Attempt to fix tests
2018-06-20 14:40:57 -07:00
Richard Liaw
4acb77a5c3
[tune] Update Trainable doc to expose interface (#2272) 2018-06-20 13:40:45 -07:00
Eric Liang
e5724a9cfe
[rllib] Add a simple REST policy server and client example (#2232)
* wip

* cls

* re

* wip

* wip

* a3c working

* torch support

* pg works

* lint

* rm v2

* consumer id

* clean up pg

* clean up more

* fix python 2.7

* tf session management

* docs

* dqn wip

* fix compile

* dqn

* apex runs

* up

* impotrs

* ddpg

* quotes

* fix tests

* fix last r

* fix tests

* lint

* pass checkpoint restore

* kwar

* nits

* policy graph

* fix yapf

* com

* class

* pyt

* vectorization

* update

* test cpe

* unit test

* fix ddpg2

* changes

* wip

* args

* faster test

* common

* fix

* add alg option

* batch mode and policy serving

* multi serving test

* todo

* wip

* serving test

* doc async env

* num envs

* comments

* thread

* remove init hook

* update

* policy serve

* spaces

* checkpoint

* no train

* fix ppo

* comments1

* fix

* updates

* add jenkins tests

* fix

* fix pytorch

* fix

* fixes

* fix a3c policy

* fix squeeze

* fix trunc on apex

* fix squeezing for real

* update

* remove horizon test for now

* fix race condition

* update

* com

* updat

* add test

* Update run_multi_node_tests.sh

* use curl

* curl

* kill

* Update run_multi_node_tests.sh

* Update run_multi_node_tests.sh

* fix import

* update
2018-06-20 13:22:39 -07:00
Richard Liaw
418cd6804a
[asv] Pushing to s3 (#2246) 2018-06-20 10:43:44 -07:00
Eric Liang
30f7c08ca7
[rllib] Remove need to pass around registry (#2250)
* remove registry

* fix

* too many _

* fix

* cloudpickle

* Update registry.py

* yapf

* fix test

* fix kv check
2018-06-19 22:47:00 -07:00
Adam Gleave
30684446a6 Support multiple availability zones in AWS (fix #2177) (#2254)
* AWS: support multiple availability zones (fix #2177)

* Bugfix: [] rather than ()

* Test config

* Test config tweaks

* Remove test config

* Formatting fixes

* Update YAML config
2018-06-19 20:22:07 -07:00
Eric Liang
46cc51ce0c
[rllib] Add squash_to_range model option (#2239)
* sigmoid

* squash

* squash true

* git push

* Update catalog.py
2018-06-19 19:47:26 -07:00
Victor Sun
b372b7103e [rllib] Refactor Multi-GPU for PPO (#1646) 2018-06-18 20:49:35 -07:00
Eric Liang
7dee2c6735
[rllib] Envs for vectorized execution, async execution, and policy serving (#2170)
## What do these changes do?

**Vectorized envs**: Users can either implement `VectorEnv`, or alternatively set `num_envs=N` to auto-vectorize gym envs (this vectorizes just the action computation part).

```
# CartPole-v0 on single core with 64x64 MLP:

# vector_width=1:
Actions per second 2720.1284458322966

# vector_width=8:
Actions per second 13773.035334888269

# vector_width=64:
Actions per second 37903.20472563333
```

**Async envs**: The more general form of `VectorEnv` is `AsyncVectorEnv`, which allows agents to execute out of lockstep. We use this as an adapter to support `ServingEnv`. Since we can convert any other form of env to `AsyncVectorEnv`, utils.sampler has been rewritten to run against this interface.

**Policy serving**: This provides an env which is not stepped. Rather, the env executes in its own thread, querying the policy for actions via `self.get_action(obs)`, and reporting results via `self.log_returns(rewards)`. We also support logging of off-policy actions via `self.log_action(obs, action)`. This is a more convenient API for some use cases, and also provides parallelizable support for policy serving (for example, if you start a HTTP server in the env) and ingest of offline logs (if the env reads from serving logs).

Any of these types of envs can be passed to RLlib agents. RLlib handles conversions internally in CommonPolicyEvaluator, for example:
```
gym.Env => rllib.VectorEnv => rllib.AsyncVectorEnv
rllib.ServingEnv => rllib.AsyncVectorEnv
```
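
As a rough illustration of the auto-vectorization idea (N env copies served by one batched action computation), here is a toy sketch. `CounterEnv`, `SimpleVectorEnv`, `reset_all`, `step_all`, and `batched_policy` are hypothetical names and do not reflect the actual `VectorEnv` interface:

```python
from typing import Callable, List, Tuple


class CounterEnv:
    """Toy stand-in for a gym-style env: the episode ends after `horizon` steps."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool]:
        # Observation is the step counter; reward simply echoes the action.
        self.t += 1
        return self.t, float(action), self.t >= self.horizon


class SimpleVectorEnv:
    """Holds N env copies so one batched policy call can serve all of them."""

    def __init__(self, make_env: Callable[[], CounterEnv], num_envs: int) -> None:
        self.envs = [make_env() for _ in range(num_envs)]

    def reset_all(self) -> List[int]:
        return [env.reset() for env in self.envs]

    def step_all(self, actions: List[int]) -> Tuple[List[int], List[float], List[bool]]:
        results = [env.step(a) for env, a in zip(self.envs, actions)]
        obs, rewards, dones = zip(*results)
        return list(obs), list(rewards), list(dones)


def batched_policy(observations: List[int]) -> List[int]:
    """One call computes actions for every env (here: parity of the observation)."""
    return [o % 2 for o in observations]


vec = SimpleVectorEnv(CounterEnv, num_envs=4)
obs = vec.reset_all()
obs, rewards, dones = vec.step_all(batched_policy(obs))
```

Only the action computation is batched here; each env is still stepped one at a time, which matches the note above that `num_envs=N` vectorizes just the action computation part.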
2018-06-18 11:55:32 -07:00
Kunal Gosar
8560993b46 [Dataframe] Change pandas and ray.dataframe imports (#1942)
* fixing zero length partitions

* fixing bugs to fully handle zero len parts

* resolve comments

* renaming imports
2018-06-15 16:17:16 -07:00
Eric Liang
7fcaad264a
[autoscaler] Translate to/from AWS 'Name' tag (#2219)
* fix tag

* fix
2018-06-11 12:10:10 -07:00
Alok Singh
d47d6a6b7a [rllib] Use correct method name (#2226) 2018-06-11 09:53:31 -07:00
Devin Petersohn
b886ceca47 [DataFrame] Implement __array_wrap__ (#2218)
* Implement __array_wrap__

* Removing unnecessary test
2018-06-11 08:56:43 -07:00
Robert Nishihara
61139e1509 Enable fractional resources and resource IDs for xray. (#2187)
* Implement GPU IDs and fractional resources.

* Add documentation and python exceptions.

* Fix signed/unsigned comparison.

* Fix linting.

* Fixes from rebase.

* Re-enable tests that use ray.wait.

* Don't kill the raylet if an infeasible task is submitted.

* Ignore tests that require better load balancing.

* Linting

* Ignore array test.

* Ignore stress test reconstructions tests.

* Don't kill node manager if remote node manager disconnects.

* Ignore more stress tests.

* Naming changes

* Remove outdated todo

* Small fix

* Re-enable test.

* Linting

* Fix resource bookkeeping for blocked tasks.

* Fix linting

* Fix Java client.

* Ignore test

* Ignore put error tests
2018-06-10 15:31:43 -07:00
Richard Liaw
f19decb848
[docs] Update RLlib install to not include Tensorflow (#2178) 2018-06-10 10:29:12 -07:00
Philipp Moritz
4ec5bea03b [xray] Implement fetch (#2195) 2018-06-09 23:36:27 -07:00
Robert Nishihara
125fe1c09c Print warning when defining very large remote function or actor. (#2179)
* Print warning when defining very large remote function or actor.

* Add weak test.

* Check that warnings appear in test.

* Make wait_for_errors actually fail in failure_test.py.

* Use constants for error types.

* Fix
2018-06-09 19:59:15 -07:00
andrewztan
1475600c81 [rllib] Merge DDPG and DDPG2 implementations (#2202)
* removed ddpg2

* removed ddpg2 from codebase

* added tests used in ddpg vs ddpg2 comparison

* added notes about training timesteps to yaml files

* removed ddpg2 yaml files

* removed unnecessary configs from yaml files

* removed unnecessary configs from yaml files

* moved pendulum, mountaincarcontinuous, and halfcheetah tests to tuned_examples

* moved pendulum, mountaincarcontinuous, and halfcheetah tests to tuned_examples

* added more configuration details to yaml files

* removed random starts from halfcheetah
2018-06-09 16:46:23 -07:00
Robert Nishihara
5789a247f9 [xray] Do not redirect worker output to files by default. (#2220) 2018-06-09 15:00:42 -07:00
Eric Liang
71eb558eb0 [rllib] Refactor rllib to have a common sample collection pathway (#2149) 2018-06-09 00:21:35 -07:00
Eric Liang
8da558f5b7 [autoscaler] Should use internal IP for ssh (#2209) 2018-06-08 01:08:59 -07:00