Commit graph

82 commits

Author SHA1 Message Date
Eric Liang
fe5bd09b46
Fix rllib image in readme and doc typo (#5579)
* fix

* rllib
2019-08-29 16:02:16 -07:00
gehring
b520f6141e [rllib] Adds eager support with a generic TFEagerPolicy class (#5436) 2019-08-23 14:21:11 +08:00
Eric Liang
79949fb8a0
[rllib] RLlib in 60 seconds documentation (#5430) 2019-08-12 17:39:02 -07:00
Eric Liang
a1d2e17623
[rllib] Autoregressive action distributions (#5304) 2019-08-10 14:05:12 -07:00
Wonseok Jeon
281829e712 MADDPG implementation in RLlib (#5348) 2019-08-06 16:22:06 -07:00
Eric Liang
5d7afe8092
[rllib] Try moving RLlib to top level dir (#5324) 2019-08-05 23:25:49 -07:00
Kristian Hartikainen
13fb9fe3db [rllib] Feature/soft actor critic v2 (#5328)
* Add base for Soft Actor-Critic

* Pick changes from old SAC branch

* Update sac.py

* First implementation of sac model

* Remove unnecessary SAC imports

* Prune unnecessary noise and exploration code

* Implement SAC model and use that in SAC policy

* runs but doesn't learn

* clear state

* fix batch size

* Add missing alpha grads and vars

* -200 by 2k timesteps

* doc

* lazy squash

* one file

* ignore tfp

* revert done
2019-08-01 23:37:36 -07:00
Eric Liang
20450a4e82
[rllib] Add rock paper scissors multi-agent example (#5336) 2019-08-01 13:03:59 -07:00
Eric Liang
a62c5f40f6
[rllib] Document ModelV2 and clean up the models/ directory (#5277) 2019-07-27 02:08:16 -07:00
Eric Liang
9e328fbe6f
[rllib] Add docs on how to use TF eager execution (#4927) 2019-06-07 16:42:37 -07:00
Eric Liang
1c073e92e4
[rllib] Fix documentation on custom policies (#4910)
* wip

* add docs

* lint

* todo sections

* fix doc
2019-06-01 16:13:21 +08:00
Eric Liang
a45c61e19b
[rllib] Update concepts docs and add "Building Policies in Torch/TensorFlow" section (#4821)
* wip

* fix index

* fix bugs

* todo

* add imports

* note on get ph

* note on get ph

* rename to building custom algs

* add rnn state info
2019-05-27 14:17:32 -07:00
Eric Liang
02583a8598 [rllib] Rename PolicyGraph => Policy, move from evaluation/ to policy/ (#4819)
This implements some of the renames proposed in #4813.
We leave behind backwards-compatibility aliases for *PolicyGraph and SampleBatch.
2019-05-20 16:46:05 -07:00
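To illustrate the alias approach this commit mentions, here is a minimal, hypothetical sketch of how a renamed class can keep old subclass code working (the helper and class bodies are assumptions for illustration, not RLlib's actual implementation):

```python
import warnings

class Policy:
    """New-style policy class (renamed from PolicyGraph)."""

    def __init__(self, config=None):
        self.config = config or {}

    def compute_actions(self, obs_batch):
        raise NotImplementedError

def renamed_class(cls, old_name):
    """Return a subclass that warns when the deprecated name is used."""
    class Deprecated(cls):
        def __init__(self, *args, **kwargs):
            warnings.warn(
                "{} has been renamed to {}".format(old_name, cls.__name__),
                DeprecationWarning, stacklevel=2)
            cls.__init__(self, *args, **kwargs)
    Deprecated.__name__ = old_name
    return Deprecated

# Old imports keep working, but emit a deprecation warning on use.
PolicyGraph = renamed_class(Policy, "PolicyGraph")
```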
Eric Liang
6e7680bf21
[rllib] Clean up concepts documentation and policy optimizer creation (#4592) 2019-04-12 21:03:26 -07:00
Eric Liang
37208216ae
[rllib] Rename Agent to Trainer (#4556) 2019-04-07 00:36:18 -07:00
Eric Liang
2871609296
[rllib] Report sampler performance metrics (#4427) 2019-03-27 13:24:23 -07:00
Eric Liang
4b8b703561
[rllib] Some API cleanups and documentation improvements (#4409) 2019-03-21 21:34:22 -07:00
Eric Liang
c7f74dbdc7
[rllib] Add async remote workers (#4253) 2019-03-08 15:39:48 -08:00
Eric Liang
78ad9c4cbb Add "ray timeline" command to auto-dump Chrome trace for the current Ray instance (#4239) 2019-03-05 16:28:00 -08:00
Eric Liang
3896b726dd Dynamically adjust redis memory usage (#4152)
* f

* Update services.py
2019-02-25 16:21:37 -08:00
Eric Liang
d9da183c7d
[rllib] Custom supervised loss API (#4083) 2019-02-24 15:36:13 -08:00
Eric Liang
152375aa8a
[rllib] Add evaluation option to DQN agent (#3835)
* add eval

* interval

* multiagent minor fix

* Update rllib.rst

* Update ddpg.py

* Update qmix.py
2019-01-29 21:19:53 -08:00
Eric Liang
fb73cedf70
[rllib] Add examples page, add hierarchical training example, delete SC2 examples (#3815)
* wip

* lint

* wip

* up

* wip

* update examples

* wip

* remove carla

* update

* improve envspec

* link to custom

* Update rllib-env.rst

* update

* fix

* fn

* lint

* ds

* ssd games

* desc

* fix up docs

* fix
2019-01-29 21:06:09 -08:00
Michael Luo
16f7ca45e4 APPO (#3779)
* Deleted old fork, updated to the new Ray, and moved PPO-IMPALA to APPO in the ppo folder

* Deleted unnecessary vtrace.py file

* Update pong-impala.yaml

* Cleaned PPO Code

* Update pong-impala.yaml

* Update pong-impala.yaml

* wip

* new file

* refactor

* add vtrace off option

* revert

* support any space

* docs

* fix comment

* remove kl

* Update cartpole-appo-vtrace.yaml
2019-01-18 13:40:26 -08:00
Jones Wong
319c1340cb [rllib] Develop MARWIL (#3635)
*  add marwil policy graph

*  fix typo

*  add offline optimizer and enable running marwil

*  fix loss function

*  maintain a moving average of the advantage norm

*  use sync replay optimizer to unify

*  remove offline optimizer and use sync replay optimizer

*  format by yapf

*  add imitation learning objective

*  fix according to eric's review

*  format by yapf

* revise

* add test data

* marwil
2019-01-16 19:00:43 -08:00
Eric Liang
401e656b95 [rllib] Sync filters at end of iteration not start; hierarchical docs (#3769) 2019-01-15 16:25:25 -08:00
Eric Liang
03fe760616
[rllib] Model self loss isn't included in all algorithms (#3679) 2019-01-04 22:30:35 -08:00
Eric Liang
ca864faece
[rllib] Documentation for I/O API and multi-agent support / cleanup (#3650) 2019-01-03 15:15:36 +08:00
Eric Liang
47d36d7bd6
[rllib] Refactor pytorch custom model support (#3634) 2019-01-03 13:48:33 +08:00
Eric Liang
9f63119a83
[rllib] Allow development without needing to compile Ray (#3623)
* wip

* lint

* wip

* wip

* rename

* wip

* Cleaner handling of cli prompt
2018-12-24 18:08:23 +09:00
Eric Liang
303883a3b6 [rllib] [rfc] add contrib module and guideline for merging (#3565)
This adds guidelines for merging code into `rllib/contrib` vs `rllib/agents`. It also cleans up the agent import code to make registration easier.
2018-12-20 10:44:34 -08:00
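As a rough sketch of what an easier registration path can look like, the snippet below uses a simple decorator-based registry (all names here are assumptions for illustration, not RLlib's actual agent import code):

```python
# Registry mapping "contrib/<Name>" strings to algorithm classes.
CONTRIB_ALGORITHMS = {}

def register_contrib(name):
    """Decorator that registers an algorithm under contrib/<name>."""
    def wrap(cls):
        CONTRIB_ALGORITHMS["contrib/" + name] = cls
        return cls
    return wrap

@register_contrib("RandomAgent")
class RandomAgent:
    """Toy algorithm used only to demonstrate registration."""
    def train(self):
        return {"episode_reward_mean": 0.0}

print(list(CONTRIB_ALGORITHMS))  # ['contrib/RandomAgent']
```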
Eric Liang
db0dee573e
[rllib] Q-Mix implementation (Q-Mix, VDN, IQN, and Ape-X variants) (#3548) 2018-12-18 10:40:01 -08:00
Eric Liang
f0df97db6f
[rllib] example and docs on how to use parametric actions with DQN / PG algorithms (#3384) 2018-11-27 23:35:19 -08:00
Eric Liang
0d56fc10cc Move setproctitle to ray[debug] package (#3415) 2018-11-27 09:50:59 -08:00
Eric Liang
8b76bab25c
[rllib] docs for td3 (#3381)
* td3 doc

* Update rllib-env.rst
2018-11-22 13:36:47 -08:00
Eric Liang
706dc1d473
[rllib] Add test for multi-agent support and fix IMPALA multi-agent (#3289)
Multi-agent support in IMPALA was broken, since IMPALA requires batches of a fixed size while multi-agent envs can produce variable-length batches.

Fix this by adding zero-padding as needed (similar to the RNN case).
2018-11-14 14:14:07 -08:00
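The zero-padding idea itself is straightforward; a minimal numpy sketch (function name and batch layout assumed for illustration) might look like:

```python
import numpy as np

def zero_pad_batch(batch, target_len):
    """Pad each array in a batch dict with zero rows up to target_len."""
    padded = {}
    for key, arr in batch.items():
        pad_rows = target_len - len(arr)
        if pad_rows < 0:
            raise ValueError("batch is already longer than target_len")
        # Pad only the leading (time/batch) axis; leave other dims alone.
        pad_width = [(0, pad_rows)] + [(0, 0)] * (arr.ndim - 1)
        padded[key] = np.pad(arr, pad_width, mode="constant")
    return padded

batch = {"obs": np.ones((3, 4)), "rewards": np.ones(3)}
print(zero_pad_batch(batch, 5)["obs"].shape)  # (5, 4)
```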
Eric Liang
bd0dbde149
[rllib] Rename ServingEnv => ExternalEnv (#3302) 2018-11-12 16:31:27 -08:00
Eric Liang
9dd3eedbac [rllib] rollout.py should reduce num workers (#3263)
## What do these changes do?

Don't create an excessive number of workers for rollout.py, and also fix up the env wrapping to be consistent with the internal agent wrapper.

## Related issue number

Closes #3260.
2018-11-09 12:29:16 -08:00
Eric Liang
369cb833fe
[rllib] Implement custom metrics (#3144) 2018-11-03 18:48:32 -07:00
Eric Liang
9a0f0db070 Add ray stack tool for debugging (#3213) 2018-11-03 13:13:02 -07:00
Eric Liang
a9e454f6fd
[rllib] Include config dicts in the sphinx docs (#3064) 2018-10-16 15:55:11 -07:00
Eric Liang
b45bed4bce
[rllib] Propagate model options correctly in ARS / ES, to action dist of PPO (#2974)
* fix

* fix

* fix it

* propagate conf to action dist

* move carla example too

* rr

* Update policies.py

* wip

* lint
2018-10-01 12:49:39 -07:00
Eric Liang
69d1354016
[rllib] Document ARS & rainbow (#2744)
* wip

* rainbow doc too

* e not used

* fix ppo doc

* clean list

* use same title
2018-08-28 18:13:36 -07:00
Eric Liang
fbe6c59f72
[rllib] Misc fixes, A2C (#2679)
A bunch of minor rllib fixes:

* pull in latest baselines atari wrapper changes (and use deepmind wrapper by default)
* move reward clipping to policy evaluator
* add a2c variant of a3c
* reduce vision network fc layer size to 256 units
* switch to 84x84 images
* doc tweaks
* print timesteps in tune status
2018-08-20 15:28:03 -07:00
Eric Liang
5f430da180
[rllib] Provide internal access to episode state in compute_actions() and allow returning extra batches (#2559)
The goal of this PR is to allow custom policies to perform model-based rollouts. In the multi-agent setting, this requires access not only to the policies of other agents, but also to their current observations.
Also, you might want to return the model-based trajectories as part of the rollout for efficiency.

* compute_actions() now takes a new keyword arg episodes
* pull out internal episode class into a top-level file
* add function to return extra trajectories from an episode that will be appended to the sample batch
* documentation
2018-08-16 14:37:21 -07:00
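A hedged sketch of what a custom policy using the new argument might look like (the class, the return convention, and the commented-out episode accessor are assumptions for illustration):

```python
import numpy as np

class ModelBasedPolicy:
    def compute_actions(self, obs_batch, state_batches=None, episodes=None):
        actions = []
        for i, obs in enumerate(obs_batch):
            if episodes is not None:
                episode = episodes[i]
                # The episode object exposes other agents' state for
                # model-based rollouts, e.g. (assumed accessor name):
                # other_obs = episode.last_observation_for("other_agent")
                pass
            actions.append(0)  # placeholder action
        # (actions, recurrent state outputs, extra fetches)
        return np.array(actions), state_batches or [], {}
```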
Eric Liang
079c4e482a
ray exec and ray attach commands (#2560)
ray exec CLUSTER CMD [--screen] [--start] [--stop]
ray attach CLUSTER [--start]

Example:
ray exec sgd.yaml 'source activate tensorflow_p27 && cd ~/ray/python/ray/rllib && ./train.py --run=PPO --env=CartPole-v0' --screen --start --stop

This will, in one command, create a cluster and run the given command on it in a screen session. The screen can later be attached to via ray attach. After the command finishes, the cluster workers will be terminated and the head node stopped.
2018-08-15 14:31:50 -07:00
Eric Liang
9ea57c2a93
[rllib] Basic IMPALA implementation (using deepmind's reference vtrace.py) (#2504)
* Rename AsyncSamplesOptimizer -> AsyncReplayOptimizer
* Add AsyncSamplesOptimizer that implements the IMPALA architecture
* integrate V-trace with a3c policy graph
* audit V-trace integration
* benchmark compare vs A3C and with V-trace on/off
Benchmark: IMPALA scaling from 16 to 128 workers on PongNoFrameskip-v4 solves Pong in <10 min. For reference, solving this env takes ~40 minutes for Ape-X and several hours for A3C.
2018-08-01 20:53:53 -07:00
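For reference, the V-trace target from the IMPALA paper (Espeholt et al., 2018) can be sketched in a few lines of numpy; this is a simplified single-trajectory version, not RLlib's vtrace.py:

```python
import numpy as np

def vtrace_targets(rewards, values, bootstrap_value, rhos,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """V-trace value targets for one time-major trajectory.

    rhos are the importance ratios pi(a|x) / mu(a|x) of the learner
    policy over the behavior policy.
    """
    clipped_rhos = np.minimum(rhos, rho_bar)  # clipped IS weights
    cs = np.minimum(rhos, c_bar)              # trace-cutting coefficients
    values_tp1 = np.append(values[1:], bootstrap_value)
    deltas = clipped_rhos * (rewards + gamma * values_tp1 - values)
    vs_minus_v = np.zeros_like(values)
    acc = 0.0
    for t in reversed(range(len(rewards))):   # backward recursion
        acc = deltas[t] + gamma * cs[t] * acc
        vs_minus_v[t] = acc
    return values + vs_minus_v                # v_s targets
```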
Sergey Kolesnikov
05490b8cb9 [rllib] dqn/ddpg policy customization (#2445)
* dqn policy update - more customization

* docs for custom DQN graph

* Update rllib-training.rst

* Update rllib-models.rst

* Update rllib.rst

* Update rllib-training.rst

* Update rllib-concepts.rst

* yapf codestyle
2018-07-22 14:47:14 -07:00
Eric Liang
4ef9d15315
[rllib] Add concepts section of docs (#2373)
This fills in the rllib concepts documentation.
2018-07-08 18:46:52 -07:00
Eric Liang
8aa56c12e6
[rllib] Document "v2" APIs (#2316)
* re

* wip

* wip

* a3c working

* torch support

* pg works

* lint

* rm v2

* consumer id

* clean up pg

* clean up more

* fix python 2.7

* tf session management

* docs

* dqn wip

* fix compile

* dqn

* apex runs

* up

* imports

* ddpg

* quotes

* fix tests

* fix last r

* fix tests

* lint

* pass checkpoint restore

* kwarg

* nits

* policy graph

* fix yapf

* com

* class

* pyt

* vectorization

* update

* test cpe

* unit test

* fix ddpg2

* changes

* wip

* args

* faster test

* common

* fix

* add alg option

* batch mode and policy serving

* multi serving test

* todo

* wip

* serving test

* doc async env

* num envs

* comments

* thread

* remove init hook

* update

* fix ppo

* comments1

* fix

* updates

* add jenkins tests

* fix

* fix pytorch

* fix

* fixes

* fix a3c policy

* fix squeeze

* fix trunc on apex

* fix squeezing for real

* update

* remove horizon test for now

* multiagent wip

* update

* fix race condition

* fix ma

* t

* doc

* st

* wip

* example

* wip

* working

* cartpole

* wip

* batch wip

* fix bug

* make other_batches None default

* working

* debug

* nit

* warn

* comments

* fix ppo

* fix obs filter

* update

* wip

* tf

* update

* fix

* cleanup

* cleanup

* spacing

* model

* fix

* dqn

* fix ddpg

* doc

* keep names

* update

* fix

* com

* docs

* clarify model outputs

* Update torch_policy_graph.py

* fix obs filter

* pass thru worker index

* fix

* rename

* vlad torch comments

* fix log action

* debug name

* fix lstm

* remove unused ddpg net

* remove conv net

* revert lstm

* wip

* wip

* cast

* wip

* works

* fix a3c

* works

* lstm util test

* doc

* clean up

* update

* fix lstm check

* move to end

* fix sphinx

* fix cmd

* remove bad doc

* envs

* vec

* doc prep

* models

* rl

* alg

* up

* clarify

* copy

* async sa

* fix

* comments

* fix a3c conf

* tune lstm

* fix reshape

* fix

* back to 16

* tuned a3c update

* update

* tuned

* optional

* merge

* wip

* fix up

* move pg class

* rename env

* wip

* update

* tip

* alg

* readme

* fix catalog

* readme

* doc

* context

* remove prep

* comma

* add env

* link to paper

* paper

* update

* rnn

* update

* wip

* clean up ev creation

* fix

* fix

* fix

* fix lint

* up

* no comma

* ma

* Update run_multi_node_tests.sh

* fix

* sphinx is stupid

* sphinx is stupid

* clarify torch graph

* no horizon

* fix config

* sb

* Update test_optimizers.py
2018-07-01 00:05:08 -07:00