Commit graph

55 commits

Author SHA1 Message Date
Maltimore
0ec613c95a [rllib] doc: fix typo: on_postprocess_batch -> on_postprocess_traj (#6438) 2019-12-11 15:00:53 -08:00
Eric Liang
bc5e259264
[rllib] Add a doc section on computing actions (#6326)
* options doc

* add note

* hint shr

* doc update
2019-12-03 00:10:50 -08:00
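The pattern that doc section covers, as a minimal sketch (assuming the Trainer API of this period; `compute_action` takes a single observation and returns a single action):

    import gym
    import ray
    from ray.rllib.agents.ppo import PPOTrainer

    ray.init()
    trainer = PPOTrainer(env="CartPole-v0")
    env = gym.make("CartPole-v0")
    obs = env.reset()
    # One observation in, one action out (exploration handled internally).
    action = trainer.compute_action(obs)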
Eric Liang
e4565c9cc6
Reduce RLlib log verbosity (#6154) 2019-11-13 18:50:45 -08:00
David Bignell
3f83b2daa9 [rllib] Rollout extensions (#6065)
* Rollout improvements

* Make info-saving optional, to avoid a breaking change.

* Store generating ray version in checkpoint metadata

* Keep the linter happy

* Add small rollout test

* Terse.

* Update test_io.py
2019-11-05 20:34:18 -08:00
gehring
8903bcd0c3 [rllib] Tracing for eager tensorflow policies with tf.function (#5705)
* Added tracing of eager policies with `tf.function`

* lint

* add config option

* add docs

* wip

* tracing now works with a3c

* typo

* none

* file doc

* returns

* syntax error

* syntax error
2019-09-17 01:44:20 -07:00
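The knobs this change introduced, sketched as a Tune run (key names per the docs of this period are an assumption here; `eager` turns on TF eager execution and `eager_tracing` wraps policy calls in `tf.function` for speed):

    import ray
    from ray import tune

    ray.init()
    tune.run(
        "PPO",
        config={
            "env": "CartPole-v0",
            "eager": True,          # run the TF policy eagerly
            "eager_tracing": True,  # trace it with tf.function
        },
    )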
Eric Liang
74abeab057
[rllib] Improve accessing model state docs (#5656)
* [rllib] better model docs

* fix

* s
2019-09-08 23:01:26 -07:00
Eric Liang
1455a19c85
Consolidate and clean up documentation (#5645) 2019-09-07 11:50:18 -07:00
Richard Liaw
34f6d2fc5c [tune] Update trainable docs and support hparams (#5558) 2019-09-04 12:44:42 -07:00
Eric Liang
daf38c8723
[tune] Deprecate tune.function (#5601)
* remove tune function

* remove examples

* Update tune-usage.rst
2019-08-31 16:00:10 -07:00
Eric Liang
550c96b965 [rllib] Add docs on policy.model (#5597) 2019-08-30 21:10:42 -07:00
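A sketch of what those docs show (assuming a default TF ModelV2 of this era, which wraps a Keras model as `base_model`):

    import ray
    from ray.rllib.agents.ppo import PPOTrainer

    ray.init()
    trainer = PPOTrainer(env="CartPole-v0")
    policy = trainer.get_policy()
    # Default TF models expose the underlying Keras model:
    policy.model.base_model.summary()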
Eric Liang
7d28bbbdbb
[rllib] Document on traj postprocess (#5532)
* document on traj postprocess

* shorten it
2019-08-24 20:37:45 -07:00
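The hook being documented, as a hedged sketch of the callbacks-dict API of this period (the `post_batch` key of the info dict is assumed to carry the postprocessed trajectory batch; the callback name matches the typo fix at the top of this log):

    def on_postprocess_traj(info):
        # Assumption: the postprocessed SampleBatch arrives under "post_batch".
        batch = info["post_batch"]
        print("postprocessed trajectory of", batch.count, "steps")

    config = {
        "callbacks": {"on_postprocess_traj": on_postprocess_traj},
    }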
gehring
b520f6141e [rllib] Adds eager support with a generic TFEagerPolicy class (#5436) 2019-08-23 14:21:11 +08:00
Eric Liang
5d7afe8092
[rllib] Try moving RLlib to top level dir (#5324) 2019-08-05 23:25:49 -07:00
Richard Liaw
1eaa57c98f
[tune] Distributed example + walkthrough (#5157) 2019-08-02 09:17:20 -07:00
Kristian Hartikainen
13fb9fe3db [rllib] Feature/soft actor critic v2 (#5328)
* Add base for Soft Actor-Critic

* Pick changes from old SAC branch

* Update sac.py

* First implementation of sac model

* Remove unnecessary SAC imports

* Prune unnecessary noise and exploration code

* Implement SAC model and use that in SAC policy

* runs but doesn't learn

* clear state

* fix batch size

* Add missing alpha grads and vars

* -200 by 2k timesteps

* doc

* lazy squash

* one file

* ignore tfp

* revert done
2019-08-01 23:37:36 -07:00
Eric Liang
20450a4e82
[rllib] Add rock paper scissors multi-agent example (#5336) 2019-08-01 13:03:59 -07:00
Eric Liang
9e328fbe6f
[rllib] Add docs on how to use TF eager execution (#4927) 2019-06-07 16:42:37 -07:00
Eric Liang
7501ee51db
[rllib] Rename PolicyEvaluator => RolloutWorker (#4820) 2019-06-03 06:49:24 +08:00
Eric Liang
4f46d3e9bf
[rllib] Add multi-agent examples for hand-coded policy, centralized VF (#4554) 2019-04-09 00:36:49 -07:00
Eric Liang
37208216ae
[rllib] Rename Agent to Trainer (#4556) 2019-04-07 00:36:18 -07:00
Eric Liang
fce0062380
[rllib] Switch to tune.run() instead of run_experiments() (#4515) 2019-03-30 14:07:50 -07:00
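The switch in miniature (a sketch; `run_experiments` took a dict of experiment specs, while `tune.run` takes the trainable directly):

    import ray
    from ray import tune

    ray.init()
    # Before: tune.run_experiments({"my_exp": {"run": "PPO", "config": {...}}})
    # After:
    tune.run(
        "PPO",
        stop={"episode_reward_mean": 200},
        config={"env": "CartPole-v0"},
    )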
Eric Liang
cff08e19ff
[rllib] Print out intermediate data shapes on the first iteration (#4426) 2019-03-26 00:27:59 -07:00
Eric Liang
4b8b703561
[rllib] Some API cleanups and documentation improvements (#4409) 2019-03-21 21:34:22 -07:00
Eric Liang
05d96ce81b
[rllib] Raise an error if multi-agent envs terminate without a last observation for agents (#4139)
* fix it

* lint

* Update rllib-training.rst
2019-02-23 21:23:40 -08:00
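The contract this error enforces, sketched as a toy MultiAgentEnv (illustrative values only): whenever an agent is reported done, the same step must also return its final observation.

    from ray.rllib.env.multi_agent_env import MultiAgentEnv

    class OneStepEnv(MultiAgentEnv):
        def reset(self):
            return {"agent_1": 0.0}

        def step(self, action_dict):
            obs = {"agent_1": 1.0}  # final observation for the done agent
            rew = {"agent_1": 1.0}
            done = {"agent_1": True, "__all__": True}
            return obs, rew, done, {}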
Eric Liang
c4182463f6
[rllib] Add helper to iterate over envs in a vectorized environment (#4001)
* add foreach env func

* fix

* add test
2019-02-11 10:40:47 -08:00
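How the helper composes with per-worker iteration (a sketch using the worker API this codebase later settled on; `set_task` is a hypothetical method on your env):

    # trainer is any RLlib Trainer instance.
    trainer.workers.foreach_worker(
        lambda worker: worker.foreach_env(
            lambda env: env.set_task(1)))  # set_task: hypothetical env method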
Eric Liang
fb73cedf70
[rllib] Add examples page, add hierarchical training example, delete SC2 examples (#3815)
* wip

* lint

* wip

* up

* wip

* update examples

* wip

* remove carla

* update

* improve envspec

* link to custom

* Update rllib-env.rst

* update

* fix

* fn

* lint

* ds

* ssd games

* desc

* fix up docs

* fix
2019-01-29 21:06:09 -08:00
Eric Liang
e78562b2e8
[rllib] Misc fixes: set lr for PG, better error message for LSTM/PPO, fix multi-agent/APEX (#3697)
* fix

* update test

* better error

* compute

* eps fix

* add get_policy() api

* Update agent.py

* better err msg

* fix

* pass in rew
2019-01-06 19:37:35 -08:00
Eric Liang
b8a9e3f106
[rllib] Remove uses of sgd_stepsize => lr (#3667)
* lr

* Update example-evolution-strategies.rst
2019-01-01 12:01:27 +08:00
Richard Liaw
e046a5c767
[tune] resources_per_trial from trial_resources (#3580)
Renaming the variable because the old name was a frequent source of user errors.
2018-12-20 19:00:47 -08:00
Eric Liang
db0dee573e
[rllib] Q-Mix implementation (Q-Mix, VDN, IQN, and Ape-X variants) (#3548) 2018-12-18 10:40:01 -08:00
Eric Liang
d864f299d7
[rllib] fixes from dogfooding multi-agent (#3456)
* Auto-wrap multi-agent dict and tuple spaces by keeping a policy -> preprocessor mapping in the sampler
* Add some Q-learning debug stats
* Report min and max of custom metrics
* Better errors
2018-12-05 23:31:45 -08:00
Eric Liang
93a9d32288
[docs] Switch docs to use rllib train instead of train.py 2018-12-04 17:36:06 -08:00
Eric Liang
ce355d13d4
[rllib] Allow envs to be auto-registered; add on_train_result callback with curriculum example (#3451)
* train step and docs

* debug

* doc

* doc

* fix examples

* fix code

* integration test

* fix

* ...

* space

* instance

* Update .travis.yml

* fix test
2018-12-03 23:15:43 -08:00
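The curriculum pattern this callback enables, roughly as the docs present it (a hedged sketch; `set_phase` is a method your env would define, and the worker-iteration calls assume the later RolloutWorker-era API):

    def on_train_result(info):
        result = info["result"]
        # Advance the curriculum once the agent performs well enough.
        phase = 2 if result["episode_reward_mean"] > 200 else 1
        trainer = info["trainer"]
        trainer.workers.foreach_worker(
            lambda w: w.foreach_env(lambda env: env.set_phase(phase)))

    config = {"callbacks": {"on_train_result": on_train_result}}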
Eric Liang
f0df97db6f
[rllib] example and docs on how to use parametric actions with DQN / PG algorithms (#3384) 2018-11-27 23:35:19 -08:00
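The core trick behind parametric actions, in a standalone numpy sketch (the env puts a 0/1 mask of valid actions into its observation; the model pushes invalid logits toward -inf before sampling):

    import numpy as np

    def mask_logits(logits, action_mask):
        # log(0) -> -inf, clamped to the smallest float so softmax gives ~0.
        with np.errstate(divide="ignore"):
            inf_mask = np.maximum(np.log(action_mask), np.finfo(np.float32).min)
        return logits + inf_mask

    print(mask_logits(np.zeros(4), np.array([1.0, 1.0, 0.0, 1.0])))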
Eric Liang
abdc3b592e
[rllib] Update multi-gpu impala numbers (#3327) 2018-11-19 20:55:27 -08:00
Eric Liang
65c27c70cf [rllib] Clean up agent resource configurations (#3296)
Closes #3284
2018-11-13 18:00:03 -08:00
Eric Liang
bd0dbde149
[rllib] Rename ServingEnv => ExternalEnv (#3302) 2018-11-12 16:31:27 -08:00
eugenevinitsky
344b4ef0ff [rllib] Fix filter sync for ES and ARS (#2918) 2018-11-06 19:09:34 -08:00
Eric Liang
369cb833fe
[rllib] Implement custom metrics (#3144) 2018-11-03 18:48:32 -07:00
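A sketch of the callback API this added (per the docs of this period; values written to `episode.custom_metrics` are aggregated into the training results):

    def on_episode_end(info):
        episode = info["episode"]
        # Shows up in training results under custom_metrics.
        episode.custom_metrics["my_metric"] = 1.0

    config = {"callbacks": {"on_episode_end": on_episode_end}}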
Eric Liang
af0c1174cd
[sgd] Merge sharded param server based SGD implementation (#3033)
This includes most of the TF code used for the OSDI experiment. Perf sanity check on p3.16xl instances: Overall scaling looks ok, with the multi-node results within 5% of OSDI final numbers. This seems reasonable given that hugepages are not enabled here, and the param server shards are placed randomly.

$ RAY_USE_XRAY=1 ./test_sgd.py --gpu --batch-size=64 --num-workers=N \
  --devices-per-worker=M --strategy=<simple|ps> \
  --warmup --object-store-memory=10000000000

Images per second total
gpus total              | simple | ps
========================================
1                       | 218    | -
2 (1 worker)            | 388    | -
4 (1 worker)            | 759    | -
4 (2 workers)           | 176    | 623
8 (1 worker)            | 985    | -
8 (2 workers)           | 349    | 1031
16 (2 nodes, 2 workers) | 600    | 1661
16 (2 nodes, 4 workers) | 468    | 1712   <--- OSDI perf was 1817
2018-10-27 21:25:02 -07:00
Eric Liang
a9e454f6fd
[rllib] Include config dicts in the sphinx docs (#3064) 2018-10-16 15:55:11 -07:00
Eric Liang
814c35b7d7
[rllib] Simplify sample batch size and num envs config, n_step adjustment (#2995)
* simplify vec batch requirements

* Update rllib-training.rst

* Update rllib-training.rst

* Update rllib-training.rst

* Update rllib-training.rst

* Update rllib-training.rst

* Update rllib-models.rst
2018-09-30 18:36:22 -07:00
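The simplified configuration, as a sketch with this era's key names (`sample_batch_size` was later renamed): each worker steps `num_envs_per_worker` vectorized env copies to build rollout fragments of `sample_batch_size` steps, which are concatenated up to `train_batch_size` for SGD.

    config = {
        "num_workers": 2,
        "num_envs_per_worker": 4,   # vectorized env copies per worker
        "sample_batch_size": 50,    # env steps per rollout fragment
        "train_batch_size": 400,    # fragments are concatenated up to this
    }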
Eric Liang
3cde5957b3
[rllib] Better document APIs to access policy state (#2932)
* fix

* doc

* example

* up
2018-09-24 19:08:32 -07:00
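The accessors those docs cover, sketched (stable APIs of the period):

    policy = trainer.get_policy()   # trainer: any RLlib Trainer instance
    weights = policy.get_weights()  # numpy weights of the policy
    policy.set_weights(weights)     # round-trips without a checkpoint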
Eric Liang
995ac24a2c
[rllib] clarify train batch size for PPO (#2793)
It's possible to configure PPO in a way that ends up discarding most of the samples (they are treated as "stragglers"). Add a warning when this happens, and raise an exception if the waste is particularly egregious.
2018-09-05 12:06:13 -07:00
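To make the straggler arithmetic concrete (a hedged sketch, not the exact warning threshold): the workers below gather roughly 32 * 100 = 3200 env steps per iteration, so a much smaller `train_batch_size` would discard most of them.

    config = {
        "num_workers": 32,
        "sample_batch_size": 100,  # ~3200 steps collected per iteration
        "train_batch_size": 3200,  # sized to match, so little is wasted
    }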
Eric Liang
df4788e501
[rllib/tune] Add test for fractional gpu support in xray mode; add rllib support for fractional gpu (#2768)
* frac gpu

* doc

* Update rllib-training.rst

* yapf

* remove xray
2018-09-03 11:12:23 -07:00
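The feature in config form (`num_gpus` accepts fractions, so several trainers can share one physical GPU):

    config = {
        "num_gpus": 0.2,  # this trainer claims 20% of a GPU
    }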
Eric Liang
69d1354016
[rllib] Document ARS & rainbow (#2744)
* wip

* rainbow doc too

* e not used

* fix ppo doc

* clean list

* use same title
2018-08-28 18:13:36 -07:00
Eric Liang
aa014af85b
[rllib] Fix atari reward calculations, add LR annealing, explained var stat for A2C / impala (#2700)
Changes needed to reproduce Atari plots in IMPALA / A2C: https://github.com/ray-project/rl-experiments
2018-08-23 17:49:10 -07:00
Eric Liang
fbe6c59f72
[rllib] Misc fixes, A2C (#2679)
A bunch of minor rllib fixes:

* Pull in the latest baselines Atari wrapper changes (and use the DeepMind wrapper by default)
* Move reward clipping to the policy evaluator
* Add an A2C variant of A3C
* Reduce the vision network FC layer size to 256 units
* Switch to 84x84 images
* Doc tweaks
* Print timesteps in Tune status
2018-08-20 15:28:03 -07:00
Richard Liaw
62d0698097
[tune] Tune Facelift (#2472)
This PR introduces the following changes:

 * Ray Tune -> Tune 
 * [breaking] Creation of `schedulers/`, moving PBT, HyperBand into a submodule
 * [breaking] Search Algorithms now must take in experiment configurations via `add_configurations` rather than through initialization
 * Support `"run": (function | class | str)` with automatic registering of trainable
 * Documentation Changes
2018-08-19 11:00:55 -07:00
Eric Liang
53f9755594
[rllib] Fix support for mixed discrete and continuous action spaces, add to regression test (#2655)
* fix

* lint

* fix
2018-08-15 10:19:41 -07:00
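The kind of space that fix covers, sketched with Gym spaces (a Tuple mixing a discrete choice with continuous parameters):

    import numpy as np
    from gym import spaces

    action_space = spaces.Tuple([
        spaces.Discrete(3),                                   # discrete head
        spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32),  # continuous head
    ])
    print(action_space.sample())  # e.g. (1, array([ 0.3, -0.7], dtype=float32))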