62 commits

Author SHA1 Message Date
Eric Liang
dd70720578
[rllib] Rename sample_batch_size => rollout_fragment_length (#7503)
* bulk rename

* deprecation warn

* update doc

* update fig

* line length

* rename

* make pytest compatible

* fix test

* fi sys

* rename

* wip

* fix more

* lint

* update svg

* comments

* lint

* fix use of batch steps
2020-03-14 12:05:04 -07:00
Eric Liang
52cf77f5a9
[rllib] SAC no_done_at_end should default to False (#7594)
* update

* update doc

* stochastic

* cleanup
2020-03-14 11:16:54 -07:00
Sven Mika
2d97650b1e
[RLlib] Add Exploration API documentation. (#7373)
* Add Exploration API documentation.

* Add Exploration API documentation.

* Add Exploration API documentation.

* Update exploration docs.
2020-03-01 16:55:41 -08:00
Eric Liang
5df801605e
Add ray.util package and move libraries from experimental (#7100)
2020-02-18 13:43:19 -08:00
Eric Liang
fbc545c03b
[rllib] Support parallel, parameterized evaluation (#6981)
* eval api

* update

* sync eval filters

* sync fix

* docs

* update

* docs

* update

* link

* nit

* doc updates

* format
2020-02-01 22:12:12 -08:00
Eric Liang
e659699ca9
[tune] Fix directory naming regression (#6839)
2020-01-27 15:53:40 -08:00
Sven Mika
e6227082bd
[RLlib] Add torch flag to train.py (#6807)
2020-01-17 18:48:44 -08:00
Maltimore
0ec613c95a
[rllib] doc: fix typo: on_postprocess_batch -> on_postprocess_traj (#6438)
2019-12-11 15:00:53 -08:00
Eric Liang
bc5e259264
[rllib] Add a doc section on computing actions (#6326)
* options doc

* add note

* hint shr

* doc update
2019-12-03 00:10:50 -08:00
Eric Liang
e4565c9cc6
Reduce RLlib log verbosity (#6154)
2019-11-13 18:50:45 -08:00
David Bignell
3f83b2daa9
[rllib] Rollout extensions (#6065)
* Rollout improvements

* Make info-saving optional, to avoid breaking change.

* Store generating ray version in checkpoint metadata

* Keep the linter happy

* Add small rollout test

* Terse.

* Update test_io.py
2019-11-05 20:34:18 -08:00
gehring
8903bcd0c3
[rllib] Tracing for eager tensorflow policies with tf.function (#5705)
* Added tracing of eager policies with `tf.function`

* lint

* add config option

* add docs

* wip

* tracing now works with a3c

* typo

* none

* file doc

* returns

* syntax error

* syntax error
2019-09-17 01:44:20 -07:00
Eric Liang
74abeab057
[rllib] Improve accessing model state docs (#5656)
* [rllib] better model docs

* fix

* s
2019-09-08 23:01:26 -07:00
Eric Liang
1455a19c85
Consolidate and clean up documentation (#5645)
2019-09-07 11:50:18 -07:00
Richard Liaw
34f6d2fc5c
[tune] Update trainable docs and support hparams (#5558)
2019-09-04 12:44:42 -07:00
Eric Liang
daf38c8723
[tune] Deprecate tune.function (#5601)
* remove tune function

* remove examples

* Update tune-usage.rst
2019-08-31 16:00:10 -07:00
Eric Liang
550c96b965
[rllib] Add docs on policy.model (#5597)
2019-08-30 21:10:42 -07:00
Eric Liang
7d28bbbdbb
[rllib] Document on traj postprocess (#5532)
* document on traj postprocess

* shorten it
2019-08-24 20:37:45 -07:00
gehring
b520f6141e
[rllib] Adds eager support with a generic TFEagerPolicy class (#5436)
2019-08-23 14:21:11 +08:00
Eric Liang
5d7afe8092
[rllib] Try moving RLlib to top level dir (#5324)
2019-08-05 23:25:49 -07:00
Richard Liaw
1eaa57c98f
[tune] Distributed example + walkthrough (#5157)
2019-08-02 09:17:20 -07:00
Kristian Hartikainen
13fb9fe3db
[rllib] Feature/soft actor critic v2 (#5328)
* Add base for Soft Actor-Critic

* Pick changes from old SAC branch

* Update sac.py

* First implementation of sac model

* Remove unnecessary SAC imports

* Prune unnecessary noise and exploration code

* Implement SAC model and use that in SAC policy

* runs but doesn't learn

* clear state

* fix batch size

* Add missing alpha grads and vars

* -200 by 2k timesteps

* doc

* lazy squash

* one file

* ignore tfp

* revert done
2019-08-01 23:37:36 -07:00
Eric Liang
20450a4e82
[rllib] Add rock paper scissors multi-agent example (#5336)
2019-08-01 13:03:59 -07:00
Eric Liang
9e328fbe6f
[rllib] Add docs on how to use TF eager execution (#4927)
2019-06-07 16:42:37 -07:00
Eric Liang
7501ee51db
[rllib] Rename PolicyEvaluator => RolloutWorker (#4820)
2019-06-03 06:49:24 +08:00
Eric Liang
4f46d3e9bf
[rllib] Add multi-agent examples for hand-coded policy, centralized VF (#4554)
2019-04-09 00:36:49 -07:00
Eric Liang
37208216ae
[rllib] Rename Agent to Trainer (#4556)
2019-04-07 00:36:18 -07:00
Eric Liang
fce0062380
[rllib] Switch to tune.run() instead of run_experiments() (#4515)
2019-03-30 14:07:50 -07:00
Eric Liang
cff08e19ff
[rllib] Print out intermediate data shapes on the first iteration (#4426)
2019-03-26 00:27:59 -07:00
Eric Liang
4b8b703561
[rllib] Some API cleanups and documentation improvements (#4409)
2019-03-21 21:34:22 -07:00
Eric Liang
05d96ce81b
[rllib] Raise an error if multi-agent envs terminate without a last observation for agents (#4139)
* fix it

* lint

* Update rllib-training.rst
2019-02-23 21:23:40 -08:00
Eric Liang
c4182463f6
[rllib] Add helper to iterate over envs in a vectorized environment (#4001)
* add foreach env func

* fix

* add test
2019-02-11 10:40:47 -08:00
Eric Liang
fb73cedf70
[rllib] Add examples page, add hierarchical training example, delete SC2 examples (#3815)
* wip

* lint

* wip

* up

* wip

* update examples

* wip

* remove carla

* update

* improve envspec

* link to custom

* Update rllib-env.rst

* update

* fix

* fn

* lint

* ds

* ssd games

* desc

* fix up docs

* fix
2019-01-29 21:06:09 -08:00
Eric Liang
e78562b2e8
[rllib] Misc fixes: set lr for PG, better error message for LSTM/PPO, fix multi-agent/APEX (#3697)
* fix

* update test

* better error

* compute

* eps fix

* add get_policy() api

* Update agent.py

* better err msg

* fix

* pass in rew
2019-01-06 19:37:35 -08:00
Eric Liang
b8a9e3f106
[rllib] Remove uses of sgd_stepsize => lr (#3667)
* lr

* Update example-evolution-strategies.rst
2019-01-01 12:01:27 +08:00
Richard Liaw
e046a5c767
[tune] resources_per_trial from trial_resources (#3580)
Renaming variable due to user errors.
2018-12-20 19:00:47 -08:00
Eric Liang
db0dee573e
[rllib] Q-Mix implementation (Q-Mix, VDN, IQN, and Ape-X variants) (#3548)
2018-12-18 10:40:01 -08:00
Eric Liang
d864f299d7
[rllib] fixes from dogfooding multi-agent (#3456)
auto wrap multi-agent dict and tuple spaces by keeping a policy -> preprocessor in the sampler
add some Q-learning debug stats
report min, max of custom metrics
better errors
2018-12-05 23:31:45 -08:00
Eric Liang
93a9d32288
[docs] Switch docs to use rllib train instead of train.py
2018-12-04 17:36:06 -08:00
Eric Liang
ce355d13d4
[rllib] Allow envs to be auto-registered; add on_train_result callback with curriculum example (#3451)
* train step and docs

* debug

* doc

* doc

* fix examples

* fix code

* integration test

* fix

* ...

* space

* instance

* Update .travis.yml

* fix test
2018-12-03 23:15:43 -08:00
Eric Liang
f0df97db6f
[rllib] example and docs on how to use parametric actions with DQN / PG algorithms (#3384)
2018-11-27 23:35:19 -08:00
Eric Liang
abdc3b592e
[rllib] Update multi-gpu impala numbers (#3327)
2018-11-19 20:55:27 -08:00
Eric Liang
65c27c70cf
[rllib] Clean up agent resource configurations (#3296)
Closes #3284
2018-11-13 18:00:03 -08:00
Eric Liang
bd0dbde149
[rllib] Rename ServingEnv => ExternalEnv (#3302)
2018-11-12 16:31:27 -08:00
eugenevinitsky
344b4ef0ff
[rllib] Fix filter sync for ES and ARS (#2918)
2018-11-06 19:09:34 -08:00
Eric Liang
369cb833fe
[rllib] Implement custom metrics (#3144)
2018-11-03 18:48:32 -07:00
Eric Liang
af0c1174cd
[sgd] Merge sharded param server based SGD implementation (#3033)
This includes most of the TF code used for the OSDI experiment. Perf sanity check on p3.16xl instances: Overall scaling looks ok, with the multi-node results within 5% of OSDI final numbers. This seems reasonable given that hugepages are not enabled here, and the param server shards are placed randomly.

$ RAY_USE_XRAY=1 ./test_sgd.py --gpu --batch-size=64 --num-workers=N \
  --devices-per-worker=M --strategy=<simple|ps> \
  --warmup --object-store-memory=10000000000

Images per second total
gpus total              | simple | ps
========================================
1                       | 218
2 (1 worker)            | 388
4 (1 worker)            | 759
4 (2 workers)           | 176    | 623
8 (1 worker)            | 985
8 (2 workers)           | 349    | 1031
16 (2 nodes, 2 workers) | 600    | 1661
16 (2 nodes, 4 workers) | 468    | 1712   <--- OSDI perf was 1817
2018-10-27 21:25:02 -07:00
Eric Liang
a9e454f6fd
[rllib] Include config dicts in the sphinx docs (#3064)
2018-10-16 15:55:11 -07:00
Eric Liang
814c35b7d7
[rllib] Simplify sample batch size and num envs config, n_step adjustment (#2995)
* simplify vec batch requirements

* Update rllib-training.rst

* Update rllib-training.rst

* Update rllib-training.rst

* Update rllib-training.rst

* Update rllib-training.rst

* Update rllib-models.rst
2018-09-30 18:36:22 -07:00
Eric Liang
3cde5957b3
[rllib] Better document APIs to access policy state (#2932)
* fix

* doc

* example

* up
2018-09-24 19:08:32 -07:00