Eric Liang
9a83908c46
[rllib] Deprecate policy optimizers ( #8345 )
2020-05-21 10:16:18 -07:00
Sven Mika
d76578700d
[RLlib] Policy.compute_single_action() broken for nested actions (Issue 8411). ( #8514 )
2020-05-20 22:29:08 +02:00
Eric Liang
9d012626e5
[rllib] Distributed exec workflow for impala ( #8321 )
2020-05-11 20:24:43 -07:00
Eric Liang
f48da50e1c
[rllib] observation function api for multi-agent ( #8236 )
2020-05-04 22:13:49 -07:00
Sven Mika
b95e28faea
[RLlib] APEX_DDPG (PyTorch) test case and docs. ( #8288 )
...
APEX_DDPG (PyTorch) test case and docs.
2020-05-04 09:36:27 +02:00
Eric Liang
2a0ad0b8ce
[rllib] [hotfix] Remove assert that trips on pytorch multiagent ( #8241 )
2020-05-01 06:32:54 +02:00
Eric Liang
baadbdf8d4
[rllib] Execute PPO using training workflow ( #8206 )
...
* wip
* add kl
* kl
* works now
* doc update
* reorg
* add ddppo
* add stats
* fix fetch
* comment
* fix learner stat regression
* test fixes
* fix test
2020-04-30 01:18:09 -07:00
Sven Mika
1775e89f26
[RLlib] Remove TupleActions and support arbitrarily nested action spaces. ( #8143 )
...
Deprecate TupleActions and support arbitrarily nested action spaces.
Closes issue #8143 .
2020-04-28 14:59:16 +02:00
Sven Mika
4e713152e9
[RLlib] Fix for issue https://github.com/ray-project/ray/issues/8191 ( #8200 )
...
Fix attribute error when missing exploration in Policy.
Issue #8191
2020-04-27 23:19:26 +02:00
Sven Mika
e9ee5c4e5f
[RLlib] Nested action space PR (minimally invasive; torch only + test). ( #8101 )
...
- Add TorchMultiActionDistribution class.
- Add framework-agnostic test cases for TorchMultiActionDistribution.
2020-04-23 09:09:22 +02:00
roireshef
dbcad35022
[RLlib] Added DefaultCallbacks which replaces old callbacks dict interface ( #6972 )
2020-04-16 16:06:42 -07:00
Xianyang Liu
e1d3f7eba6
[rllib]Add config for rllib to support set python environments ( #8026 )
...
* support set extra python environments
* wrap value with str
* Apply suggestions from code review
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* addresses comments
* fix lint errors
* remove unrelated changes due to format.sh
* remove unrelated changes due to format.sh
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-04-16 01:13:45 -07:00
Sven Mika
1b31c11806
[RLlib] DDPG re-factor to fit into RLlib's functional algorithm builder API. ( #7934 )
2020-04-09 14:04:21 -07:00
Sven Mika
22ccc43670
[RLlib] DQN torch version. ( #7597 )
...
* Fix.
* Rollback.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* Fix.
* Fix.
* Fix.
* Fix.
* WIP.
* WIP.
* Fix.
* Test case fixes.
* Test case fixes and LINT.
* Test case fixes and LINT.
* Rollback.
* WIP.
* WIP.
* Test case fixes.
* Fix.
* Fix.
* Fix.
* Add regression test for DQN w/ param noise.
* Fixes and LINT.
* Fixes and LINT.
* Fixes and LINT.
* Fixes and LINT.
* Fixes and LINT.
* Comment
* Regression test case.
* WIP.
* WIP.
* LINT.
* LINT.
* WIP.
* Fix.
* Fix.
* Fix.
* LINT.
* Fix (SAC does currently not support eager).
* Fix.
* WIP.
* LINT.
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* WIP.
* Fix.
* LINT.
* LINT.
* Fix and LINT.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* LINT.
* Fix.
* Fix and LINT.
* Update rllib/utils/exploration/exploration.py
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Fixes.
* WIP.
* LINT.
* Fixes and LINT.
* LINT and fixes.
* LINT.
* Move action_dist back into torch extra_action_out_fn and LINT.
* Working SimpleQ learning cartpole on both torch AND tf.
* Working Rainbow learning cartpole on tf.
* Working Rainbow learning cartpole on tf.
* WIP.
* LINT.
* LINT.
* Update docs and add torch to APEX test.
* LINT.
* Fix.
* LINT.
* Fix.
* Fix.
* Fix and docstrings.
* Fix broken RLlib tests in master.
* Split BAZEL learning tests into cartpole and pendulum (reached the 60min barrier).
* Fix error_outputs option in BAZEL for RLlib regression tests.
* Fix.
* Tune param-noise tests.
* LINT.
* Fix.
* Fix.
* test
* test
* test
* Fix.
* Fix.
* WIP.
* WIP.
* WIP.
* WIP.
* LINT.
* WIP.
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-04-06 11:56:16 -07:00
Eric Liang
630b3b1752
[rllib] set daemon status for PolicyServerInput thread ( #7862 )
2020-04-04 16:08:51 -07:00
Sven Mika
5537fe13b0
[RLlib] Exploration API: ParamNoise Integration into DQN; working example/test cases. ( #7814 )
2020-04-03 10:44:25 -07:00
Sven Mika
e153e3179f
[RLlib] Exploration API: Policy changes needed for forward pass noisifications. ( #7798 )
...
* Rollback.
* WIP.
* WIP.
* LINT.
* WIP.
* Fix.
* Fix.
* Fix.
* LINT.
* Fix (SAC does currently not support eager).
* Fix.
* WIP.
* LINT.
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* WIP.
* Fix.
* LINT.
* LINT.
* Fix and LINT.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* LINT.
* Fix.
* Fix and LINT.
* Update rllib/utils/exploration/exploration.py
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Fixes.
* LINT.
* WIP.
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-04-01 00:43:21 -07:00
Sven Mika
e356e97eb2
[RLlib] Assert correct policy class being used in Worker. ( #7769 )
2020-03-30 14:03:29 -07:00
Sven Mika
e4bd5db4d8
[RLlib] Minimal ParamNoise PR. ( #7772 )
2020-03-28 16:16:30 -07:00
Sven Mika
1138f2ebed
[RLlib] Issue 7046 cannot restore keras model from h5 file. ( #7482 )
2020-03-23 12:19:30 -07:00
Robert Nishihara
ee8c9ff732
Remove six and cloudpickle from setup.py. ( #7694 )
2020-03-23 11:42:05 -07:00
Eric Liang
9392cdbf74
[rllib] Add high-performance external application connector ( #7641 )
2020-03-20 12:43:57 -07:00
Eric Liang
797e6cfc2a
[rllib][tune] fix some nans ( #7611 )
2020-03-16 11:19:58 -07:00
Eric Liang
dd70720578
[rllib] Rename sample_batch_size => rollout_fragment_length ( #7503 )
...
* bulk rename
* deprecation warn
* update doc
* update fig
* line length
* rename
* make pytest compatible
* fix test
* fi sys
* rename
* wip
* fix more
* lint
* update svg
* comments
* lint
* fix use of batch steps
2020-03-14 12:05:04 -07:00
Eric Liang
c3a8ba399f
[rllib] Enable distributed exec api for A2C, A3C, PG by default ( #7580 )
2020-03-13 18:48:41 -07:00
Sven Mika
f165766813
[RLlib] Bug: If trainer config horizon is provided, should try to increase env steps to that value. ( #7531 )
2020-03-12 11:03:37 -07:00
Eric Liang
be48e1964b
[rllib] Fix per-worker exploration in Ape-X; make more kwargs required for future safety ( #7504 )
...
* fix sched
* lintc
* lint
* fix
* add unit test
* fix
* format
* fix test
* fix test
2020-03-10 11:14:14 -07:00
Eric Liang
a644060daa
[rllib] First pass at pipeline implementation of DQN ( #7433 )
...
* wip iters
* add test
* speed up
* update docs
* document it
* support serial sampling
* add test
* spacing
* annotate it
* update
* rename to pipeline
* comment
* iter2 wip
* update
* update
* context test
* update
* fix
* fix
* a3c pipeline
* doc
* update
* move timer
* comment
* add pipeline test
* fix
* clean up
* document
* iter s
* wip dqn
* wip
* wip
* metrics
* metrics rename
* metrics ctx
* wip
* constants
* add todo
* support .union
* wip
* support union
* remove prints
* add todo
* remove auto timer
* fix up
* fix pipeline test
* typing
* fix breakage
* remove bad assert
* wip
* fix multiagent example
* fixapply
* update a3c
* remove a2c pl
* 0 workers
* wip
* wip
* share metrics
* wip
* wip
* doc
* fix weight sync and global var updates
* mode
* fix
* fix
* doc
* fix
2020-03-07 14:47:58 -08:00
Eric Liang
fddeb6809c
[RLlib] Issue 7401: In eval mode (if evaluation_episodes > 0), agent hangs if Env does not terminate. ( #7448 )
...
* Fix.
* Rollback.
* Fix issue 7421.
* Fix.
2020-03-04 12:58:34 -08:00
Sven Mika
357232d124
[Core/RLlib] Move log_once from rllib to ray.util. ( #7273 )
...
* Move log_once from rllib to tune.
* Move log_once from rllib to tune.
* LINT.
* Move to ray.util.debug.
2020-02-27 10:40:44 -08:00
Eric Liang
46af992efd
[rllib] [experimental] custom RL training pipelines (PG_pl, A2C_pl) ( #7213 )
2020-02-19 16:07:37 -08:00
Sven Mika
d537e9f0d8
[RLlib] Exploration API: merge deterministic flag with exploration classes (SoftQ and StochasticSampling). ( #7155 )
2020-02-19 12:18:45 -08:00
Eric Liang
5df801605e
Add ray.util package and move libraries from experimental ( #7100 )
2020-02-18 13:43:19 -08:00
Sven Mika
2e60f0d4d8
[RLlib] Move all jenkins RLlib-tests into bazel (rllib/BUILD). ( #7178 )
...
* commit
* comment
2020-02-15 14:50:44 -08:00
Eric Liang
026f6884b5
[rllib] Add Decentralized DDPPO trainer and documentation ( #7088 )
2020-02-10 15:28:27 -08:00
Sven Mika
6e1c3ea824
[RLlib] Exploration API (+EpsilonGreedy sub-class). ( #6974 )
2020-02-10 15:22:07 -08:00
Sven Mika
5ac5ac9560
[RLlib] Fix broken example: tf-eager with custom-RNN ( #6732 ). ( #7021 )
...
* WIP.
* Fix float32 conversion in OneHot preprocessor (would cause float64 in eager, then NN-matmul-failure).
Add proper seq-len + state-in construction in eager_tf_policy.py::_compute_gradients().
* LINT.
* eager_tf_policy.py: Only set samples["seq_lens"] if RNN. Otherwise, eager-tracing will throw flattened-dict key-mismatch error.
* Move issue code to examples folder.
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-02-06 09:44:08 -08:00
Eric Liang
fbc545c03b
[rllib] Support parallel, parameterized evaluation ( #6981 )
...
* eval api
* update
* sync eval filters
* sync fix
* docs
* update
* docs
* update
* link
* nit
* doc updates
* format
2020-02-01 22:12:12 -08:00
roireshef
3c60caa448
[rllib] implemented compute_advantages without gae ( #6941 )
2020-01-31 22:25:45 -08:00
roireshef
dc7a555260
[rllib] Feature/histograms in tensorboard ( #6942 )
...
* Added histogram functionality to custom metrics infrastructure (another tab in tensorboard)
* updated example to include histogram metric
* added histograms to TBXLogger
* add episode rewards
* lint
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-01-30 22:02:53 -08:00
Eric Liang
2fb53396ad
[rllib] [experimental] Decentralized Distributed PPO for torch (DD-PPO) ( #6918 )
2020-01-25 22:36:43 -08:00
Sven Mika
c957ed58ed
[RLlib] Implement PPO torch version. ( #6826 )
2020-01-20 23:06:50 -08:00
Sven
60d4d5e1aa
Remove future imports ( #6724 )
...
* Remove all __future__ imports from RLlib.
* Remove (object) again from tf_run_builder.py::TFRunBuilder.
* Fix 2xLINT warnings.
* Fix broken appo_policy import (must be appo_tf_policy)
* Remove future imports from all other ray files (not just RLlib).
* Remove future imports from all other ray files (not just RLlib).
* Remove future import blocks that contain `unicode_literals` as well.
Revert appo_tf_policy.py to appo_policy.py (belongs to another PR).
* Add two empty lines before Schedule class.
* Put back __future__ imports into determine_tests_to_run.py. Fails otherwise on a py2/print related error.
2020-01-09 00:15:48 -08:00
Robert Nishihara
39a3459886
Remove (object) from class declarations. ( #6658 )
2020-01-02 17:42:13 -08:00
Sven
f1b56fa5ee
PG unify/cleanup tf vs torch and PG functionality test cases (tf + torch). ( #6650 )
...
* Unifying the code for PGTrainer/Policy wrt tf vs torch.
Adding loss function test cases for the PGAgent (confirm equivalence of tf and torch).
* Fix LINT line-len errors.
* Fix LINT errors.
* Fix `tf_pg_policy` imports (formerly: `pg_policy`).
* Rename tf_pg_... into pg_tf_... following <alg>_<framework>_... convention, where ...=policy/loss/agent/trainer.
Retire `PGAgent` class (use PGTrainer instead).
* - Move PG test into agents/pg/tests directory.
- All test cases will be located near the classes that are tested and
then built into the Bazel/Travis test suite.
* Moved post_process_advantages into pg.py (from pg_tf_policy.py), b/c
the function is not a tf-specific one.
* Fix remaining import errors for agents/pg/...
* Fix circular dependency in pg imports.
* Add pg tests to Jenkins test suite.
2020-01-02 16:08:03 -08:00
Sven
8b16847c02
Get utils ready for better Agent torch support. ( #6561 )
2019-12-30 12:27:32 -08:00
Eric Liang
2530eb90dc
Move tf.test.is_gpu_available() to after session init ( #6515 )
...
* move to after session init
* script fixes
2019-12-17 14:55:39 -08:00
Eric Liang
4c6739476b
[rllib] Raise an error if GPUs are enabled but not tf.test.is_gpu_available() ( #6365 )
2019-12-05 10:13:54 -08:00
Eric Liang
ddc8855f41
Fix wrap ( #6293 )
2019-11-26 17:47:47 -08:00
gehring
8903bcd0c3
[rllib] Tracing for eager tensorflow policies with tf.function ( #5705 )
...
* Added tracing of eager policies with `tf.function`
* lint
* add config option
* add docs
* wip
* tracing now works with a3c
* typo
* none
* file doc
* returns
* syntax error
* syntax error
2019-09-17 01:44:20 -07:00