Commit graph

102 commits

Author SHA1 Message Date
mehrdadn
3bd82d0bcd
Fix various issues/warnings that come up on Jenkins (#7147)
* Avoid warning about swap being unlimited

Currently we get the following message on Jenkins:
"Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap."

Since we're not limiting swap anyway, we might as well avoid trying to.
https://docs.docker.com/config/containers/resource_constraints/#--memory-swap-details

* Fix escaping in re.search()

* Fix escaping in _noisy_layer()

* Raise a more descriptive error when dashboard data isn't found

* Don't error on dashboard files not being found when webui isn't required

* Change dashboard error to a warning instead
2020-02-17 16:08:55 -08:00
Eric Liang
42aea966ff
[rllib] Convert torch state arrays to tensors during compute actions (#7162)
* convert to tensor

* normalize fix
2020-02-17 10:26:58 -08:00
Sven Mika
2e60f0d4d8
[RLlib] Move all jenkins RLlib-tests into bazel (rllib/BUILD). (#7178)
* commit

* comment
2020-02-15 14:50:44 -08:00
Sven Mika
5518a738b3
[RLlib] Fix erroneous use of LinearSchedule (in DDPG's exploration annealing). (#7125)
* Fix erroneous use of LinearSchedule (in DDPG's exploration annealing).
Erase schedules_obsoleted.py.

* Trigger re-test.

* Re-test.
2020-02-12 23:46:49 -08:00
Eric Liang
026f6884b5
[rllib] Add Decentralized DDPPO trainer and documentation (#7088) 2020-02-10 15:28:27 -08:00
Sven Mika
6e1c3ea824
[RLlib] Exploration API (+EpsilonGreedy sub-class). (#6974) 2020-02-10 15:22:07 -08:00
Eric Liang
fbc545c03b
[rllib] Support parallel, parameterized evaluation (#6981)
* eval api

* update

* sync eval filters

* sync fix

* docs

* update

* docs

* update

* link

* nit

* doc updates

* format
2020-02-01 22:12:12 -08:00
roireshef
3c60caa448
[rllib] implemented compute_advantages without gae (#6941) 2020-01-31 22:25:45 -08:00
Jaroslaw Rzepecki
67319bc887
[RLlib] Update MARWIL to use tf policy template (#6975)
* update MARWIL to use tf policy template

* formatting fixes
2020-01-31 12:57:52 -08:00
Sven Mika
2ccf08ad10
[RLlib] Bug fix: DQN goes into negative epsilon values after reaching explora… (#6971)
* Bug fix: DQN goes into negative epsilon values after reaching exploration percentage.

* Add `epsilon_initial_eps` to SAC to pass test_nested_spaces.py.

* Add `exploration_initial_eps` to QMIX default config.
2020-01-31 09:54:12 -08:00
Eric Liang
e659699ca9
[tune] Fix directory naming regression (#6839) 2020-01-27 15:53:40 -08:00
Eric Liang
2fb53396ad [rllib] [experimental] Decentralized Distributed PPO for torch (DD-PPO) (#6918) 2020-01-25 22:36:43 -08:00
Sven Mika
ae9a3a2237 [RLlib] from_config util method for framework agnostic components; start moving RLlib tests into Bazel. (#6865) 2020-01-22 17:02:58 -08:00
Sven Mika
c957ed58ed [RLlib] Implement PPO torch version. (#6826) 2020-01-20 23:06:50 -08:00
Sven Mika
303547f119 [RLlib] Policy-classes cleanup and torch/tf unification. (#6770) 2020-01-17 22:26:28 -08:00
Sven Mika
e6227082bd [RLlib] Add torch flag to train.py (#6807) 2020-01-17 18:48:44 -08:00
Sven Mika
2bcf72e306 DQN distributional model: Replace all legacy tf.contrib imports with tf.keras.layers.xyz or tf.initializers.xyz. (#6772)
- This fixes a test case in test_evaluators.py.
2020-01-13 21:48:16 -08:00
Sven
60d4d5e1aa Remove future imports (#6724)
* Remove all __future__ imports from RLlib.

* Remove (object) again from tf_run_builder.py::TFRunBuilder.

* Fix 2xLINT warnings.

* Fix broken appo_policy import (must be appo_tf_policy)

* Remove future imports from all other ray files (not just RLlib).

* Remove future imports from all other ray files (not just RLlib).

* Remove future import blocks that contain `unicode_literals` as well.
Revert appo_tf_policy.py to appo_policy.py (belongs to another PR).

* Add two empty lines before Schedule class.

* Put back __future__ imports into determine_tests_to_run.py. Fails otherwise on a py2/print related error.
2020-01-09 00:15:48 -08:00
Robert Nishihara
39a3459886 Remove (object) from class declarations. (#6658) 2020-01-02 17:42:13 -08:00
Sven
f1b56fa5ee PG unify/cleanup tf vs torch and PG functionality test cases (tf + torch). (#6650)
* Unifying the code for PGTrainer/Policy wrt tf vs torch.
Adding loss function test cases for the PGAgent (confirm equivalence of tf and torch).

* Fix LINT line-len errors.

* Fix LINT errors.

* Fix `tf_pg_policy` imports (formerly: `pg_policy`).

* Rename tf_pg_... into pg_tf_... following <alg>_<framework>_... convention, where ...=policy/loss/agent/trainer.
Retire `PGAgent` class (use PGTrainer instead).

* - Move PG test into agents/pg/tests directory.
- All test cases will be located near the classes that are tested and
  then built into the Bazel/Travis test suite.

* Moved post_process_advantages into pg.py (from pg_tf_policy.py), b/c
the function is not a tf-specific one.

* Fix remaining import errors for agents/pg/...

* Fix circular dependency in pg imports.

* Add pg tests to Jenkins test suite.
2020-01-02 16:08:03 -08:00
Michael Luo
1cb335487e SAC for Mujoco Environments (#6642) 2019-12-31 00:16:54 -08:00
Sven
8b16847c02 Get utils ready for better Agent torch support. (#6561) 2019-12-30 12:27:32 -08:00
Zhongxia Yan
98689bd263 Changed foreach_policy to foreach_trainable_policy (#6564)
Changed foreach_policy to foreach_trainable_policy in DQN when disabling exploration. This makes it consistent with the rest of the file
2019-12-26 19:50:48 -08:00
Michael Luo
548df014ec SAC Performance Fixes (#6295)
* SAC Performance Fixes

* Small Changes

* Update sac_model.py

* fix normalize wrapper

* Update test_eager_support.py

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2019-12-20 10:51:25 -08:00
Eric Liang
8fc2272f43
[rllib] Reorganize trainer config, add warnings about high VF loss magnitude for PPO (#6181) 2019-11-18 10:39:07 -08:00
Philipp Moritz
fc655acfee
Fix linting on master branch (#6174) 2019-11-16 10:02:58 -08:00
Eric Liang
243b1b7281
[rllib] Add microbatch optimizer with A2C example (#6161) 2019-11-14 12:14:00 -08:00
Eric Liang
e4565c9cc6
Reduce RLlib log verbosity (#6154) 2019-11-13 18:50:45 -08:00
Eric Liang
16891e9379
[rllib] Don't use flat weights in non-eager mode (#6001) 2019-10-31 15:16:02 -07:00
Eric Liang
a0dcb45dc3
[rllib] Fix APEX priorities returning zero all the time (#5980)
* fix

* move example tests to end

* level err

* guard against none

* no trace test

* ignore thumbs

* np

* fix multi node

* fix
2019-10-26 13:23:42 -07:00
Matthew A. Wright
0110941de5 rllib: use pytorch's fn to see if gpu is available (#5890) 2019-10-12 00:13:00 -07:00
Matthew A. Wright
4aa06918ae Qmix on gpu and with non-stacked-obs environment state support (#5751) 2019-10-08 13:18:07 -07:00
Eric Liang
c6919d315d
[rllib] Remove TorchPolicy locks (#5764)
* remove torch lock

* remove lock
2019-09-24 17:52:16 -07:00
Vince Jankovics
7e214fd95e [tune] TensorBoard HParams for TF2.0 (#5678) 2019-09-21 11:06:34 -07:00
Kilian Batzner
79b9c70ad6 Add local_tf_session_args to unknown subkeys whitelist (#5742)
* Add local_tf_session_args to unknown subkeys whitelist

* Remove trailing whitespace
2019-09-20 10:32:49 -07:00
Matthew A. Wright
3131e1742d [rllib] Qmix off by 1 in double Q calculation (#5731)
* Qmix fix.

-Current version of double Q learning is incorrect; it selects actions
at timestep t instead of t+1 when computing the t+1 Q value.

* Allow extra obs dict keys

* Move Q-value-computing replay code to own function

* Run the autoformatter

* use better terms in comments ("policy" network instead of "live" network)
2019-09-18 18:12:30 -07:00
gehring
8903bcd0c3 [rllib] Tracing for eager tensorflow policies with tf.function (#5705)
* Added tracing of eager policies with `tf.function`

* lint

* add config option

* add docs

* wip

* tracing now works with a3c

* typo

* none

* file doc

* returns

* syntax error

* syntax error
2019-09-17 01:44:20 -07:00
Ashwinee Panda
946ebfaa3c [rllib] Validate that entropy coeff is not an integer (#5687)
* Validate that entropy coeff is not an integer

Passing an integer value for entropy coeff such as 0 raises an error somewhere inside the TF policy graph, so this checks to make sure the entropy coeff is a float.

* Cast to float instead

Also move this check after the negative value check
2019-09-11 14:35:42 -07:00
Eric Liang
bc6a95deb0
[rllib] Eager execution for centralized critic example, fix simple optimizer for multiagent (#5683) 2019-09-11 12:15:34 -07:00
Eric Liang
cf90394a09
[rllib] Fix TF2 import of EagerVariableStore (#5625) 2019-09-07 12:10:03 -07:00
Eric Liang
19bbf1eb4d [rllib] Revert [rllib] Port DDPG to the build_tf_policy pattern (#5626) 2019-09-04 21:39:22 -07:00
Eric Liang
daf38c8723
[tune] Deprecate tune.function (#5601)
* remove tune function

* remove examples

* Update tune-usage.rst
2019-08-31 16:00:10 -07:00
Philipp Moritz
747daff2cb
Fix impala stress test (#5596) 2019-08-31 01:20:53 -07:00
Eric Liang
38231907f3
[rllib] Forgot to register param noise layer variables 2019-08-29 18:12:31 -07:00
Eric Liang
03a1b75852
[rllib] Fix some eager execution regressions with 1.13 (#5537)
* fix bugs with 1.13

* allow disable
2019-08-26 23:23:35 -07:00
Eric Liang
97ccd75952
[rllib] Enable object store memory limit by default (#5534) 2019-08-26 01:37:28 -07:00
gehring
b520f6141e [rllib] Adds eager support with a generic TFEagerPolicy class (#5436) 2019-08-23 14:21:11 +08:00
Eric Liang
e2e30ca507 Ray, Tune, and RLlib support for memory, object_store_memory options (#5226) 2019-08-21 23:01:10 -07:00
Eric Liang
a1d2e17623
[rllib] Autoregressive action distributions (#5304) 2019-08-10 14:05:12 -07:00
Eric Liang
592f313210
[rllib] Centralized critic / PPO example on TwoStepGame (#5392) 2019-08-08 14:03:28 -07:00