Sven Mika
1d4823c0ec
[RLlib] Add testing framework_iterator. ( #7852 )
* Add testing framework_iterator.
* LINT.
* WIP.
* Fix and LINT.
* LINT fix.
2020-04-03 12:24:25 -07:00
Sven Mika
bb6c675231
[RLlib] Bug fix: Copy is_exploring placeholder for multi-GPU tower generation. ( #7846 )
2020-04-03 10:44:58 -07:00
Sven Mika
e153e3179f
[RLlib] Exploration API: Policy changes needed for forward pass noisifications. ( #7798 )
* Rollback.
* WIP.
* WIP.
* LINT.
* WIP.
* Fix.
* Fix.
* Fix.
* LINT.
* Fix (SAC does currently not support eager).
* Fix.
* WIP.
* LINT.
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* WIP.
* Fix.
* LINT.
* LINT.
* Fix and LINT.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* LINT.
* Fix.
* Fix and LINT.
* Update rllib/utils/exploration/exploration.py
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Fixes.
* LINT.
* WIP.
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-04-01 00:43:21 -07:00
Sven Mika
66df8b8c35
[RLlib] Working/learning example: PPO + torch + LSTM. ( #7797 )
2020-03-31 22:00:28 -07:00
Eric Liang
dd70720578
[rllib] Rename sample_batch_size => rollout_fragment_length ( #7503 )
* bulk rename
* deprecation warn
* update doc
* update fig
* line length
* rename
* make pytest compatible
* fix test
* fi sys
* rename
* wip
* fix more
* lint
* update svg
* comments
* lint
* fix use of batch steps
2020-03-14 12:05:04 -07:00
Sven Mika
20ef4a8603
[RLlib] Cleanup/unify all test cases. ( #7533 )
2020-03-11 20:39:47 -07:00
Sven Mika
d8eeb96413
Fix issue with torch PPO not handling action spaces of shape=(>1,). ( #7398 )
2020-03-02 10:53:19 -08:00
Sven Mika
e2edca45d4
[RLlib] PPO torch memory leak and unnecessary torch.Tensor creation and gc'ing. ( #7238 )
* Take out stats to analyze memory leak in torch PPO.
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* LINT.
* Fix determine_tests_to_run.py.
* minor change to re-test after determine_tests_to_run.py.
* LINT.
* update comments.
* WIP
* WIP
* WIP
* FIX.
* Fix sequence_mask being dependent on torch being installed.
* Fix strange ray-core tf-error in test_memory_scheduling test case.
* Fix strange ray-core tf-error in test_memory_scheduling test case.
* Fix strange ray-core tf-error in test_memory_scheduling test case.
* Fix strange ray-core tf-error in test_memory_scheduling test case.
2020-02-22 11:02:31 -08:00
Sven Mika
d537e9f0d8
[RLlib] Exploration API: merge deterministic flag with exploration classes (SoftQ and StochasticSampling). ( #7155 )
2020-02-19 12:18:45 -08:00
Sven Mika
2e60f0d4d8
[RLlib] Move all jenkins RLlib-tests into bazel (rllib/BUILD). ( #7178 )
* commit
* comment
2020-02-15 14:50:44 -08:00
Eric Liang
026f6884b5
[rllib] Add Decentralized DDPPO trainer and documentation ( #7088 )
2020-02-10 15:28:27 -08:00
Sven Mika
6e1c3ea824
[RLlib] Exploration API (+EpsilonGreedy sub-class). ( #6974 )
2020-02-10 15:22:07 -08:00
roireshef
3c60caa448
[rllib] implemented compute_advantages without gae ( #6941 )
2020-01-31 22:25:45 -08:00
Eric Liang
2fb53396ad
[rllib] [experimental] Decentralized Distributed PPO for torch (DD-PPO) ( #6918 )
2020-01-25 22:36:43 -08:00
Sven Mika
c957ed58ed
[RLlib] Implement PPO torch version. ( #6826 )
2020-01-20 23:06:50 -08:00
Sven Mika
303547f119
[RLlib] Policy-classes cleanup and torch/tf unification. ( #6770 )
2020-01-17 22:26:28 -08:00
Sven Mika
e6227082bd
[RLlib] Add torch flag to train.py ( #6807 )
2020-01-17 18:48:44 -08:00
Sven
60d4d5e1aa
Remove future imports ( #6724 )
* Remove all __future__ imports from RLlib.
* Remove (object) again from tf_run_builder.py::TFRunBuilder.
* Fix 2xLINT warnings.
* Fix broken appo_policy import (must be appo_tf_policy)
* Remove future imports from all other ray files (not just RLlib).
* Remove future imports from all other ray files (not just RLlib).
* Remove future import blocks that contain `unicode_literals` as well.
Revert appo_tf_policy.py to appo_policy.py (belongs to another PR).
* Add two empty lines before Schedule class.
* Put back __future__ imports into determine_tests_to_run.py. Fails otherwise on a py2/print related error.
2020-01-09 00:15:48 -08:00
Robert Nishihara
39a3459886
Remove (object) from class declarations. ( #6658 )
2020-01-02 17:42:13 -08:00
Eric Liang
8fc2272f43
[rllib] Reorganize trainer config, add warnings about high VF loss magnitude for PPO ( #6181 )
2019-11-18 10:39:07 -08:00
Eric Liang
16891e9379
[rllib] Don't use flat weights in non-eager mode ( #6001 )
2019-10-31 15:16:02 -07:00
Ashwinee Panda
946ebfaa3c
[rllib] Validate that entropy coeff is not an integer ( #5687 )
* Validate that entropy coeff is not an integer
Passing an integer value for entropy coeff such as 0 raises an error somewhere inside the TF policy graph, so this checks to make sure the entropy coeff is a float.
* Cast to float instead
Also move this check after the negative value check
2019-09-11 14:35:42 -07:00
Eric Liang
bc6a95deb0
[rllib] Eager execution for centralized critic example, fix simple optimizer for multiagent ( #5683 )
2019-09-11 12:15:34 -07:00
gehring
b520f6141e
[rllib] Adds eager support with a generic TFEagerPolicy class ( #5436 )
2019-08-23 14:21:11 +08:00
Eric Liang
a1d2e17623
[rllib] Autoregressive action distributions ( #5304 )
2019-08-10 14:05:12 -07:00
Matthew A. Wright
e3c9f7e83a
Custom action distributions ( #5164 )
* custom action dist wip
* Test case for custom action dist
* ActionDistribution.get_parameter_shape_for_action_space pattern
* Edit exception message to also suggest using a custom action distribution
* Clean up ModelCatalog.get_action_dist
* Pass model config to ActionDistribution constructors
* Update custom action distribution test case
* Name fix
* Autoformatter
* parameter shape static methods for torch distributions
* Fix docstring
* Generalize fake array for graph initialization
* Fix action dist constructors
* Correct parameter shape static methods for multicategorical and gaussian
* Make suggested changes to custom action dist's
* Correct instances of not passing model config to action dist
* Autoformatter
* fix tuple distribution constructor
* bugfix
2019-08-06 11:13:16 -07:00
Eric Liang
5d7afe8092
[rllib] Try moving RLlib to top level dir ( #5324 )
2019-08-05 23:25:49 -07:00