Sven Mika
f41a9b9813
[RLlib] Fix KL method of MultiCategorical tf distribution (issue #7009 ). ( #7119 )
...
* Fix KL method of MultiCategorical tf distribution.
* Fix KL method of MultiCategorical tf distribution.
* Merge AsyncReplayOptimizer fixes into this branch.
2020-02-12 12:46:15 -08:00
Sven Mika
2a0e4d94aa
[RLlib] Fix AsyncReplayOptimizer bug where it swallows all good worker tasks … ( #7111 )
2020-02-11 12:51:44 -08:00
Eric Liang
026f6884b5
[rllib] Add Decentralized DDPPO trainer and documentation ( #7088 )
2020-02-10 15:28:27 -08:00
Sven Mika
6e1c3ea824
[RLlib] Exploration API (+EpsilonGreedy sub-class). ( #6974 )
2020-02-10 15:22:07 -08:00
Sven Mika
5ac5ac9560
[RLlib] Fix broken example: tf-eager with custom-RNN ( #6732 ). ( #7021 )
...
* WIP.
* Fix float32 conversion in OneHot preprocessor (would cause float64 in eager, then NN-matmul-failure).
Add proper seq-len + state-in construction in eager_tf_policy.py::_compute_gradients().
* LINT.
* eager_tf_policy.py: Only set samples["seq_lens"] if RNN. Otherwise, eager-tracing will throw flattened-dict key-mismatch error.
* Move issue code to examples folder.
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-02-06 09:44:08 -08:00
Eric Liang
fbc545c03b
[rllib] Support parallel, parameterized evaluation ( #6981 )
...
* eval api
* update
* sync eval filters
* sync fix
* docs
* update
* docs
* update
* link
* nit
* doc updates
* format
2020-02-01 22:12:12 -08:00
Sven Mika
b9ad79d66f
Add cartpole PPO torch to regression (besides tf). ( #7005 )
2020-02-01 17:41:38 -08:00
roireshef
3c60caa448
[rllib] implemented compute_advantages without gae ( #6941 )
2020-01-31 22:25:45 -08:00
Jaroslaw Rzepecki
67319bc887
[RLlib] Update MARWIL to use tf policy template ( #6975 )
...
* update MARWIL to use tf policy template
* formatting fixes
2020-01-31 12:57:52 -08:00
Sven Mika
211a9be9a5
[RLlib] Bug fix: PR anneals beta parameter beyond final given value. ( #6973 )
...
* Bug fix: PR anneals beta parameter beyond final given value.
* LINT.
* Trigger travis re-test.
2020-01-31 09:55:03 -08:00
Sven Mika
2ccf08ad10
[RLlib] Bug fix: DQN goes into negative epsilon values after reaching explora… ( #6971 )
...
* Bug fix: DQN goes into negative epsilon values after reaching exploration percentage.
* Add `epsilon_initial_eps` to SAC to pass test_nested_spaces.py.
* Add `exploration_initial_eps` to QMIX default config.
2020-01-31 09:54:12 -08:00
roireshef
dc7a555260
[rllib] Feature/histograms in tensorboard ( #6942 )
...
* Added histogram functionality to custom metrics infrastructure (another tab in tensorboard)
* updated example to include histogram metric
* added histograms to TBXLogger
* add episode rewards
* lint
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-01-30 22:02:53 -08:00
Sven Mika
136ada5fb9
[RLlib] Experiment with py_func as a means to further unify tf and torch (Schedule classes). ( #6951 )
2020-01-30 11:27:57 -08:00
Sven Mika
4c97348cb6
[RLlib] Schedule-classes multi-framework support. ( #6926 )
2020-01-28 11:07:55 -08:00
Eric Liang
e659699ca9
[tune] Fix directory naming regression ( #6839 )
2020-01-27 15:53:40 -08:00
Eric Liang
2fb53396ad
[rllib] [experimental] Decentralized Distributed PPO for torch (DD-PPO) ( #6918 )
2020-01-25 22:36:43 -08:00
Sven Mika
446cbdf2e0
[RLlib] Fix issue (bug): LSTM + non-shared vf + PPO + tuple actions ( #6890 )
...
* Add `RandomEnv` example to examples folder.
Convert warning into Error message when using an LSTM in a non-shared-vf network (after the warning, the program would crash).
* LINT.
* Fix issue #6884 . LSTM + non-shared vf NN + PPO crashes when using a Tuple action space.
* LINT
* Change warning message for Model: shared_vf=False, LSTM=True cases.
* Bug fix.
* Add examples/random_env.py test to Jenkins.
2020-01-24 10:29:35 -08:00
AnanthHari
aa2a0cb6da
Fixes empty `state` argument in compute_single_action method ( #6894 )
...
* Fixes empty `state` parameter in compute_single_action method
* Fixed style
2020-01-23 00:42:52 -08:00
Sven Mika
ae9a3a2237
[RLlib] from_config util method for framework agnostic components; start moving RLlib tests into Bazel. ( #6865 )
2020-01-22 17:02:58 -08:00
Sven Mika
c957ed58ed
[RLlib] Implement PPO torch version. ( #6826 )
2020-01-20 23:06:50 -08:00
Eric Liang
a229bdf272
[rllib] Deprecate custom preprocessors ( #6833 )
...
* deprecation warnings
* add log warn
* fix test
2020-01-18 23:30:09 -08:00
Sven Mika
7659cae3ba
[RLlib] Add PG torch regression test ( #6828 )
...
* Add PG torch regression test to tuned_examples/regression_tests dir.
* Rename cartpole-pg.yaml into cartpole-pg-tf.yaml
* cartpole-pg-tf.yaml: Change cartpole-pg name of tuned_example to cartpole-pg-tf.
2020-01-18 15:57:12 -08:00
Justin Terry
97bf79917c
[RLlib] Update MADDPG example repo to maintained fork ( #6831 )
2020-01-18 13:08:27 -08:00
Sven Mika
303547f119
[RLlib] Policy-classes cleanup and torch/tf unification. ( #6770 )
2020-01-17 22:26:28 -08:00
Sven Mika
e6227082bd
[RLlib] Add `torch` flag to train.py ( #6807 )
2020-01-17 18:48:44 -08:00
Sven Mika
2bcf72e306
DQN distributional model: Replace all legacy tf.contrib imports with tf.keras.layers.xyz or tf.initializers.xyz. ( #6772 )
...
- This fixes a test case in test_evaluators.py.
2020-01-13 21:48:16 -08:00
Sven
60d4d5e1aa
Remove future imports ( #6724 )
...
* Remove all __future__ imports from RLlib.
* Remove (object) again from tf_run_builder.py::TFRunBuilder.
* Fix 2xLINT warnings.
* Fix broken appo_policy import (must be appo_tf_policy)
* Remove future imports from all other ray files (not just RLlib).
* Remove future imports from all other ray files (not just RLlib).
* Remove future import blocks that contain `unicode_literals` as well.
Revert appo_tf_policy.py to appo_policy.py (belongs to another PR).
* Add two empty lines before Schedule class.
* Put back __future__ imports into determine_tests_to_run.py. Fails otherwise on a py2/print related error.
2020-01-09 00:15:48 -08:00
Ujval Misra
20ba7ef647
[tune] Move util to utils package ( #6682 )
...
* Move util.py to utils
* Fix import
2020-01-06 18:11:02 -08:00
Robert Nishihara
39a3459886
Remove (object) from class declarations. ( #6658 )
2020-01-02 17:42:13 -08:00
Sven
f1b56fa5ee
PG unify/cleanup tf vs torch and PG functionality test cases (tf + torch). ( #6650 )
...
* Unifying the code for PGTrainer/Policy wrt tf vs torch.
Adding loss function test cases for the PGAgent (confirm equivalence of tf and torch).
* Fix LINT line-len errors.
* Fix LINT errors.
* Fix `tf_pg_policy` imports (formerly: `pg_policy`).
* Rename tf_pg_... into pg_tf_... following <alg>_<framework>_... convention, where ...=policy/loss/agent/trainer.
Retire `PGAgent` class (use PGTrainer instead).
* Move PG test into agents/pg/tests directory.
All test cases will be located near the classes that are tested and
then built into the Bazel/Travis test suite.
* Moved post_process_advantages into pg.py (from pg_tf_policy.py), b/c
the function is not a tf-specific one.
* Fix remaining import errors for agents/pg/...
* Fix circular dependency in pg imports.
* Add pg tests to Jenkins test suite.
2020-01-02 16:08:03 -08:00
Robert Nishihara
480206eef8
Remove some Python 2 compatibility code. ( #6624 )
2019-12-31 17:14:58 -08:00
Michael Luo
1cb335487e
SAC for Mujoco Environments ( #6642 )
2019-12-31 00:16:54 -08:00
Sven
8b16847c02
Get utils ready for better Agent torch support. ( #6561 )
2019-12-30 12:27:32 -08:00
Eric Liang
7c1e0e5715
Implement wait_local for wait ( #6524 )
2019-12-28 17:40:49 -08:00
Eric Liang
022954ac09
[rllib] Tuple action dist tensors not reduced properly in eager mode ( #6615 )
2019-12-28 09:51:09 -08:00
Eric Liang
3af84ada47
Revert "[rllib] remove exists call ( #6168 )" ( #6616 )
...
This reverts commit a68cda0a33.
2019-12-26 22:44:26 -08:00
Zhongxia Yan
98689bd263
Changed foreach_policy to foreach_trainable_policy ( #6564 )
...
Changed foreach_policy to foreach_trainable_policy in DQN when disabling exploration. This makes it consistent with the rest of the file
2019-12-26 19:50:48 -08:00
gehring
b40869d0e4
Wrapper for the dm_env interface ( #6468 )
2019-12-26 13:22:17 -08:00
Michael Luo
548df014ec
SAC Performance Fixes ( #6295 )
...
* SAC Performance Fixes
* Small Changes
* Update sac_model.py
* fix normalize wrapper
* Update test_eager_support.py
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2019-12-20 10:51:25 -08:00
Eyal Sela
7b955881f3
Initializing default saver inside the function ( #6540 )
2019-12-19 12:29:45 -08:00
Eric Liang
2530eb90dc
Move tf.test.is_gpu_available() to after session init ( #6515 )
...
* move to after session init
* script fixes
2019-12-17 14:55:39 -08:00
Eugene Vinitsky
3cb499632e
(Bug Fix): Remove the extra 0.5 in the Diagonal Gaussian entropy ( #6475 )
2019-12-13 14:42:30 -08:00
Eric Liang
be5dd8eb5e
Enable direct calls by default ( #6367 )
...
* wip
* add
* timeout fix
* const ref
* comments
* fix
* fix
* Move actor state into actor handle
* comments 2
* enable by default
* temp reorder
* some fixes
* add debug code
* tmp
* fix
* wip
* remove dbg
* fix compile
* fix
* fix check
* remove non direct tests
* Increment ref count before resolving value
* rename
* fix another bug
* tmp
* tmp
* Fix object pinning
* build change
* lint
* ActorManager
* tmp
* ActorManager
* fix test component failures
* Remove old code
* Remove unused
* fix
* fix
* fix resources
* fix advanced
* eric's diff
* blacklist
* blacklist
* cleanup
* annotate
* disable tests for now
* remove
* fix
* fix
* clean up verbosity
* fix test
* fix concurrency test
* Update .travis.yml
* Update .travis.yml
* Update .travis.yml
* split up analysis suite
* split up trial runner suite
* fix detached direct actors
* fix
* split up advanced test
* lint
* fix core worker test hang
* fix bad check fail which breaks test_cluster.py in tune
* fix some minor diffs in test_cluster
* less workers
* make less stressful
* split up test
* retry flaky tests
* remove old test flags
* fixes
* lint
* Update worker_pool.cc
* fix race
* fix
* fix bugs in node failure handling
* fix race condition
* fix bugs in node failure handling
* fix race condition
* nits
* fix test
* disable heartbeats
* disable heartbeats
* fix
* fix
* use worker id
* fix max fail
* debug exit
* fix merge, and apply [PATCH] fix concurrency test
* [patch] fix core worker test hang
* remove NotifyActorCreation, and return worker on completion of actor creation task
* remove actor died callback
* Update core_worker.cc
* lint
* use task manager
* fix merge
* fix deadlock
* wip
* merge conflicts
* fix
* better sysexit handling
* better sysexit handling
* better sysexit handling
* check id
* better debug
* task failed msg
* task failed msg
* retry failed tasks with delay
* retry failed tasks with delay
* clip deps
* fix
* fix core worker tests
* fix task manager test
* fix all tests
* cleanup
* set to 0 for direct tests
* dont check worker id for ownership rpc
* dont check worker id for ownership rpc
* debug messages
* add comment
* remove debug statements
* nit
* check worker id
* fix test
* owner
* fix tests
2019-12-13 13:58:04 -08:00
Zack Polizzi
9e9c524823
Update pong-apex tuned example ( #6462 )
2019-12-12 10:57:55 -08:00
Victor Le
4e24c805ee
AlphaZero and Ranked reward implementation ( #6385 )
2019-12-07 12:08:40 -08:00
Eric Liang
4c6739476b
[rllib] Raise an error if GPUs are enabled but not tf.test.is_gpu_available() ( #6365 )
2019-12-05 10:13:54 -08:00
Stephanie Wang
da41180dc0
[direct task] Retry tasks on failure and turn on RAY_FORCE_DIRECT for test_multinode_failures.py ( #6306 )
...
* multinode failures direct
* Add number of retries allowed for tasks
* Retry tasks
* Add failing test for object reconstruction
* Handle return status and debug
* update
* Retry task unit test
* update
* update
* todo
* Fix max_retries decorator, fix test
* Fix test that flaked
* lint
* comments
2019-12-02 10:20:57 -08:00
Eric Liang
77b5098e7d
[rllib] Warn about dict action spaces
2019-11-27 12:57:38 -08:00
Eric Liang
ddc8855f41
Fix wrap ( #6293 )
2019-11-26 17:47:47 -08:00
Ameer Haj Ali
71316fa8d0
wrap models with DistributionalQModel when running DQN ( #6258 )
...
* wrap models with DistributionalQModel when running DQN
* wrap only for tensorflow models
* Update custom_keras_model.py
2019-11-25 00:11:24 -08:00