hiro/ray - Forgejo: Beyond coding. We Forge.

hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-12 06:06:39 -04:00

Author	SHA1	Message	Date
Sven Mika	22ccc43670	[RLlib] DQN torch version. (#7597 ) * Fix. * Rollback. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * Fix. * Fix. * Fix. * Fix. * Fix. * WIP. * WIP. * Fix. * Test case fixes. * Test case fixes and LINT. * Test case fixes and LINT. * Rollback. * WIP. * WIP. * Test case fixes. * Fix. * Fix. * Fix. * Add regression test for DQN w/ param noise. * Fixes and LINT. * Fixes and LINT. * Fixes and LINT. * Fixes and LINT. * Fixes and LINT. * Comment * Regression test case. * WIP. * WIP. * LINT. * LINT. * WIP. * Fix. * Fix. * Fix. * LINT. * Fix (SAC does currently not support eager). * Fix. * WIP. * LINT. * Update rllib/evaluation/sampler.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/evaluation/sampler.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/utils/exploration/exploration.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/utils/exploration/exploration.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * WIP. * WIP. * Fix. * LINT. * LINT. * Fix and LINT. * WIP. * WIP. * WIP. * WIP. * Fix. * LINT. * Fix. * Fix and LINT. * Update rllib/utils/exploration/exploration.py * Update rllib/policy/dynamic_tf_policy.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/policy/dynamic_tf_policy.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/policy/dynamic_tf_policy.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Fixes. * WIP. * LINT. * Fixes and LINT. * LINT and fixes. * LINT. * Move action_dist back into torch extra_action_out_fn and LINT. * Working SimpleQ learning cartpole on both torch AND tf. * Working Rainbow learning cartpole on tf. * Working Rainbow learning cartpole on tf. * WIP. * LINT. * LINT. * Update docs and add torch to APEX test. * LINT. * Fix. * LINT. * Fix. * Fix. * Fix and docstrings. * Fix broken RLlib tests in master. * Split BAZEL learning tests into cartpole and pendulum (reached the 60min barrier). * Fix error_outputs option in BAZEL for RLlib regression tests. * Fix. * Tune param-noise tests. * LINT. * Fix. * Fix. * test * test * test * Fix. * Fix. * WIP. * WIP. * WIP. * WIP. * LINT. * WIP. Co-authored-by: Eric Liang <ekhliang@gmail.com>	2020-04-06 11:56:16 -07:00
Sven Mika	1d4823c0ec	[RLlib] Add testing framework_iterator. (#7852 ) * Add testing framework_iterator. * LINT. * WIP. * Fix and LINT. * LINT fix.	2020-04-03 12:24:25 -07:00
Sven Mika	bb6c675231	[RLlib] Bug fix: Copy `is_exploring` placeholder for multi-GPU tower generation. (#7846 )	2020-04-03 10:44:58 -07:00
Sven Mika	5537fe13b0	[RLlib] Exploration API: ParamNoise Integration into DQN; working example/test cases. (#7814 )	2020-04-03 10:44:25 -07:00
Sven Mika	e153e3179f	[RLlib] Exploration API: Policy changes needed for forward pass noisifications. (#7798 ) * Rollback. * WIP. * WIP. * LINT. * WIP. * Fix. * Fix. * Fix. * LINT. * Fix (SAC does currently not support eager). * Fix. * WIP. * LINT. * Update rllib/evaluation/sampler.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/evaluation/sampler.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/utils/exploration/exploration.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/utils/exploration/exploration.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * WIP. * WIP. * Fix. * LINT. * LINT. * Fix and LINT. * WIP. * WIP. * WIP. * WIP. * Fix. * LINT. * Fix. * Fix and LINT. * Update rllib/utils/exploration/exploration.py * Update rllib/policy/dynamic_tf_policy.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/policy/dynamic_tf_policy.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/policy/dynamic_tf_policy.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Fixes. * LINT. * WIP. Co-authored-by: Eric Liang <ekhliang@gmail.com>	2020-04-01 00:43:21 -07:00
Sven Mika	66df8b8c35	[RLlib] Working/learning example: PPO + torch + LSTM. (#7797 )	2020-03-31 22:00:28 -07:00
Sven Mika	e4bd5db4d8	[RLlib] Minimal ParamNoise PR. (#7772 )	2020-03-28 16:16:30 -07:00
Sven Mika	369a3417c4	[RLlib] Add tf-graph by default when doing `Policy.export_model()`. (#7759 ) * Rollback. * WIP. * WIP. * Fix. * LINT.	2020-03-26 22:07:10 -07:00
Sven Mika	1138f2ebed	[RLlib] Issue 7046 cannot restore keras model from h5 file. (#7482 )	2020-03-23 12:19:30 -07:00
Robert Nishihara	ee8c9ff732	Remove six and cloudpickle from setup.py. (#7694 )	2020-03-23 11:42:05 -07:00
Sven Mika	2fb219a658	[Ray RLlib] Fix tree import (#7662 ) * Rollback. * Fix import tree error by adding meaningful error and replacing by tf.nest wherever possible. * LINT. * LINT. * Fix. * Fix log-likelihood test case failing on travis.	2020-03-22 13:51:24 -07:00
Sven Mika	20ef4a8603	[RLlib] Cleanup/unify all test cases. (#7533 )	2020-03-11 20:39:47 -07:00
Eric Liang	be48e1964b	[rllib] Fix per-worker exploration in Ape-X; make more kwargs required for future safety (#7504 ) * fix sched * lintc * lint * fix * add unit test * fix * format * fix test * fix test	2020-03-10 11:14:14 -07:00
Sven Mika	f08687f550	[RLlib] `rllib train` crashes when using torch PPO/PG/A2C. (#7508 ) * Fix. * Rollback. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST.	2020-03-08 13:03:18 -07:00
Sven Mika	876a1ba5bd	[RLlib] Issue 7421: can't convert cuda tensor to numpy in torch ppo. (#7445 )	2020-03-06 12:45:30 -08:00
Sven Mika	510c850651	[RLlib] SAC add discrete action support. (#7320 ) * Exploration API (+EpsilonGreedy sub-class). * Exploration API (+EpsilonGreedy sub-class). * Cleanup/LINT. * Add `deterministic` to generic Trainer config (NOTE: this is still ignored by most Agents). * Add `error` option to deprecation_warning(). * WIP. * Bug fix: Get exploration-info for tf framework. Bug fix: Properly deprecate some DQN config keys. * WIP. * LINT. * WIP. * Split PerWorkerEpsilonGreedy out of EpsilonGreedy. Docstrings. * Fix bug in sampler.py in case Policy has self.exploration = None * Update rllib/agents/dqn/dqn.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * WIP. * Update rllib/agents/trainer.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * WIP. * Change requests. * LINT * In tune/utils/util.py::deep_update() Only keep deep_updat'ing if both original and value are dicts. If value is not a dict, set * Completely obsolete syn_replay_optimizer.py's parameters schedule_max_timesteps AND beta_annealing_fraction (replaced with prioritized_replay_beta_annealing_timesteps). * Update rllib/evaluation/worker_set.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Review fixes. * Fix default value for DQN's exploration spec. * LINT * Fix recursion bug (wrong parent c'tor). * Do not pass timestep to get_exploration_info. * Update tf_policy.py * Fix some remaining issues with test cases and remove more deprecated DQN/APEX exploration configs. * Bug fix tf-action-dist * DDPG incompatibility bug fix with new DQN exploration handling (which is imported by DDPG). * Switch off exploration when getting action probs from off-policy-estimator's policy. * LINT * Fix test_checkpoint_restore.py. * Deprecate all SAC exploration (unused) configs. * Properly use `model.last_output()` everywhere. Instead of `model._last_output`. * WIP. * Take out set_epsilon from multi-agent-env test (not needed, decays anyway). * WIP. * Trigger re-test (flaky checkpoint-restore test). * WIP. * WIP. * Add test case for deterministic action sampling in PPO. * bug fix. * Added deterministic test cases for different Agents. * Fix problem with TupleActions in dynamic-tf-policy. * Separate supported_spaces tests so they can be run separately for easier debugging. * LINT. * Fix autoregressive_action_dist.py test case. * Re-test. * Fix. * Remove duplicate py_test rule from bazel. * LINT. * WIP. * WIP. * SAC fix. * SAC fix. * WIP. * WIP. * WIP. * FIX 2 examples tests. * WIP. * WIP. * WIP. * WIP. * WIP. * Fix. * LINT. * Renamed test file. * WIP. * Add unittest.main. * Make action_dist_class mandatory. * fix * FIX. * WIP. * WIP. * Fix. * Fix. * Fix explorations test case (contextlib cannot find its own nullcontext??). * Force torch to be installed for QMIX. * LINT. * Fix determine_tests_to_run.py. * Fix determine_tests_to_run.py. * WIP * Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function). * Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function). * Rename some stuff. * Rename some stuff. * WIP. * update. * WIP. * Gumbel Softmax Dist. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP * WIP. * WIP. * Hypertune. * Hypertune. * Hypertune. * Lock-in. * Cleanup. * LINT. * Fix. * Update rllib/policy/eager_tf_policy.py Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com> * Update rllib/agents/sac/sac_policy.py Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com> * Update rllib/agents/sac/sac_policy.py Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com> * Update rllib/models/tf/tf_action_dist.py Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com> * Update rllib/models/tf/tf_action_dist.py Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com> * Fix items from review comments. * Add dm_tree to RLlib dependencies. * Add dm_tree to RLlib dependencies. * Fix DQN test cases ((Torch)Categorical). * Fix wrong pip install. Co-authored-by: Eric Liang <ekhliang@gmail.com> Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>	2020-03-06 10:37:12 -08:00
Eric Liang	596b39e36a	[rllib] Make timestep a required arg for exploration classes (#7380 )	2020-03-04 13:00:37 -08:00
Sven Mika	83e06cd30a	[RLlib] DDPG refactor and Exploration API action noise classes. (#7314 ) * WIP. * WIP. * WIP. * WIP. * WIP. * Fix * WIP. * Add TD3 quick Pendulum regresison. * Cleanup. * Fix. * LINT. * Fix. * Sort quick_learning test cases, add TD3. * Sort quick_learning test cases, add TD3. * Revert test_checkpoint_restore.py (debugging) changes. * Fix old soft_q settings in documentation and test configs. * More doc fixes. * Fix test case. * Fix test case. * Lower test load. * WIP.	2020-03-01 11:53:35 -08:00
Sven Mika	357232d124	[Core/RLlib] Move `log_once` from rllib to ray.util. (#7273 ) * Move log_once from rllib to tune. * Move log_once from rllib to tune. * LINT. * Move to ray.util.debug.	2020-02-27 10:40:44 -08:00
Eric Liang	58073f7260	[rllib] Fix multiagent example crash due to undefined abstract method (#7329 ) * fix multiagent example * 0 workers	2020-02-26 22:54:40 -08:00
Sven Mika	aec03656d5	[RLlib] TupleActions cannot be exported by Policy: Fixes issues 7231 and 5593. #7333	2020-02-26 15:22:54 -08:00
Sven Mika	0db2046b0a	[RLlib] Policy.compute_log_likelihoods() and SAC refactor. (issue #7107 ) (#7124 ) * Exploration API (+EpsilonGreedy sub-class). * Exploration API (+EpsilonGreedy sub-class). * Cleanup/LINT. * Add `deterministic` to generic Trainer config (NOTE: this is still ignored by most Agents). * Add `error` option to deprecation_warning(). * WIP. * Bug fix: Get exploration-info for tf framework. Bug fix: Properly deprecate some DQN config keys. * WIP. * LINT. * WIP. * Split PerWorkerEpsilonGreedy out of EpsilonGreedy. Docstrings. * Fix bug in sampler.py in case Policy has self.exploration = None * Update rllib/agents/dqn/dqn.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * WIP. * Update rllib/agents/trainer.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * WIP. * Change requests. * LINT * In tune/utils/util.py::deep_update() Only keep deep_updat'ing if both original and value are dicts. If value is not a dict, set * Completely obsolete syn_replay_optimizer.py's parameters schedule_max_timesteps AND beta_annealing_fraction (replaced with prioritized_replay_beta_annealing_timesteps). * Update rllib/evaluation/worker_set.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Review fixes. * Fix default value for DQN's exploration spec. * LINT * Fix recursion bug (wrong parent c'tor). * Do not pass timestep to get_exploration_info. * Update tf_policy.py * Fix some remaining issues with test cases and remove more deprecated DQN/APEX exploration configs. * Bug fix tf-action-dist * DDPG incompatibility bug fix with new DQN exploration handling (which is imported by DDPG). * Switch off exploration when getting action probs from off-policy-estimator's policy. * LINT * Fix test_checkpoint_restore.py. * Deprecate all SAC exploration (unused) configs. * Properly use `model.last_output()` everywhere. Instead of `model._last_output`. * WIP. * Take out set_epsilon from multi-agent-env test (not needed, decays anyway). * WIP. * Trigger re-test (flaky checkpoint-restore test). * WIP. * WIP. * Add test case for deterministic action sampling in PPO. * bug fix. * Added deterministic test cases for different Agents. * Fix problem with TupleActions in dynamic-tf-policy. * Separate supported_spaces tests so they can be run separately for easier debugging. * LINT. * Fix autoregressive_action_dist.py test case. * Re-test. * Fix. * Remove duplicate py_test rule from bazel. * LINT. * WIP. * WIP. * SAC fix. * SAC fix. * WIP. * WIP. * WIP. * FIX 2 examples tests. * WIP. * WIP. * WIP. * WIP. * WIP. * Fix. * LINT. * Renamed test file. * WIP. * Add unittest.main. * Make action_dist_class mandatory. * fix * FIX. * WIP. * WIP. * Fix. * Fix. * Fix explorations test case (contextlib cannot find its own nullcontext??). * Force torch to be installed for QMIX. * LINT. * Fix determine_tests_to_run.py. * Fix determine_tests_to_run.py. * WIP * Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function). * Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function). * Rename some stuff. * Rename some stuff. * WIP. * WIP. * Fix SAC. * Fix SAC. * Fix strange tf-error in ray core tests. * Fix strange ray-core tf-error in test_memory_scheduling test case. * Fix test_io.py. * LINT. * Update SAC yaml files' config. Co-authored-by: Eric Liang <ekhliang@gmail.com>	2020-02-22 14:19:49 -08:00
Sven Mika	e2edca45d4	[RLlib] PPO torch memory leak and unnecessary torch.Tensor creation and gc'ing. (#7238 ) * Take out stats to analyze memory leak in torch PPO. * WIP * WIP * WIP * WIP * WIP * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * LINT. * Fix determine_tests_to_run.py. * minor change to re-test after determine_tests_to_run.py. * LINT. * update comments. * WIP * WIP * WIP * FIX. * Fix sequence_mask being dependent on torch being installed. * Fix strange ray-core tf-error in test_memory_scheduling test case. * Fix strange ray-core tf-error in test_memory_scheduling test case. * Fix strange ray-core tf-error in test_memory_scheduling test case. * Fix strange ray-core tf-error in test_memory_scheduling test case.	2020-02-22 11:02:31 -08:00
Sven Mika	d537e9f0d8	[RLlib] Exploration API: merge deterministic flag with exploration classes (SoftQ and StochasticSampling). (#7155 )	2020-02-19 12:18:45 -08:00
Eric Liang	42aea966ff	[rllib] Convert torch state arrays to tensors during compute actions (#7162 ) * convert to tensor * normalize fix	2020-02-17 10:26:58 -08:00
Eric Liang	026f6884b5	[rllib] Add Decentralized DDPPO trainer and documentation (#7088 )	2020-02-10 15:28:27 -08:00
Sven Mika	6e1c3ea824	[RLlib] Exploration API (+EpsilonGreedy sub-class). (#6974 )	2020-02-10 15:22:07 -08:00
Sven Mika	5ac5ac9560	[RLlib] Fix broken example: tf-eager with custom-RNN (#6732 ). (#7021 ) * WIP. * Fix float32 conversion in OneHot preprocessor (would cause float64 in eager, then NN-matmul-failure). Add proper seq-len + state-in construction in eager_tf_policy.py::_compute_gradients(). * LINT. * eager_tf_policy.py: Only set samples["seq_lens"] if RNN. Otherwise, eager-tracing will throw flattened-dict key-mismatch error. * Move issue code to examples folder. Co-authored-by: Eric Liang <ekhliang@gmail.com>	2020-02-06 09:44:08 -08:00
Eric Liang	2fb53396ad	[rllib] [experimental] Decentralized Distributed PPO for torch (DD-PPO) (#6918 )	2020-01-25 22:36:43 -08:00
AnanthHari	aa2a0cb6da	Fixes empty `state` argument in compute_single_action method (#6894 ) * Fixes empty `state` parameter in compute_single_action method * Fixed style	2020-01-23 00:42:52 -08:00
Sven Mika	c957ed58ed	[RLlib] Implement PPO torch version. (#6826 )	2020-01-20 23:06:50 -08:00
Sven Mika	303547f119	[RLlib] Policy-classes cleanup and torch/tf unification. (#6770 )	2020-01-17 22:26:28 -08:00
Sven	60d4d5e1aa	Remove future imports (#6724 ) * Remove all __future__ imports from RLlib. * Remove (object) again from tf_run_builder.py::TFRunBuilder. * Fix 2xLINT warnings. * Fix broken appo_policy import (must be appo_tf_policy) * Remove future imports from all other ray files (not just RLlib). * Remove future imports from all other ray files (not just RLlib). * Remove future import blocks that contain `unicode_literals` as well. Revert appo_tf_policy.py to appo_policy.py (belongs to another PR). * Add two empty lines before Schedule class. * Put back __future__ imports into determine_tests_to_run.py. Fails otherwise on a py2/print related error.	2020-01-09 00:15:48 -08:00
Robert Nishihara	39a3459886	Remove (object) from class declarations. (#6658 )	2020-01-02 17:42:13 -08:00
waldroje	e4c0843f60	Allow EntropyCoeffSchedule to accept custom schedule (#6158 ) * modify tf_policy to enable EntropyCoeffSchedule to handle list, and avoid negative values under current implementation * Update custom_metrics_and_callbacks.py * Update tf_policy.py	2019-11-14 00:45:43 -08:00
Eric Liang	1f043daf69	[rllib] Fix and add test for LR annealing config	2019-11-07 12:17:27 -08:00
Eric Liang	16891e9379	[rllib] Don't use flat weights in non-eager mode (#6001 )	2019-10-31 15:16:02 -07:00
Eric Liang	a0dcb45dc3	[rllib] Fix APEX priorities returning zero all the time (#5980 ) * fix * move example tests to end * level err * guard against none * no trace test * ignore thumbs * np * fix multi node * fix	2019-10-26 13:23:42 -07:00
Eric Liang	f7bda0abad	[rllib] Fix rnn shape with multi-dimensional data (#5939 ) * fix shape * add test * Update rnn_sequencing.py	2019-10-22 11:07:26 -07:00
Matthew A. Wright	0110941de5	rllib: use pytorch's fn to see if gpu is available (#5890 )	2019-10-12 00:13:00 -07:00
Eric Liang	04e997fe0d	Fix TF2 / rllib test (#5846 )	2019-10-07 14:25:16 -07:00
Eric Liang	c6919d315d	[rllib] Remove TorchPolicy locks (#5764 ) * remove torch lock * remove lock	2019-09-24 17:52:16 -07:00
gehring	8903bcd0c3	[rllib] Tracing for eager tensorflow policies with `tf.function` (#5705 ) * Added tracing of eager policies with `tf.function` * lint * add config option * add docs * wip * tracing now works with a3c * typo * none * file doc * returns * syntax error * syntax error	2019-09-17 01:44:20 -07:00
Eric Liang	bc6a95deb0	[rllib] Eager execution for centralized critic example, fix simple optimizer for multiagent (#5683 )	2019-09-11 12:15:34 -07:00
Eric Liang	74abeab057	[rllib] Improve accessing model state docs (#5656 ) * [rllib] better model docs * fix * s	2019-09-08 23:01:26 -07:00
Eric Liang	03a1b75852	[rllib] Fix some eager execution regressions with 1.13 (#5537 ) * fix bugs with 1.13 * allow disable	2019-08-26 23:23:35 -07:00
gehring	b520f6141e	[rllib] Adds eager support with a generic `TFEagerPolicy` class (#5436 )	2019-08-23 14:21:11 +08:00
Eric Liang	79949fb8a0	[rllib] RLlib in 60 seconds documentation (#5430 )	2019-08-12 17:39:02 -07:00
Eric Liang	a1d2e17623	[rllib] Autoregressive action distributions (#5304 )	2019-08-10 14:05:12 -07:00
Eric Liang	592f313210	[rllib] Centralized critic / PPO example on TwoStepGame (#5392 )	2019-08-08 14:03:28 -07:00

... 3 4 5 6 7

302 commits