hiro/ray - Forgejo: Beyond coding. We Forge.

hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 18:41:40 -05:00

Author	SHA1	Message	Date
Eric Liang	aa4861c2a0	Checkpoint Adam momenta for DDPG (#7449 )	2020-03-04 10:03:41 -08:00
Sven Mika	7faf0d8f89	[RLlib] Make rollout always use `evaluation_config`. (#7396 )	2020-03-03 17:20:35 -08:00
Eric Liang	0f88444686	[rllib] Support multi-agent training in pipeline impls, add easy flag to enable (#7338 )	2020-03-02 15:16:37 -08:00
Sven Mika	d8eeb96413	Fix issue with torch PPO not handling action spaces of shape=(>1,). (#7398 )	2020-03-02 10:53:19 -08:00
Sven Mika	83e06cd30a	[RLlib] DDPG refactor and Exploration API action noise classes. (#7314 ) * WIP. * WIP. * WIP. * WIP. * WIP. * Fix * WIP. * Add TD3 quick Pendulum regresison. * Cleanup. * Fix. * LINT. * Fix. * Sort quick_learning test cases, add TD3. * Sort quick_learning test cases, add TD3. * Revert test_checkpoint_restore.py (debugging) changes. * Fix old soft_q settings in documentation and test configs. * More doc fixes. * Fix test case. * Fix test case. * Lower test load. * WIP.	2020-03-01 11:53:35 -08:00
Eric Liang	3c6b94f3f5	[rllib] Enable performance metrics reporting for RLlib pipelines, add A3C (#7299 )	2020-02-28 16:44:17 -08:00
Sven Mika	0c9e5db9cb	Fix SAC bug (twin Q not used for min'ing over both Q-nets in loss func). (#7354 )	2020-02-27 12:49:08 -08:00
Sven Mika	e1fc8368d4	[RLlib] SAC refactor with new SquashedGaussian distribution class. (#7272 )	2020-02-23 16:10:20 -08:00
Sven Mika	0db2046b0a	[RLlib] Policy.compute_log_likelihoods() and SAC refactor. (issue #7107 ) (#7124 ) * Exploration API (+EpsilonGreedy sub-class). * Exploration API (+EpsilonGreedy sub-class). * Cleanup/LINT. * Add `deterministic` to generic Trainer config (NOTE: this is still ignored by most Agents). * Add `error` option to deprecation_warning(). * WIP. * Bug fix: Get exploration-info for tf framework. Bug fix: Properly deprecate some DQN config keys. * WIP. * LINT. * WIP. * Split PerWorkerEpsilonGreedy out of EpsilonGreedy. Docstrings. * Fix bug in sampler.py in case Policy has self.exploration = None * Update rllib/agents/dqn/dqn.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * WIP. * Update rllib/agents/trainer.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * WIP. * Change requests. * LINT * In tune/utils/util.py::deep_update() Only keep deep_updat'ing if both original and value are dicts. If value is not a dict, set * Completely obsolete syn_replay_optimizer.py's parameters schedule_max_timesteps AND beta_annealing_fraction (replaced with prioritized_replay_beta_annealing_timesteps). * Update rllib/evaluation/worker_set.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Review fixes. * Fix default value for DQN's exploration spec. * LINT * Fix recursion bug (wrong parent c'tor). * Do not pass timestep to get_exploration_info. * Update tf_policy.py * Fix some remaining issues with test cases and remove more deprecated DQN/APEX exploration configs. * Bug fix tf-action-dist * DDPG incompatibility bug fix with new DQN exploration handling (which is imported by DDPG). * Switch off exploration when getting action probs from off-policy-estimator's policy. * LINT * Fix test_checkpoint_restore.py. * Deprecate all SAC exploration (unused) configs. * Properly use `model.last_output()` everywhere. Instead of `model._last_output`. * WIP. * Take out set_epsilon from multi-agent-env test (not needed, decays anyway). * WIP. * Trigger re-test (flaky checkpoint-restore test). * WIP. * WIP. * Add test case for deterministic action sampling in PPO. * bug fix. * Added deterministic test cases for different Agents. * Fix problem with TupleActions in dynamic-tf-policy. * Separate supported_spaces tests so they can be run separately for easier debugging. * LINT. * Fix autoregressive_action_dist.py test case. * Re-test. * Fix. * Remove duplicate py_test rule from bazel. * LINT. * WIP. * WIP. * SAC fix. * SAC fix. * WIP. * WIP. * WIP. * FIX 2 examples tests. * WIP. * WIP. * WIP. * WIP. * WIP. * Fix. * LINT. * Renamed test file. * WIP. * Add unittest.main. * Make action_dist_class mandatory. * fix * FIX. * WIP. * WIP. * Fix. * Fix. * Fix explorations test case (contextlib cannot find its own nullcontext??). * Force torch to be installed for QMIX. * LINT. * Fix determine_tests_to_run.py. * Fix determine_tests_to_run.py. * WIP * Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function). * Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function). * Rename some stuff. * Rename some stuff. * WIP. * WIP. * Fix SAC. * Fix SAC. * Fix strange tf-error in ray core tests. * Fix strange ray-core tf-error in test_memory_scheduling test case. * Fix test_io.py. * LINT. * Update SAC yaml files' config. Co-authored-by: Eric Liang <ekhliang@gmail.com>	2020-02-22 14:19:49 -08:00
Sven Mika	e2edca45d4	[RLlib] PPO torch memory leak and unnecessary torch.Tensor creation and gc'ing. (#7238 ) * Take out stats to analyze memory leak in torch PPO. * WIP * WIP * WIP * WIP * WIP * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * LINT. * Fix determine_tests_to_run.py. * minor change to re-test after determine_tests_to_run.py. * LINT. * update comments. * WIP * WIP * WIP * FIX. * Fix sequence_mask being dependent on torch being installed. * Fix strange ray-core tf-error in test_memory_scheduling test case. * Fix strange ray-core tf-error in test_memory_scheduling test case. * Fix strange ray-core tf-error in test_memory_scheduling test case. * Fix strange ray-core tf-error in test_memory_scheduling test case.	2020-02-22 11:02:31 -08:00
Sven Mika	cbc808bc6b	[Tests] determine_tests_to_run.sh has a bug affecting RLlib testing to be skipped sometimes. (#7243 )	2020-02-20 19:02:17 -08:00
Sven Mika	6043ce710d	Fix old exploration configs. (#7240 )	2020-02-20 08:39:16 -08:00
Eric Liang	46af992efd	[rllib] [experimental] custom RL training pipelines (PG_pl, A2C_pl) (#7213 )	2020-02-19 16:07:37 -08:00
Sven Mika	d537e9f0d8	[RLlib] Exploration API: merge deterministic flag with exploration classes (SoftQ and StochasticSampling). (#7155 )	2020-02-19 12:18:45 -08:00
Eric Liang	399424c418	[rllib] Fix broken check in eval mode for IMPALA #7217	2020-02-19 11:54:30 -08:00
mehrdadn	3bd82d0bcd	Fix various issues/warnings that come up on Jenkins (#7147 ) * Avoid warning about swap being unlimited Currently we get the following message on Jenkins: "Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap." Since we're not limiting swap anyway, we might as well avoid trying to. https://docs.docker.com/config/containers/resource_constraints/#--memory-swap-details * Fix escaping in re.search() * Fix escaping in _noisy_layer() * Raise a more descriptive error when dashboard data isn't found * Don't error on dashboard files not being found when webui isn't required * Change dashboard error to a warning instead	2020-02-17 16:08:55 -08:00
Eric Liang	42aea966ff	[rllib] Convert torch state arrays to tensors during compute actions (#7162 ) * convert to tensor * normalize fix	2020-02-17 10:26:58 -08:00
Sven Mika	2e60f0d4d8	[RLlib] Move all jenkins RLlib-tests into bazel (rllib/BUILD). (#7178 ) * commit * comment	2020-02-15 14:50:44 -08:00
Sven Mika	5518a738b3	[RLlib] Fix erroneous use of LinearSchedule (in DDPG's exploration annealing). (#7125 ) * Fix erroneous use of LinearSchedule (in DDPG's exploration annealing). Erase schedules_obsoleted.py. * Trigger re-test. * Re-test.	2020-02-12 23:46:49 -08:00
Eric Liang	026f6884b5	[rllib] Add Decentralized DDPPO trainer and documentation (#7088 )	2020-02-10 15:28:27 -08:00
Sven Mika	6e1c3ea824	[RLlib] Exploration API (+EpsilonGreedy sub-class). (#6974 )	2020-02-10 15:22:07 -08:00
Eric Liang	fbc545c03b	[rllib] Support parallel, parameterized evaluation (#6981 ) * eval api * update * sync eval filters * sync fix * docs * update * docs * update * link * nit * doc updates * format	2020-02-01 22:12:12 -08:00
roireshef	3c60caa448	[rllib] implemented compute_advantages without gae (#6941 )	2020-01-31 22:25:45 -08:00
Jaroslaw Rzepecki	67319bc887	[RLlib] Update MARWIL to use tf policy template (#6975 ) * update MARWIL to use tf policy template * formatting fixes	2020-01-31 12:57:52 -08:00
Sven Mika	2ccf08ad10	[RLlib] Bug fix: DQN goes into negative epsilon values after reaching explora… (#6971 ) * Bug fix: DQN goes into negative epsilon values after reaching exploration percentage. * Add `epsilon_initial_eps` to SAC to pass test_nested_spaces.py. * Add `exploration_initial_eps` to QMIX default config.	2020-01-31 09:54:12 -08:00
Eric Liang	e659699ca9	[tune] Fix directory naming regression (#6839 )	2020-01-27 15:53:40 -08:00
Eric Liang	2fb53396ad	[rllib] [experimental] Decentralized Distributed PPO for torch (DD-PPO) (#6918 )	2020-01-25 22:36:43 -08:00
Sven Mika	ae9a3a2237	[RLlib] from_config util method for framework agnostic components; start moving RLlib tests into Bazel. (#6865 )	2020-01-22 17:02:58 -08:00
Sven Mika	c957ed58ed	[RLlib] Implement PPO torch version. (#6826 )	2020-01-20 23:06:50 -08:00
Sven Mika	303547f119	[RLlib] Policy-classes cleanup and torch/tf unification. (#6770 )	2020-01-17 22:26:28 -08:00
Sven Mika	e6227082bd	[RLlib] Add `torch` flag to train.py (#6807 )	2020-01-17 18:48:44 -08:00
Sven Mika	2bcf72e306	DQN distributional model: Replace all legacy tf.contrib imports with tf.keras.layers.xyz or tf.initializers.xyz. (#6772 ) - This fixes a test case in test_evaluators.py.	2020-01-13 21:48:16 -08:00
Sven	60d4d5e1aa	Remove future imports (#6724 ) * Remove all __future__ imports from RLlib. * Remove (object) again from tf_run_builder.py::TFRunBuilder. * Fix 2xLINT warnings. * Fix broken appo_policy import (must be appo_tf_policy) * Remove future imports from all other ray files (not just RLlib). * Remove future imports from all other ray files (not just RLlib). * Remove future import blocks that contain `unicode_literals` as well. Revert appo_tf_policy.py to appo_policy.py (belongs to another PR). * Add two empty lines before Schedule class. * Put back __future__ imports into determine_tests_to_run.py. Fails otherwise on a py2/print related error.	2020-01-09 00:15:48 -08:00
Robert Nishihara	39a3459886	Remove (object) from class declarations. (#6658 )	2020-01-02 17:42:13 -08:00
Sven	f1b56fa5ee	PG unify/cleanup tf vs torch and PG functionality test cases (tf + torch). (#6650 ) * Unifying the code for PGTrainer/Policy wrt tf vs torch. Adding loss function test cases for the PGAgent (confirm equivalence of tf and torch). * Fix LINT line-len errors. * Fix LINT errors. * Fix `tf_pg_policy` imports (formerly: `pg_policy`). * Rename tf_pg_... into pg_tf_... following <alg>_<framework>_... convention, where ...=policy/loss/agent/trainer. Retire `PGAgent` class (use PGTrainer instead). * - Move PG test into agents/pg/tests directory. - All test cases will be located near the classes that are tested and then built into the Bazel/Travis test suite. * Moved post_process_advantages into pg.py (from pg_tf_policy.py), b/c the function is not a tf-specific one. * Fix remaining import errors for agents/pg/... * Fix circular dependency in pg imports. * Add pg tests to Jenkins test suite.	2020-01-02 16:08:03 -08:00
Michael Luo	1cb335487e	SAC for Mujoco Environments (#6642 )	2019-12-31 00:16:54 -08:00
Sven	8b16847c02	Get utils ready for better Agent torch support. (#6561 )	2019-12-30 12:27:32 -08:00
Zhongxia Yan	98689bd263	Changed foreach_policy to foreach_trainable_policy (#6564 ) Changed foreach_policy to foreach_trainable_policy in DQN when disabling exploration. This makes it consistent with the rest of the file	2019-12-26 19:50:48 -08:00
Michael Luo	548df014ec	SAC Performance Fixes (#6295 ) * SAC Performance Fixes * Small Changes * Update sac_model.py * fix normalize wrapper * Update test_eager_support.py Co-authored-by: Eric Liang <ekhliang@gmail.com>	2019-12-20 10:51:25 -08:00
Eric Liang	8fc2272f43	[rllib] Reorganize trainer config, add warnings about high VF loss magnitude for PPO (#6181 )	2019-11-18 10:39:07 -08:00
Philipp Moritz	fc655acfee	Fix linting on master branch (#6174 )	2019-11-16 10:02:58 -08:00
Eric Liang	243b1b7281	[rllib] Add microbatch optimizer with A2C example (#6161 )	2019-11-14 12:14:00 -08:00
Eric Liang	e4565c9cc6	Reduce RLlib log verbosity (#6154 )	2019-11-13 18:50:45 -08:00
Eric Liang	16891e9379	[rllib] Don't use flat weights in non-eager mode (#6001 )	2019-10-31 15:16:02 -07:00
Eric Liang	a0dcb45dc3	[rllib] Fix APEX priorities returning zero all the time (#5980 ) * fix * move example tests to end * level err * guard against none * no trace test * ignore thumbs * np * fix multi node * fix	2019-10-26 13:23:42 -07:00
Matthew A. Wright	0110941de5	rllib: use pytorch's fn to see if gpu is available (#5890 )	2019-10-12 00:13:00 -07:00
Matthew A. Wright	4aa06918ae	Qmix on gpu and with non-stacked-obs environment state support (#5751 )	2019-10-08 13:18:07 -07:00
Eric Liang	c6919d315d	[rllib] Remove TorchPolicy locks (#5764 ) * remove torch lock * remove lock	2019-09-24 17:52:16 -07:00
Vince Jankovics	7e214fd95e	[tune] TensorBoard HParams for TF2.0 (#5678 )	2019-09-21 11:06:34 -07:00
Kilian Batzner	79b9c70ad6	Add local_tf_session_args to unknown subkeys whitelist (#5742 ) * Add local_tf_session_args to unknown subkeys whitelist * Remove trailing whitespace	2019-09-20 10:32:49 -07:00

... 2 3 4 5 6

267 commits