hiro/ray - Forgejo: Beyond coding. We Forge.

hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-09 04:46:38 -04:00

Author	SHA1	Message	Date
gjoliver	89fbfc00f8	[RLlib] Some minor cleanups (buffer buffer_size -> capacity and others). (#19623 )	2021-10-25 09:42:39 +02:00
gjoliver	44a4e42172	[rllib] Add entropy_coeff_schedule support for APPO. (#19544 ) * Add entropy_coeff_schedule support for APPO. * lint	2021-10-20 14:18:01 -07:00
Sven Mika	bd2d2079d2	[RLlib] Support >1 loss terms and optimizers for framework=tf2 (already supported for framework=[tf\|torch]) (#19269 )	2021-10-10 12:19:47 +02:00
Sven Mika	d439fd7f17	[RLlib] TF2/eager memory leak fixes. (#19198 )	2021-10-09 00:11:53 +02:00
Sven Mika	b4300dd532	[RLlib] Issue 18812: Torch multi-GPU stats not protected against race conditions. (#18937 )	2021-10-04 13:29:00 +02:00
Sven Mika	ed85f59194	[RLlib] Unify all RLlib Trainer.train() -> results[info][learner][policy ID][learner_stats] and add structure tests. (#18879 )	2021-09-30 16:39:05 +02:00
Avnish Narayan	6dc1a6b72f	[RLlib] Raise error for kl penalty ddpo (#18959 ) * [RLlib] Raise error for kl penalty ddpo DDPPO doesn't support KL penalties like PPO-1. In order to support KL penalties, DDPPO would need to become undecentralized, which defeats the purpose of the algorithm. Users can still tune the entropy coefficient to control the policy entropy (similar to controlling the KL penalty.) * Update rllib/agents/ppo/ddppo.py Co-authored-by: avnishn <avnishnarayan@gmail.com> Co-authored-by: Sven Mika <sven@anyscale.io>	2021-09-30 10:56:22 +02:00
Sven Mika	698b4eeed3	[RLlib] POC: Separate losses for APPO/IMPALA. Enable TFPolicy to handle multiple optimizers/losses (like TorchPolicy). (#18669 )	2021-09-21 22:00:14 +02:00
Sven Mika	45f60e51a9	[RLlib] DDPPO fixes and benchmarks. (#18390 )	2021-09-08 19:39:01 +02:00
Sven Mika	599e589481	[RLlib] Move existing fake multi-GPU learning tests into separate buildkite job. (#18065 )	2021-08-31 14:56:53 +02:00
Sven Mika	a428f10ebe	[RLlib] Add multi-GPU learning tests to nightly. (#17778 )	2021-08-18 17:21:01 +02:00
Sven Mika	f3bbe4ea44	[RLlib] Test cases/BUILD cleanup; split "everything else" (longest running one rn) tests in 2. (#17640 )	2021-08-16 22:01:01 +02:00
Sven Mika	924f11cd45	[RLlib] Torch algos use now-framework-agnostic MultiGPUTrainOneStep execution op (~33% speedup for PPO-torch + GPU). (#17371 )	2021-08-03 11:35:49 -04:00
Sven Mika	649580d735	[RLlib] Redo simplify multi agent config dict: Reverted b/c seemed to break test_typing (non RLlib test). (#17046 )	2021-07-15 05:51:24 -04:00
Amog Kamsetty	38b5b6d24c	Revert "[RLlib] Simplify multiagent config (automatically infer class/spaces/config). (#16565 )" (#17036 ) This reverts commit `e4123fff27`.	2021-07-13 09:57:15 -07:00
Sven Mika	e4123fff27	[RLlib] Simplify multiagent config (automatically infer class/spaces/config). (#16565 )	2021-07-13 06:38:14 -04:00
Sven Mika	53206dd440	[RLlib] CQL BC loss fixes; PPO/PG/A2\|3C action normalization fixes (#16531 )	2021-06-30 12:32:11 +02:00
Sven Mika	e80095591c	[RLlib] Entropy coeff schedule bug fix and git bisect script. (#15937 )	2021-05-20 18:15:10 +02:00
Sven Mika	2303851c3c	[RLlib] Torch multi-GPU + LSTM/RNN bug fix. (#15492 )	2021-05-18 11:51:05 +02:00
Sven Mika	461d73ddf1	[RLlib] `simple_optimizer` should not be used by default for tf+MA. (#15365 )	2021-05-10 16:10:44 +02:00
Sven Mika	e973b726c2	[RLlib] Support native tf.keras.Models (part 2) - Default keras models for Vision/RNN/Attention. (#15273 )	2021-04-30 19:26:30 +02:00
Sven Mika	bb8a286cbc	[RLlib] Support native tf.keras.Model (milestone toward obsoleting ModelV2 class). (#14684 )	2021-04-27 10:44:54 +02:00
Sven Mika	cecfc3b43b	[RLlib] Multi-GPU support for Torch algorithms. (#14709 )	2021-04-16 09:16:24 +02:00
Sven Mika	4f66309e19	[RLlib] Redo issue 14533 tf enable eager exec (#14984 )	2021-03-29 20:07:44 +02:00
SangBin Cho	fa5f961d5e	Revert "[RLlib] Issue 14533: `tf.enable_eager_execution()` must be called at beginning. (#14737 )" (#14918 ) This reverts commit `3e389d5812`.	2021-03-25 00:42:01 -07:00
Sven Mika	3e389d5812	[RLlib] Issue 14533: `tf.enable_eager_execution()` must be called at beginning. (#14737 )	2021-03-24 12:54:27 +01:00
Sven Mika	732197e23a	[RLlib] Multi-GPU for tf-DQN/PG/A2C. (#13393 )	2021-03-08 15:41:27 +01:00
Sven Mika	775e685531	[RLlib] Issue #13824 : `compress_observations=True` crashes for all algos not using a replay buffer. (#14034 )	2021-02-18 21:36:32 +01:00
Sven Mika	2e3655e8a9	[RLlib] Issue 9071 A3C w/ RNN not working due to VF assuming no RNN. (#13238 )	2021-01-19 14:22:36 +01:00
Sven Mika	93c0a5549b	[RLlib] Deprecate `vf_share_layers` in top-level PPO/MAML/MB-MPO configs. (#13397 )	2021-01-19 09:51:35 +01:00
Sven Mika	c524f86785	[RLlib] BC/MARWIL/recurrent nets minor cleanups and bug fixes. (#13064 )	2020-12-27 09:46:03 -05:00
Sven Mika	e40b14d255	[RLlib] Batch-size for truncate_episode batch_mode should be confgurable in agent-steps (rather than env-steps), if needed. (#12420 )	2020-12-08 16:41:45 -08:00
Sven Mika	0df55a139c	[RLlib] Attention Net prep PR #1 : Smaller cleanups. (#12447 ) * WIP. * Fix. * Fix. * Fix.	2020-11-27 16:25:47 -08:00
Sven Mika	6475297bd3	[RLlib] Torch LR schedule not working. Fix and added test case. (#12396 )	2020-11-26 13:14:11 +01:00
Sven Mika	592c161032	[RLlib] Issue 12118: LSTM prev-a/r should be separately configurable. Fix missing prev-a one-hot encoding. (#12397 ) * WIP. * Fix and LINT.	2020-11-25 11:27:46 -08:00
Sven Mika	62c7ab5182	[RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C (tf and torch). (#11747 )	2020-11-12 16:27:34 +01:00
Sven Mika	d9f1874e34	[RLlib] Minor fixes (torch GPU bugs + some cleanup). (#11609 )	2020-10-27 10:00:24 +01:00
Sven Mika	c17169dc11	[RLlib] Fix all example scripts to run on GPUs. (#11105 )	2020-10-02 23:07:44 +02:00
Sven Mika	ef18893fb5	[RLlib] PPO, APPO, and DD-PPO code cleanup. (#10420 )	2020-09-02 14:03:01 +02:00
Chua Cheow Huan	ea51e94729	[rllib] Learning rate schedule for DDPPO. (#10006 ) * Get shared metrics, increment counter & set global vars for remote workers. * Add unit test to test lr_schedule for DDPPO. * Broadcast the local set of global vars to remote workers instead of independently setting the global vars on each rollout worker.	2020-08-15 00:51:45 -07:00
Barak Michener	8e76796fd0	ci: Redo `format.sh --all` script & backfill lint fixes (#9956 )	2020-08-07 16:49:49 -07:00
Sven Mika	fcdf410ae1	[RLlib] Tf2.x native. (#8752 )	2020-07-11 22:06:35 +02:00
Sven Mika	43043ee4d5	[RLlib] Tf2x preparation; part 2 (upgrading `try_import_tf()`). (#9136 ) * WIP. * Fixes. * LINT. * WIP. * WIP. * Fixes. * Fixes. * Fixes. * Fixes. * WIP. * Fixes. * Test * Fix. * Fixes and LINT. * Fixes and LINT. * LINT.	2020-06-30 10:13:20 +02:00
Sven Mika	5c6d5d4ab1	This PR fixes the currently broken lstm_use_prev_action_reward flag for default lstm models (model.use_lstm=True). (#8970 )	2020-06-27 20:50:01 +02:00
Sven Mika	7008902cff	[RLlib] Minor `rllib.utils` cleanup. (#8932 )	2020-06-16 08:52:20 +02:00
Sven Mika	4ed796a7d6	[RLlib] Add testing `Policy.compute_single_action()` for all agents. (#8903 )	2020-06-13 17:51:50 +02:00
Sven Mika	2746fc0476	[RLlib] Auto-framework, retire `use_pytorch` in favor of `framework=...` (#8520 )	2020-05-27 16:19:13 +02:00
Eric Liang	7ce138a6dc	[rllib] Support free_log_std in ModelV2 (#8380 ) * update * factor * update * fix test failures * fix torch net	2020-05-12 10:14:05 -07:00
Eric Liang	9d012626e5	[rllib] Distributed exec workflow for impala (#8321 )	2020-05-11 20:24:43 -07:00
Sven Mika	754290daad	[RLlib] Add light-weight `Trainer.compute_action()` tests for all Algos. (#8356 )	2020-05-08 16:31:31 +02:00

1 2

63 commits