hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Sven Mika	4d945fe651	[RLlib] Issue 19878: Re-instate bare_metal_policy example script (#19881 )	2021-10-30 12:50:39 -07:00
Sven Mika	9c73871da0	[RLlib; Docs overhaul] Docstring cleanup: Evaluation (#19783 )	2021-10-29 12:03:56 +02:00
Rohan138	b9c9cc5946	[RLlib] Updated PettingZoo+RLlib tutorial; Removed pettingzoo example script (#19069 ) * Updated PettingZoo+RLlib tutorial Updated the tutorial and added link to the blog post by the PettingZoo team. * Ran linting * Converted link to tinyurl for linting * fixed line lengths * Decrease num_workers to 1 * Added comments * Decreased num_workers * Decreased timesteps * Increased num_workers * Update links and remove pettingzoo_env.py * remove pettingzoo.py script from tests Co-authored-by: sven1977 <svenmika1977@gmail.com>	2021-10-29 10:57:10 +02:00
Sven Mika	902e854af2	[RLlib; Docs overhaul] Docstring cleanup: Environments. (#19784 ) * wip. * Test: Make a change in tune to trigger tune tests, which are not run otherwise, but seem to fail nevertheless with this PR's changes. * remove bare_metal_policy_with_custom_view_reqs from tests	2021-10-29 10:46:52 +02:00
gjoliver	d81885c1f1	[RLlib] Fix all the CI tests that were broken by is_training and replay buffer changes; re-comment-in the failing RLlib tests (#19809 ) * Fix DDPG, since it is based on GenericOffPolicyTrainer. * Fix QMix, SAC, and MADDPG too. * Undo QMix change. * Fix DQN input batch type. Always use SampleBatch. * apex ddpg should not use replay_buffer_config yet. * Make eager tf policy use SampleBatch. * lint * LINT. * Re-enable RLlib broken tests to make sure things work ok now. * fixes. Co-authored-by: sven1977 <svenmika1977@gmail.com>	2021-10-28 18:06:47 +02:00
Simon Mo	5e927b01ad	Revert "[CI] Remove config that disables Bazel test result cache" (#19818 ) * Revert "[CI] Remove config that disables Bazel test result cache (#18701)" This reverts commit `098ff36faa`. * Remove all RLlib tests from BUILD that currently fail. Co-authored-by: sven1977 <svenmika1977@gmail.com>	2021-10-28 15:54:53 +02:00
gjoliver	39b0faa3ec	[RLlib]: bug fix, should be input_dict['is_training'] (#19805 )	2021-10-27 23:30:43 +02:00
Sven Mika	4a82d3ea6c	Revert "[RLlib; Docs overhaul] Docstring cleanup: Trainer, trainer_template, Callbacks. (#19758 )" (#19806 ) This reverts commit `80eeb13175`.	2021-10-27 23:30:07 +02:00
Sven Mika	80eeb13175	[RLlib; Docs overhaul] Docstring cleanup: Trainer, trainer_template, Callbacks. (#19758 )	2021-10-27 19:15:35 +02:00
Sven Mika	f2cb2ed203	[RLlib; Docs overhaul] Docstring cleanup: Policies, policy_templates. (#19759 )	2021-10-27 19:14:39 +02:00
gjoliver	99a0088233	[RLlib] Unify the way we create local replay buffer for all agents (#19627 ) * [RLlib] Unify the way we create and use LocalReplayBuffer for all the agents. This change: 1. Gets rid of the try...except clause when we call execution_plan(), and gets rid of the Deprecation warning as a result. 2. Fixes the execution_plan() call in Trainer._try_recover() too. 3. Most importantly, makes it much easier to create and use different types of local replay buffers for all our agents. E.g., allows us to easily create a reservoir sampling replay buffer for APPO agent for Riot in the near future. * Introduce explicit configuration for replay buffer types. * Fix is_training key error. * actually deprecate buffer_size field.	2021-10-26 20:56:02 +02:00
Avnish Narayan	ad87ddf93e	[rllib] Add deterministic test to gpu (#19306 ) Co-authored-by: sven1977 <svenmika1977@gmail.com>	2021-10-26 10:11:39 -07:00
Sven Mika	b213565783	[RLlib] Fix failing test cases: Soft-deprecate ModelV2.from_batch (in favor of ModelV2.__call__). (#19693 )	2021-10-25 15:00:00 +02:00
gjoliver	89fbfc00f8	[RLlib] Some minor cleanups (buffer buffer_size -> capacity and others). (#19623 )	2021-10-25 09:42:39 +02:00
roireshef	9b0352f363	[RLlib] Added LearningRateSchedule and EntropyCoeffSchedule to TF and Torch versions of A3C and PPO (#19276 )	2021-10-25 09:39:35 +02:00
gjoliver	c3c42278e4	[RLlib] clean up all the SampleBatch['is_training'] deprecation warnings (#19652 ) * [RLlib] clean up all the SampleBatch['is_training'] deprecation warnings. * wip	2021-10-25 09:38:56 +02:00
xwjiang2010	a632cb439f	[Tune] Remove queue_trials. (#19472 )	2021-10-22 09:24:54 +01:00
gjoliver	44a4e42172	[rllib] Add entropy_coeff_schedule support for APPO. (#19544 ) * Add entropy_coeff_schedule support for APPO. * lint	2021-10-20 14:18:01 -07:00
Carlo Grisetti	5cee8a1985	[release tests] Switch from yaml.load to yaml.safe_load (#19365 )	2021-10-13 17:27:25 -07:00
Antoine Galataud	edb338ff7c	[RLlib] Check `training_enabled` on PolicyServer (#19007 )	2021-10-12 16:21:02 +02:00
gjoliver	9226f9bddc	[RLlib] Report timesteps_this_iter to Tune, so it can track/checkpoint/restore total timesteps trained. (#19264 ) * Report timesteps_this_iter to Tune, so it can track/checkpoint/restore total timesteps trained. * Trigger Build * lint	2021-10-12 16:03:41 +02:00
Sven Mika	bd2d2079d2	[RLlib] Support >1 loss terms and optimizers for framework=tf2 (already supported for framework=[tf|torch]) (#19269 )	2021-10-10 12:19:47 +02:00
Sven Mika	d439fd7f17	[RLlib] TF2/eager memory leak fixes. (#19198 )	2021-10-09 00:11:53 +02:00
Sven Mika	c3e3fc7637	[RLlib] Issue 18280: A3C/IMPALA multi-agent not working. (#19100 )	2021-10-07 23:57:53 +02:00
Sven Mika	fd438d5630	[RLlib] Issue 18104: Cannot set remote_worker_envs=True for non local-mode and MultiAgentEnv. (#19133 )	2021-10-07 22:39:21 +02:00
Sven Mika	1f0646f658	[RLlib] Issue 18418: SAC w/ dict space not working. (#19101 )	2021-10-06 09:05:50 +02:00
Sven Mika	b4300dd532	[RLlib] Issue 18812: Torch multi-GPU stats not protected against race conditions. (#18937 )	2021-10-04 13:29:00 +02:00
Sven Mika	73f5c4039b	[RLlib] Fix flakey test_a3c, test_maml, test_apex_dqn. (#19035 )	2021-10-04 13:23:51 +02:00
Jiajun Yao	7588bfd315	[Lint] Add flake8-bugbear (#19053 ) * Add flake8-bugbear * Add flake8-bugbear	2021-10-03 23:24:11 -07:00
Sven Mika	16ad46a654	[RLlib] Fix broken test_r2d2.py. (#19017 )	2021-09-30 21:19:37 +02:00
Sven Mika	ac3371a148	[RLlib] Discussion 3644: Fix bug for complex obs spaces containing `Box([2D shape])` and discrete component. (#18917 )	2021-09-30 16:39:38 +02:00
Sven Mika	ed85f59194	[RLlib] Unify all RLlib Trainer.train() -> results[info][learner][policy ID][learner_stats] and add structure tests. (#18879 )	2021-09-30 16:39:05 +02:00
Sven Mika	828f5d26b7	[RLlib] Custom view requirements (e.g. for prev-n-obs) work with `compute_single_action` and `compute_actions_from_input_dict`. (#18921 )	2021-09-30 15:03:37 +02:00
Avnish Narayan	6dc1a6b72f	[RLlib] Raise error for kl penalty ddppo (#18959 ) * [RLlib] Raise error for kl penalty ddppo DDPPO doesn't support KL penalties like PPO-1. In order to support KL penalties, DDPPO would need to become centralized, which defeats the purpose of the algorithm. Users can still tune the entropy coefficient to control the policy entropy (similar to controlling the KL penalty.) * Update rllib/agents/ppo/ddppo.py Co-authored-by: avnishn <avnishnarayan@gmail.com> Co-authored-by: Sven Mika <sven@anyscale.io>	2021-09-30 10:56:22 +02:00
Sven Mika	05a55a9335	[RLlib] Issue 18668: Unity3D env client/server example not working (fix + add to test cases). (#18942 )	2021-09-30 08:30:20 +02:00
Sven Mika	9c9b482661	[RLlib] Allow n-step > 1 and prio. replay for R2D2 and RNNSAC. (#18939 )	2021-09-29 21:31:34 +02:00
Sven Mika	b99943806e	[RLlib] Add support for IMPALA to handle more than one loss/optimizer (analogous to recent enhancement for APPO). (#18971 )	2021-09-29 21:30:04 +02:00
mvindiola1	62f5da0b65	[RLlib] Add unit tests for updating episode data in base_env (#17137 )	2021-09-24 16:08:11 +02:00
Julius Frost	8b8b447fd7	[RLlib] Fix `train.py` input config patching (#18747 )	2021-09-24 14:41:33 +02:00
o0olele	ff6730f903	[RLlib] Attention Nets + MultiDiscrete spaces: Fix range() takes no keyword args error! (#17502 )	2021-09-24 13:43:58 +02:00
Sven Mika	61a1274619	[RLlib] No Preprocessors (part 2). (#18468 )	2021-09-23 12:56:45 +02:00
Sven Mika	a2a077b874	[RLlib] Faster remote worker space inference (don't infer if not required). (#18805 )	2021-09-23 10:54:37 +02:00
Sven Mika	a96dbd885b	[RLlib] Reinstate trajectory view API tests. (#18809 )	2021-09-23 08:31:51 +02:00
Sven Mika	93208bb087	[RLlib] Increase size of (very flakey) action_masking example script test. (#18816 )	2021-09-22 21:48:01 +02:00
Sven Mika	698b4eeed3	[RLlib] POC: Separate losses for APPO/IMPALA. Enable TFPolicy to handle multiple optimizers/losses (like TorchPolicy). (#18669 )	2021-09-21 22:00:14 +02:00
Sven Mika	e6aae61487	[RLlib; testing] Fix bug in stress tests not handling >1 trials per experiment (due to grid-search in IMPALA stress tests). (#18705 )	2021-09-20 15:31:57 +02:00
Sven Mika	fd13bac9b3	[RLlib] Add `worker` arg (optional) to `policy_mapping_fn`. (#18184 )	2021-09-17 12:07:11 +02:00
Sven Mika	ba1c489b79	[RLlib Testing] Lower `--smoke-test` "time_total_s" to make sure it doesn't time out. (#18670 )	2021-09-16 18:22:23 +02:00
Sven Mika	8a72824c63	[RLlib Testing] Split and unflake more CI tests (make sure all jobs are < 30min). (#18591 )	2021-09-15 22:16:48 +02:00
Sven Mika	8a00154038	[RLlib] Bump tf version in ML docker to tf==2.5.0; add tfp to ML-docker. (#18544 )	2021-09-15 08:46:37 +02:00
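Commit 6dc1a6b72f explains that DDPPO cannot support PPO-1-style KL penalties without giving up its decentralized design, so a KL penalty is rejected and users are pointed to the entropy coefficient instead. A minimal sketch of that kind of config check might look like the code below. `validate_ddppo_config` is a hypothetical helper, and the plain-dict config with `kl_coeff` / `entropy_coeff` keys is an assumption for illustration, not RLlib's actual `ddppo.py` code.

```python
# Hypothetical sketch (not RLlib's implementation): reject a KL penalty
# for a DDPPO-style trainer, as described in commit 6dc1a6b72f.
# The config is assumed to be a plain dict with PPO-style key names.


def validate_ddppo_config(config: dict) -> None:
    """Raise ValueError if the config requests a KL penalty."""
    # Per the commit message, supporting a KL penalty would require
    # DDPPO to become centralized, so any nonzero coefficient is an error.
    # A missing key is treated as "no KL penalty".
    if config.get("kl_coeff", 0.0) != 0.0:
        raise ValueError(
            "DDPPO does not support KL penalties; set kl_coeff=0.0 and "
            "tune entropy_coeff to control policy entropy instead."
        )


if __name__ == "__main__":
    # Passes: no KL penalty, entropy coefficient used instead.
    validate_ddppo_config({"kl_coeff": 0.0, "entropy_coeff": 0.01})
    # Fails: a nonzero KL coefficient is rejected.
    try:
        validate_ddppo_config({"kl_coeff": 0.2})
    except ValueError as err:
        print(err)
```

Failing fast at config time, rather than silently ignoring `kl_coeff`, makes the unsupported setting visible to the user, which matches the commit's stated intent of raising an error.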


991 commits