Commit graph

55 commits

Author SHA1 Message Date
Sven Mika
d5bfb7b7da
[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 (#21652) 2022-01-25 14:16:58 +01:00
Sven Mika
b10d5533be
[RLlib] Issue 20920 (partial solution): contrib/MADDPG + pettingzoo coop-pong-v4 not working. (#21452) 2022-01-10 11:19:40 +01:00
Sven Mika
b4790900f5
[RLlib] Sub-class Trainer (instead of build_trainer()): All remaining classes; soft-deprecate build_trainer. (#20725) 2021-12-04 22:05:26 +01:00
Sven Mika
9d2fe5756c
[RLlib] Trainer sub-class for APPO (instead of using build_trainer()). (#20424) 2021-11-22 22:14:21 +01:00
Sven Mika
56619b955e
[RLlib; Documentation] Some docstring cleanups; Rename RemoteVectorEnv into RemoteBaseEnv for clarity. (#20250) 2021-11-17 21:40:16 +01:00
Kai Fricke
05d21497db
[rllib/tune] Fix durable trainable in trainer template, add release test (#20422) 2021-11-16 20:52:42 +00:00
Kai Fricke
3e6ba5d6d2
Revert "Revert [RLlib] POC: PGTrainer class that works by sub-classing, not trainer_template.py." (#20285)
* Revert "Revert "[RLlib] POC: `PGTrainer` class that works by sub-classing, not `trainer_template.py`. (#20055)" (#20284)"
This reverts commit 246787cdd9.
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-16 12:26:47 +01:00
Kai Fricke
246787cdd9
Revert "[RLlib] POC: PGTrainer class that works by sub-classing, not trainer_template.py. (#20055)" (#20284)
This reverts commit 6f85af435f.
2021-11-12 13:09:43 +00:00
Sven Mika
6f85af435f
[RLlib] POC: PGTrainer class that works by sub-classing, not trainer_template.py. (#20055) 2021-11-11 12:16:20 +01:00
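A minimal sketch of the sub-classing pattern this POC moves toward, as opposed to the `build_trainer()`/`trainer_template.py` factory approach. The hook names (`get_default_config`, `get_default_policy_class`) and the class/config values here are assumptions for illustration, not taken from the PR:

```python
from ray.rllib.agents.pg.pg_tf_policy import PGTFPolicy
from ray.rllib.agents.pg.pg_torch_policy import PGTorchPolicy
from ray.rllib.agents.trainer import COMMON_CONFIG, Trainer


class MyPGTrainer(Trainer):
    """Hypothetical Trainer defined by sub-classing instead of build_trainer()."""

    @classmethod
    def get_default_config(cls):
        # Assumed hook name: start from the common Trainer config and
        # override algorithm-specific keys.
        return {**COMMON_CONFIG, "lr": 0.0004}

    def get_default_policy_class(self, config):
        # Assumed hook name: pick the Policy class per the configured framework.
        return PGTorchPolicy if config["framework"] == "torch" else PGTFPolicy
```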
Sven Mika
bab9c0f670
[RLlib; Docs overhaul] Redo: Docstring cleanup: Trainer, trainer_template, Callbacks."" (#19830) 2021-11-01 21:45:11 +01:00
Sven Mika
4a82d3ea6c
Revert "[RLlib; Docs overhaul] Docstring cleanup: Trainer, trainer_template, Callbacks. (#19758)" (#19806)
This reverts commit 80eeb13175.
2021-10-27 23:30:07 +02:00
Sven Mika
80eeb13175
[RLlib; Docs overhaul] Docstring cleanup: Trainer, trainer_template, Callbacks. (#19758) 2021-10-27 19:15:35 +02:00
gjoliver
99a0088233
[RLlib] Unify the way we create local replay buffer for all agents (#19627)
* [RLlib] Unify the way we create and use LocalReplayBuffer for all the agents.

This change
1. Get rid of the try...except clause when we call execution_plan(),
   and get rid of the Deprecation warning as a result.
2. Fix the execution_plan() call in Trainer._try_recover() too.
3. Most importantly, makes it much easier to create and use different types
   of local replay buffers for all our agents.
   E.g., allow us to easily create a reservoir sampling replay buffer for
   APPO agent for Riot in the near future.
* Introduce explicit configuration for replay buffer types.
* Fix is_training key error.
* actually deprecate buffer_size field.
2021-10-26 20:56:02 +02:00
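A hedged sketch of what the explicit replay-buffer configuration described above could look like in an agent config dict; the key names (`replay_buffer_config`, `type`, `capacity`) and values are assumptions for illustration, not taken from the PR:

```python
# Hypothetical agent config using an explicit replay-buffer section instead of
# the deprecated top-level `buffer_size` field. Key names are assumptions.
config = {
    "num_workers": 2,
    "replay_buffer_config": {
        # Which local replay buffer class to build for this agent
        # (e.g., a reservoir-sampling buffer could be plugged in here).
        "type": "LocalReplayBuffer",
        # Buffer capacity in timesteps (previously the top-level `buffer_size`).
        "capacity": 50000,
    },
}
```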
Sven Mika
56f142cac1
[RLlib] Add support for evaluation_num_episodes=auto (run eval for as long as the parallel train step takes). (#18380) 2021-09-07 08:08:37 +02:00
Sven Mika
ba58f5edb1
[RLlib] Strictly run evaluation_num_episodes episodes each evaluation run (no matter the other eval config settings). (#18335) 2021-09-05 15:37:05 +02:00
Sven Mika
4888d7c9af
[RLlib] Replay buffers: Add config option to store contents in checkpoints. (#17999) 2021-08-31 12:21:49 +02:00
Julius Frost
0b1b6222bc
[rllib] Add merge_trainer_config arguments to trainer template (#17160) 2021-07-21 15:43:06 -07:00
Sven Mika
5a313ba3d6
[RLlib] Refactor: All tf static graph code should reside inside Policy class. (#17169) 2021-07-20 14:58:13 -04:00
Sven Mika
169ddabae7
[RLlib] Issue 15973: Trainer.with_updates(validate_config=...) behaves confusingly. (#16429) 2021-06-19 22:42:00 +02:00
Sven Mika
3d4dc60e2e
[RLlib] CQL iteration count fixes: Remove dummy buffer and unnecessary store op from exec_plan. (#16332) 2021-06-10 07:49:17 +02:00
Sven Mika
d89fb82bfb
[RLlib] Add simple curriculum learning API and example script. (#15740) 2021-05-16 17:35:10 +02:00
Sven Mika
16ddab49f5
[RLlib] Trainer._evaluate -> Trainer.evaluate; Also make evaluation possible w/o evaluation worker set. (#15591) 2021-05-12 12:16:00 +02:00
Sven Mika
5254d2fb36
[RLlib] Support parallelizing evaluation and training (optional). (#15040) 2021-04-13 09:53:35 +02:00
Sven Mika
732197e23a
[RLlib] Multi-GPU for tf-DQN/PG/A2C. (#13393) 2021-03-08 15:41:27 +01:00
Maltimore
b4702de1c2
[RLlib] move evaluation to trainer.step() such that the result is properly logged (#12708) 2021-01-25 12:56:00 +01:00
Sven Mika
ea25482f6a
WIP. (#12706) 2020-12-09 11:49:21 -08:00
Sven Mika
e40b14d255
[RLlib] Batch-size for truncate_episode batch_mode should be configurable in agent-steps (rather than env-steps), if needed. (#12420) 2020-12-08 16:41:45 -08:00
Sven Mika
ce96b03b07
[RLlib] MB-MPO cleanup (comments, docstrings, type annotations). (#11033) 2020-10-06 20:28:16 +02:00
Sven Mika
805dad3bc4
[RLlib] SAC algo cleanup. (#10825) 2020-09-20 11:27:02 +02:00
Sven Mika
4b278c36fc
[RLlib] Behavioral Cloning (from MARWIL). (#10619) 2020-09-09 17:33:21 +02:00
Sven Mika
ef18893fb5
[RLlib] PPO, APPO, and DD-PPO code cleanup. (#10420) 2020-09-02 14:03:01 +02:00
Sven Mika
d14b501692
[RLlib] First attempt at cleaning up algo code in RLlib: PG. (#10115) 2020-08-20 17:05:57 +02:00
Sven Mika
2256047876
[RLlib] Rename rllib.utils.types into typing to match built-in python module's name. (#10114) 2020-08-15 13:24:22 +02:00
Eric Liang
4b62a888cc
[rllib] Remove deprecated policy optimizer package. (#9262) 2020-07-02 14:39:40 -07:00
Richard Liaw
d35f0e40d0
[tune] Use public methods for trainable (#9184) 2020-07-01 11:00:00 -07:00
Sven Mika
4fd8977eaf
[RLlib] Minor cleanup in preparation to tf2.x support. (#9130)
* WIP.

* Fixes.

* LINT.

* Fixes.

* Fixes and LINT.

* WIP.
2020-06-25 19:01:32 +02:00
Eric Liang
1e0e1a45e6
[rllib] Add type annotations for evaluation/, env/ packages (#9003) 2020-06-19 13:09:05 -07:00
Eric Liang
9a83908c46
[rllib] Deprecate policy optimizers (#8345) 2020-05-21 10:16:18 -07:00
Eric Liang
baadbdf8d4
[rllib] Execute PPO using training workflow (#8206)
* wip

* add kl

* kl

* works now

* doc update

* reorg

* add ddppo

* add stats

* fix fetch

* comment

* fix learner stat regression

* test fixes

* fix test
2020-04-30 01:18:09 -07:00
Eric Liang
2298f6fb40
[rllib] Port DQN/Ape-X to training workflow api (#8077) 2020-04-23 12:39:19 -07:00
Sven Mika
428516056a
[RLlib] SAC Torch (incl. Atari learning) (#7984)
* Policy-classes cleanup and torch/tf unification.
- Make Policy abstract.
- Add `action_dist` to call to `extra_action_out_fn` (necessary for PPO torch).
- Move some methods and vars to base Policy
  (from TFPolicy): num_state_tensors, ACTION_PROB, ACTION_LOGP and some more.

* Fix `clip_action` import from Policy (should probably be moved into utils altogether).

* - Move `is_recurrent()` and `num_state_tensors()` into TFPolicy (from DynamicTFPolicy).
- Add config to all Policy c'tor calls (as 3rd arg after obs and action spaces).

* Add `config` to c'tor call to TFPolicy.

* Add missing `config` to c'tor call to TFPolicy in marvil_policy.py.

* Fix test_rollout_worker.py::MockPolicy and BadPolicy classes (Policy base class is now abstract).

* Fix LINT errors in Policy classes.

* Implement StatefulPolicy abstract methods in test cases: test_multi_agent_env.py.

* policy.py LINT errors.

* Create a simple TestPolicy to sub-class from when testing Policies (reduces code in some test cases).

* policy.py
- Remove abstractmethod from `apply_gradients` and `compute_gradients` (these are not required iff `learn_on_batch` implemented).
- Fix docstring of `num_state_tensors`.

* Make QMIX torch Policy a child of TorchPolicy (instead of Policy).

* QMixPolicy add empty implementations of abstract Policy methods.

* Store Policy's config in self.config in base Policy c'tor.

* - Make only compute_actions in base Policy's an abstractmethod and provide pass
implementation to all other methods if not defined.
- Fix state_batches=None (most Policies don't have internal states).

* Cartpole tf learning.

* Cartpole tf AND torch learning (in ~ same ts).

* Cartpole tf AND torch learning (in ~ same ts). 2

* Cartpole tf (torch syntax-broken) learning (in ~ same ts). 3

* Cartpole tf AND torch learning (in ~ same ts). 4

* Cartpole tf AND torch learning (in ~ same ts). 5

* Cartpole tf AND torch learning (in ~ same ts). 6

* Cartpole tf AND torch learning (in ~ same ts). Pendulum tf learning.

* WIP.

* WIP.

* SAC torch learning Pendulum.

* WIP.

* SAC torch and tf learning Pendulum and Cartpole after cleanup.

* WIP.

* LINT.

* LINT.

* SAC: Move policy.target_model to policy.device as well.

* Fixes and cleanup.

* Fix data-format of tf keras Conv2d layers (broken for some tf-versions which have data_format="channels_first" as default).

* Fixes and LINT.

* Fixes and LINT.

* Fix and LINT.

* WIP.

* Test fixes and LINT.

* Fixes and LINT.

Co-authored-by: Sven Mika <sven@Svens-MacBook-Pro.local>
2020-04-15 13:25:16 +02:00
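A minimal sketch of the Policy sub-classing pattern this commit describes: only compute_actions must be overridden, config is the third constructor argument, and implementing learn_on_batch removes the need to override compute_gradients/apply_gradients. The class name and random-action body are illustrative only, not from the PR:

```python
from ray.rllib.policy.policy import Policy


class RandomPolicy(Policy):
    """Hypothetical minimal Policy sub-class (illustrative, not from the PR)."""

    def __init__(self, observation_space, action_space, config):
        # Per this commit, `config` is the 3rd c'tor arg after the two spaces.
        super().__init__(observation_space, action_space, config)

    def compute_actions(self,
                        obs_batch,
                        state_batches=None,
                        prev_action_batch=None,
                        prev_reward_batch=None,
                        info_batch=None,
                        episodes=None,
                        **kwargs):
        # The only abstract method: return (actions, RNN state-outs, extra fetches).
        actions = [self.action_space.sample() for _ in obs_batch]
        return actions, [], {}

    def learn_on_batch(self, samples):
        # With learn_on_batch implemented, compute_gradients/apply_gradients
        # need not be overridden (per this commit). No learning happens here.
        return {}
```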
Sven Mika
bcf963a53b
[RLlib] Bug default policy overrides torch policy. (#7756)
* Rollback.

* Bug fix!
2020-03-26 10:03:20 -07:00
Eric Liang
288933ec6b
[rllib] Fix shared metrics context in parallel iterators (#7666)
* debug

* build

* update

* wip

* wpi

* update

* recurisve sync

* comment

* stream

* fix

* Update .travis.yml
2020-03-22 14:15:01 -07:00
Eric Liang
c3a8ba399f
[rllib] Enable distributed exec api for A2C, A3C, PG by default (#7580) 2020-03-13 18:48:41 -07:00
Eric Liang
f5d12a958b
[rllib] Port Ape-X to distributed execution API (#7497) 2020-03-12 00:54:08 -07:00
Eric Liang
a644060daa
[rllib] First pass at pipeline implementation of DQN (#7433)
* wip iters

* add test

* speed up

* update docs

* document it

* support serial sampling

* add test

* spacing

* annotate it

* update

* rename to pipeline

* comment

* iter2 wip

* update

* update

* context test

* update

* fix

* fix

* a3c pipeline

* doc

* update

* move timer

* comment

* add piepline test

* fix

* clean up

* document

* iter s

* wip dqn

* wip

* wip

* metrics

* metrics rename

* metrics ctx

* wip

* constants

* add todo

* suppport .union

* wip

* support union

* remove prints

* add todo

* remove auto timer

* fix up

* fix pipeline test

* typing

* fix breakage

* remove bad assert

* wip

* fix multiagent example

* fixapply

* update a3c

* remove a2c pl

* 0 workers

* wip

* wip

* share metrics

* wip

* wip

* doc

* fix weight sync and global var updates

* mode

* fix

* fix

* doc

* fix
2020-03-07 14:47:58 -08:00
Eric Liang
c38224d8e5
[RLlib] Issue 7438 evaluation not working in pytorch. (#7443) 2020-03-04 12:53:04 -08:00
Eric Liang
0f88444686
[rllib] Support multi-agent training in pipeline impls, add easy flag to enable (#7338) 2020-03-02 15:16:37 -08:00
Eric Liang
3c6b94f3f5
[rllib] Enable performance metrics reporting for RLlib pipelines, add A3C (#7299) 2020-02-28 16:44:17 -08:00
Eric Liang
46af992efd
[rllib] [experimental] custom RL training pipelines (PG_pl, A2C_pl) (#7213) 2020-02-19 16:07:37 -08:00