ray/rllib/agents/marwil/bc.py

from ray.rllib.agents.marwil.marwil import (
    MARWILTrainer,
    DEFAULT_CONFIG as MARWIL_CONFIG,
)
from ray.rllib.utils.annotations import override
from ray.rllib.utils.typing import TrainerConfigDict

# fmt: off
# __sphinx_doc_begin__
BC_DEFAULT_CONFIG = MARWILTrainer.merge_trainer_configs(
    MARWIL_CONFIG, {
        # No need to calculate advantages (or do anything else with the
        # rewards).
        "beta": 0.0,
        # Advantages (calculated during postprocessing) are not needed for
        # behavioral cloning.
        "postprocess_inputs": False,
        # No reward estimation.
        "input_evaluation": [],
    })
# __sphinx_doc_end__
# fmt: on


class BCTrainer(MARWILTrainer):
    """Behavioral Cloning (derived from MARWIL).

    Simply uses the MARWIL agent with beta force-set to 0.0.
    """

    @classmethod
    @override(MARWILTrainer)
    def get_default_config(cls) -> TrainerConfigDict:
        return BC_DEFAULT_CONFIG

    @override(MARWILTrainer)
    def validate_config(self, config: TrainerConfigDict) -> None:
        # Call super's validation method.
        super().validate_config(config)

        if config["beta"] != 0.0:
            raise ValueError("For behavioral cloning, `beta` parameter must be 0.0!")
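
The check above requires `beta` to be 0.0. A minimal sketch of why that value yields plain behavioral cloning follows. It assumes a simplified MARWIL-style loss in which each action log-likelihood is weighted by `exp(beta * advantage)`. The helper names `marwil_style_weights` and `weighted_imitation_loss` are hypothetical and not part of RLlib, and the actual MARWIL loss in RLlib may normalize advantages differently.

```python
import math


def marwil_style_weights(advantages, beta):
    """Hypothetical helper: per-sample weights exp(beta * advantage)."""
    return [math.exp(beta * adv) for adv in advantages]


def weighted_imitation_loss(log_probs, advantages, beta):
    """Hypothetical helper: mean of -weight * log-likelihood over the batch."""
    weights = marwil_style_weights(advantages, beta)
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / len(log_probs)


# Made-up illustration values, not taken from any dataset.
log_probs = [-0.1, -2.0, -0.7]
advantages = [1.5, -3.0, 0.2]

# With beta == 0.0 every weight is exp(0) == 1, so the advantages drop out
# and the loss is the ordinary negative log-likelihood.
bc_loss = weighted_imitation_loss(log_probs, advantages, beta=0.0)
plain_nll = -sum(log_probs) / len(log_probs)
```

Under this assumption the advantages have no influence on the loss once `beta` is zero, which is consistent with the config above disabling `postprocess_inputs`.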