ray/rllib/utils/schedules/polynomial_schedule.py

from typing import Optional
from ray.rllib.utils.annotations import override, PublicAPI
from ray.rllib.utils.framework import try_import_tf, try_import_torch
from ray.rllib.utils.schedules.schedule import Schedule
from ray.rllib.utils.typing import TensorType

tf1, tf, tfv = try_import_tf()
torch, _ = try_import_torch()
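# `try_import_tf()` / `try_import_torch()` return None placeholders when the
# corresponding framework is not installed; this is why `_value()` below
# checks `torch` for truthiness before touching any torch APIs.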


@PublicAPI
class PolynomialSchedule(Schedule):
    """Polynomial interpolation between `initial_p` and `final_p`.

    Interpolates over `schedule_timesteps`. After this many time steps,
    the schedule always returns `final_p`.
    """

    def __init__(
        self,
        schedule_timesteps: int,
        final_p: float,
        framework: Optional[str],
        initial_p: float = 1.0,
        power: float = 2.0,
    ):
        """Initializes a PolynomialSchedule instance.

        Args:
            schedule_timesteps: Number of time steps over which to
                anneal `initial_p` to `final_p`.
            final_p: Final output value.
            framework: The framework descriptor string, e.g. "tf",
                "torch", or None.
            initial_p: Initial output value.
            power: The exponent to use (default: 2.0, i.e. quadratic).
        """
        super().__init__(framework=framework)
        assert schedule_timesteps > 0
        self.schedule_timesteps = schedule_timesteps
        self.final_p = final_p
        self.initial_p = initial_p
        self.power = power

    @override(Schedule)
    def _value(self, t: TensorType) -> TensorType:
        """Returns the result of:
        final_p + (initial_p - final_p) * (1 - `t`/t_max) ** power
        """
        if self.framework == "torch" and torch and isinstance(t, torch.Tensor):
            t = t.float()
        t = min(t, self.schedule_timesteps)
        return (
            self.final_p
            + (self.initial_p - self.final_p)
            * (1.0 - (t / self.schedule_timesteps)) ** self.power
        )

    @override(Schedule)
    def _tf_value_op(self, t: TensorType) -> TensorType:
        """Returns the same polynomial decay as `_value`, as a TF op."""
        t = tf.math.minimum(t, self.schedule_timesteps)
        return (
            self.final_p
            + (self.initial_p - self.final_p)
            * (1.0 - (t / self.schedule_timesteps)) ** self.power
        )
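

# ---------------------------------------------------------------------------
# Usage sketch (illustrative, not part of the RLlib module): with
# framework=None, `Schedule.value()` should fall through to the pure-Python
# `_value()` above, so plain ints/floats work. Assumes `ray[rllib]` is
# installed so the imports at the top resolve.
# ---------------------------------------------------------------------------
if __name__ == "__main__":
    # Quadratic decay from 1.0 down to 0.05 over 10,000 time steps.
    schedule = PolynomialSchedule(
        schedule_timesteps=10_000,
        final_p=0.05,
        framework=None,
        initial_p=1.0,
        power=2.0,
    )
    print(schedule.value(0))  # 1.0 (== initial_p)
    print(schedule.value(5_000))  # 0.05 + 0.95 * 0.5 ** 2 == 0.2875
    print(schedule.value(20_000))  # 0.05 (t is capped at schedule_timesteps)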