ray/rllib/contrib/random_agent/random_agent.py

import numpy as np

from ray.rllib.algorithms.algorithm import Algorithm, with_common_config
from ray.rllib.utils.annotations import override
from ray.rllib.utils.typing import AlgorithmConfigDict


# fmt: off
# __sphinx_doc_begin__
class RandomAgent(Algorithm):
    """Algo that produces random actions and never learns."""

    @classmethod
    @override(Algorithm)
    def get_default_config(cls) -> AlgorithmConfigDict:
        return with_common_config({
            "rollouts_per_iteration": 10,
            "framework": "tf",  # not used
        })

    @override(Algorithm)
    def _init(self, config, env_creator):
        # Create one local copy of the environment to roll out in.
        self.env = env_creator(config["env_config"])

    @override(Algorithm)
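    def step(self):
        """Run `rollouts_per_iteration` random episodes and report the mean reward."""
        rewards = []  # total reward of each completed episode
        steps = 0  # env steps taken across all episodes in this iteration
        for _ in range(self.config["rollouts_per_iteration"]):
            obs = self.env.reset()
            done = False
            reward = 0.0
            while not done:
                # Sample a random action; the observation is never consulted.
                action = self.env.action_space.sample()
                obs, r, done, info = self.env.step(action)
                reward += r
                steps += 1
            rewards.append(reward)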
        return {
            "episode_reward_mean": np.mean(rewards),
            "timesteps_this_iter": steps,
        }
# __sphinx_doc_end__
# FIXME: We switched our code formatter from YAPF to Black. Check if we can enable code
# formatting on this module and update the comment below. See issue #21318.
# don't enable yapf after, it's buggy here


if __name__ == "__main__":
    algo = RandomAgent(
        env="CartPole-v0", config={"rollouts_per_iteration": 10})
    result = algo.train()
    assert result["episode_reward_mean"] > 10, result
    print("Test: OK")