"""Example of a custom training workflow. Run this for a demo.

This example shows:
  - using Tune trainable functions to implement custom training workflows

You can visualize experiment results in ~/ray_results using TensorBoard.
"""
import argparse
import os

import ray
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer

parser = argparse.ArgumentParser()
parser.add_argument(
    "--framework",
    choices=["tf", "tf2", "tfe", "torch"],
    default="tf",
    help="The DL framework specifier.",
)
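
# Example invocation (the script filename here is hypothetical):
#   python custom_train_fn.py --framework torch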


def my_train_fn(config, reporter):
    """Custom two-phase training workflow, run by Tune as a function trainable.

    Tune calls this with the trial `config` and a `reporter` callback that
    sends metric dicts back to Tune after each training iteration.
    """
    # `train-iterations` is a custom key, so pop it before the config is
    # handed to PPOTrainer, which validates its config keys.
    iterations = config.pop("train-iterations", 10)

    # Train for n iterations with high LR.
    agent1 = PPOTrainer(env="CartPole-v0", config=config)
    for _ in range(iterations):
        result = agent1.train()
        result["phase"] = 1
        reporter(**result)
        phase1_time = result["timesteps_total"]
    # `save()` returns a checkpoint path that phase 2 restores from below.
    state = agent1.save()
    agent1.stop()

    # Train for n iterations with low LR.
    config["lr"] = 0.0001
    agent2 = PPOTrainer(env="CartPole-v0", config=config)
    agent2.restore(state)
    for _ in range(iterations):
        result = agent2.train()
        result["phase"] = 2
        result["timesteps_total"] += phase1_time  # keep time moving forward
        reporter(**result)
    agent2.stop()
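

# A minimal way to smoke-test `my_train_fn` without Tune (hypothetical
# snippet, not part of the original example):
#
#   ray.init()
#
#   def print_reporter(**metrics):
#       print(metrics["phase"], metrics.get("episode_reward_mean"))
#
#   my_train_fn({"train-iterations": 1, "lr": 0.01, "num_workers": 0},
#               print_reporter)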


if __name__ == "__main__":
    ray.init()
    args = parser.parse_args()
    config = {
        # Custom flag telling `my_train_fn` how many iterations to run.
        "train-iterations": 2,
        "lr": 0.01,
        # Use GPUs iff the `RLLIB_NUM_GPUS` env var is set to > 0.
        "num_gpus": int(os.environ.get("RLLIB_NUM_GPUS", "0")),
        "num_workers": 0,
        "framework": args.framework,
    }
    # Reserve the resources a PPOTrainer would itself request, so that Tune
    # schedules the trial with enough CPUs/GPUs for the trainers created
    # inside `my_train_fn`.
    resources = PPOTrainer.default_resource_request(config)
    tune.run(my_train_fn, resources_per_trial=resources, config=config)
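
# Results are written to ~/ray_results by default and can be inspected with
# TensorBoard, e.g.:
#   tensorboard --logdir ~/ray_results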
|