ray/python/ray/rllib/train.py

#!/usr/bin/env python
"""Train a reinforcement learning agent with PPO, ES, DQN or A3C."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import json
import os
import pprint
import sys

import ray
# Imported explicitly because RLLibLogger and RLLibEncoder are used below.
import ray.rllib.common
import ray.rllib.ppo as ppo
import ray.rllib.es as es
import ray.rllib.dqn as dqn
import ray.rllib.a3c as a3c

parser = argparse.ArgumentParser(
    description="Train a reinforcement learning agent.")
parser.add_argument("--redis-address", default=None, type=str,
                    help="The Redis address of the cluster.")
parser.add_argument("--env", required=True, type=str,
                    help="The gym environment to use.")
parser.add_argument("--alg", required=True, type=str,
                    help="The reinforcement learning algorithm to use: "
                         "PPO, ES, DQN or A3C.")
parser.add_argument("--num-iterations", default=sys.maxsize, type=int,
                    help="The number of training iterations to run.")
parser.add_argument("--config", default="{}", type=str,
                    help="Overrides for the algorithm's default "
                         "configuration, given as a JSON object string.")
parser.add_argument("--upload-dir", default="file:///tmp/ray", type=str,
                    help="Where the traces are stored.")
parser.add_argument("--checkpoint-freq", default=sys.maxsize, type=int,
                    help="Number of iterations between checkpoints. The "
                         "default effectively disables checkpointing.")
parser.add_argument("--restore", default="", type=str,
                    help="If specified, restores state from this checkpoint.")


if __name__ == "__main__":
    args = parser.parse_args()
    json_config = json.loads(args.config)

    ray.init(redis_address=args.redis_address)

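    def _check_and_update(config, overrides):
        """Apply user overrides to `config`, rejecting unknown keys."""
        # The parameter is not named `json`, which would shadow the module.
        for k in overrides.keys():
            if k not in config:
                raise Exception(
                    "Unknown model config `{}`, all model configs: {}".format(
                        k, config.keys()))
        config.update(overrides)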

    env_name = args.env
    if args.alg == "PPO":
        config = ppo.DEFAULT_CONFIG.copy()
        _check_and_update(config, json_config)
        alg = ppo.PPOAgent(
            env_name, config, upload_dir=args.upload_dir)
    elif args.alg == "ES":
        config = es.DEFAULT_CONFIG.copy()
        _check_and_update(config, json_config)
        alg = es.ESAgent(
            env_name, config, upload_dir=args.upload_dir)
    elif args.alg == "DQN":
        config = dqn.DEFAULT_CONFIG.copy()
        _check_and_update(config, json_config)
        alg = dqn.DQNAgent(
            env_name, config, upload_dir=args.upload_dir)
    elif args.alg == "A3C":
        config = a3c.DEFAULT_CONFIG.copy()
        _check_and_update(config, json_config)
        alg = a3c.A3CAgent(
            env_name, config, upload_dir=args.upload_dir)
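    else:
        # Report bad input through argparse. An assert would be skipped
        # under `python -O`, leaving `alg` undefined below.
        parser.error("Unknown algorithm, check --alg argument. Valid "
                     "choices are PPO, ES, DQN and A3C.")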

    result_logger = ray.rllib.common.RLLibLogger(
        os.path.join(alg.logdir, "result.json"))

    if args.restore:
        alg.restore(args.restore)

    for i in range(args.num_iterations):
        result = alg.train()

        # We need to use a custom json serializer class so that NaNs get
        # encoded as null as required by Athena.
        json.dump(result._asdict(), result_logger,
                  cls=ray.rllib.common.RLLibEncoder)
        result_logger.write("\n")

        print("== Iteration {} ==".format(alg.iteration))
        pprint.pprint(result._asdict())

        if (i + 1) % args.checkpoint_freq == 0:
            print("checkpoint path: {}".format(alg.save()))