# ray/rllib/models/tf/misc.py

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np

from ray.rllib.utils import try_import_tf

tf = try_import_tf()


def normc_initializer(std=1.0):
    def _initializer(shape, dtype=None, partition_info=None):
        out = np.random.randn(*shape).astype(np.float32)
        out *= std / np.sqrt(np.square(out).sum(axis=0, keepdims=True))
        return tf.constant(out)

    return _initializer
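
# Usage sketch (illustrative; the shape and variable name below are
# assumptions, not from this file): pass the returned closure as a
# TF1-style initializer.
#
#     w = tf.get_variable(
#         "fc_w", [256, 64], initializer=normc_initializer(std=0.01))
#
# Each column of `out` is normalized to L2 norm `std`, so every output
# unit starts with a weight vector of the same magnitude.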


def get_activation_fn(name):
    return getattr(tf.nn, name)
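
# Illustrative examples: get_activation_fn("relu") returns tf.nn.relu
# and get_activation_fn("tanh") returns tf.nn.tanh; any name that is
# not an attribute of tf.nn raises AttributeError.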


def conv2d(x,
           num_filters,
           name,
           filter_size=(3, 3),
           stride=(1, 1),
           pad="SAME",
           dtype=None,
           collections=None):
    if dtype is None:
        dtype = tf.float32

    with tf.variable_scope(name):
        stride_shape = [1, stride[0], stride[1], 1]
        filter_shape = [
            filter_size[0], filter_size[1],
            int(x.get_shape()[3]), num_filters
        ]

        # There are "num input feature maps * filter height * filter width"
        # inputs to each hidden unit.
        fan_in = np.prod(filter_shape[:3])
        # Each unit in the lower layer receives a gradient from: "num output
        # feature maps * filter height * filter width" / pooling size.
        fan_out = np.prod(filter_shape[:2]) * num_filters
        # Glorot/Xavier-style uniform initialization bound.
        w_bound = np.sqrt(6.0 / (fan_in + fan_out))

        w = tf.get_variable(
            "W",
            filter_shape,
            dtype,
            tf.random_uniform_initializer(-w_bound, w_bound),
            collections=collections)
        b = tf.get_variable(
            "b", [1, 1, 1, num_filters],
            initializer=tf.constant_initializer(0.0),
            collections=collections)
        return tf.nn.conv2d(x, w, stride_shape, pad) + b
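
# Usage sketch (the 84x84x4 Atari-style input is an assumption, not
# from this file):
#
#     obs = tf.placeholder(tf.float32, [None, 84, 84, 4])
#     h1 = tf.nn.relu(
#         conv2d(obs, 16, "conv1", filter_size=(8, 8), stride=(4, 4)))
#
# With pad="SAME" and stride 4, the 84x84 input yields a
# [None, 21, 21, 16] output (21 = ceil(84 / 4)).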


def linear(x, size, name, initializer=None, bias_init=0):
    w = tf.get_variable(
        name + "/w", [x.get_shape()[1], size], initializer=initializer)
    b = tf.get_variable(
        name + "/b", [size], initializer=tf.constant_initializer(bias_init))
    return tf.matmul(x, w) + b
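
# Usage sketch (names and sizes are illustrative): project flattened
# features to action logits, reusing normc_initializer from above and
# flatten from below.
#
#     logits = linear(flatten(h1), num_actions, "fc_out",
#                     initializer=normc_initializer(0.01))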


def flatten(x):
    return tf.reshape(x, [-1, np.prod(x.get_shape().as_list()[1:])])
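
# Illustrative example: a [None, 21, 21, 16] conv output becomes
# [None, 7056] after flatten(), since 21 * 21 * 16 = 7056; the -1 in
# tf.reshape preserves the batch dimension.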