ray/rllib/models/tf/misc.py

import numpy as np
from typing import Tuple, Any, Optional

from ray.rllib.utils.annotations import DeveloperAPI
from ray.rllib.utils.framework import try_import_tf
from ray.rllib.utils.typing import TensorType

tf1, tf, tfv = try_import_tf()


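@DeveloperAPI
def normc_initializer(std: float = 1.0) -> Any:
    """Returns an initializer that draws Gaussian weights and rescales them so
    that the L2 norm taken along axis 0 equals `std`."""

    def _initializer(shape, dtype=None, partition_info=None):
        # Use `dtype.name` when available (e.g. for tf dtypes); otherwise use
        # `dtype` as given, falling back to float32 if it is None.
        out = np.random.randn(*shape).astype(
            dtype.name if hasattr(dtype, "name") else dtype or np.float32
        )
        out *= std / np.sqrt(np.square(out).sum(axis=0, keepdims=True))
        return tf.constant(out)

    return _initializer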
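
# Illustration only (not part of this module): a minimal NumPy-only sketch of
# the normalization performed by `normc_initializer`, assuming a 2D weight
# shape. The helper name `normc_array` and its `seed` parameter are
# hypothetical. It shows that every column ends up with an L2 norm of `std`.

```python
import numpy as np


def normc_array(shape, std=1.0, dtype=np.float32, seed=0):
    """Draw Gaussian values and rescale so the norm along axis 0 is `std`."""
    rng = np.random.default_rng(seed)
    out = rng.standard_normal(shape).astype(dtype)
    # Same rescaling as in `_initializer`: divide by the norm along axis 0.
    out *= std / np.sqrt(np.square(out).sum(axis=0, keepdims=True))
    return out


weights = normc_array((64, 4), std=0.01)
column_norms = np.sqrt(np.square(weights).sum(axis=0))
print(column_norms)  # every entry is approximately 0.01
```

# A small `std` such as 0.01 is a common choice for output layers, since it
# keeps the initial outputs close to zero.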


@DeveloperAPI
def conv2d(
    x: TensorType,
    num_filters: int,
    name: str,
    filter_size: Tuple[int, int] = (3, 3),
    stride: Tuple[int, int] = (1, 1),
    pad: str = "SAME",
    dtype: Optional[Any] = None,
    collections: Optional[Any] = None,
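) -> TensorType:
    """Applies a 2D convolution plus bias to an NHWC input `x`, creating the
    variables `W` and `b` under the variable scope `name`."""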
    if dtype is None:
        dtype = tf.float32

    with tf1.variable_scope(name):
        stride_shape = [1, stride[0], stride[1], 1]
        filter_shape = [
            filter_size[0],
            filter_size[1],
            int(x.get_shape()[3]),
            num_filters,
        ]

        # Each output unit sees "num input feature maps * filter height *
        # filter width" inputs.
        fan_in = np.prod(filter_shape[:3])
        # Each input unit receives gradients from "num output feature maps *
        # filter height * filter width" units (no pooling is applied here).
        fan_out = np.prod(filter_shape[:2]) * num_filters
        # Glorot (Xavier) uniform bound: W is sampled from U(-w_bound, w_bound).
        w_bound = np.sqrt(6 / (fan_in + fan_out))

        w = tf1.get_variable(
            "W",
            filter_shape,
            dtype,
            tf1.random_uniform_initializer(-w_bound, w_bound),
            collections=collections,
        )
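        b = tf1.get_variable(
            "b",
            [1, 1, 1, num_filters],
            # Match the dtype of `w` so that the addition below does not fail
            # for non-float32 inputs.
            dtype=dtype,
            initializer=tf1.constant_initializer(0.0),
            collections=collections,
        )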
        return tf1.nn.conv2d(x, w, stride_shape, pad) + b
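
# Illustration only (not part of this module): a worked example of the weight
# bound computed in `conv2d`, written without TensorFlow. The helper name
# `glorot_uniform_bound` and the example sizes are assumptions chosen for the
# arithmetic.

```python
import numpy as np


def glorot_uniform_bound(filter_size, in_channels, num_filters):
    """Return the bound b such that weights are drawn from U(-b, b)."""
    filter_shape = [filter_size[0], filter_size[1], in_channels, num_filters]
    fan_in = np.prod(filter_shape[:3])  # height * width * input channels
    fan_out = np.prod(filter_shape[:2]) * num_filters  # height * width * filters
    return float(np.sqrt(6 / (fan_in + fan_out)))


# 3x3 filters, 4 input channels, 16 filters:
# fan_in = 3 * 3 * 4 = 36, fan_out = 3 * 3 * 16 = 144, bound = sqrt(6 / 180).
bound = glorot_uniform_bound((3, 3), in_channels=4, num_filters=16)
print(round(bound, 4))  # 0.1826
```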


@DeveloperAPI
def linear(
    x: TensorType,
    size: int,
    name: str,
    initializer: Optional[Any] = None,
    bias_init: float = 0.0,
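) -> TensorType:
    """Applies a dense layer `x @ w + b` to a 2D input `x`, creating the
    variables `<name>/w` and `<name>/b`."""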
    w = tf1.get_variable(name + "/w", [x.get_shape()[1], size], initializer=initializer)
    b = tf1.get_variable(
        name + "/b", [size], initializer=tf1.constant_initializer(bias_init)
    )
    return tf.matmul(x, w) + b


@DeveloperAPI
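def flatten(x: TensorType) -> TensorType:
    """Reshapes `x` to `[batch, features]`, collapsing all non-batch
    dimensions, which must be statically known."""
    return tf.reshape(x, [-1, np.prod(x.get_shape().as_list()[1:])])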
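
# Illustration only (not part of this module): a NumPy analogue of `flatten`,
# sketched to show the resulting shape. The helper name `flatten_array` and the
# example batch are hypothetical.

```python
import numpy as np


def flatten_array(x):
    """Collapse every dimension except the leading (batch) one."""
    return x.reshape(-1, int(np.prod(x.shape[1:])))


batch = np.zeros((8, 4, 4, 3))  # e.g. 8 images of size 4x4 with 3 channels
flat = flatten_array(batch)
print(flat.shape)  # (8, 48)
```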