
.. _ref-autoscaling:

Cluster Autoscaling
===================

.. tip:: Before you continue, be sure to have read :ref:`cluster-cloud`.

Basics
------

The Ray Cluster Launcher automatically enables a load-based autoscaler. When cluster resource usage exceeds a configurable threshold (80% by default), the autoscaler launches new nodes, up to the ``max_workers`` limit set in the cluster config. When nodes are idle for longer than a timeout, they are removed, down to the ``min_workers`` limit. The head node is never removed.

In more detail, the autoscaler implements the following control loop:

 1. It calculates the estimated utilization of the cluster based on the most heavily assigned resource. For example, if a cluster has 100/200 CPUs assigned and 15/25 GPUs assigned, the utilization is considered to be max(100/200, 15/25) = 60%.
 2. If the estimated utilization is greater than the target (80% by default), then the autoscaler will attempt to add nodes to the cluster.
 3. If a node is idle for a timeout (5 minutes by default), it is removed from the cluster.
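
The control loop above can be sketched in a few lines of Python. This is a simplified illustration, not the autoscaler's actual implementation; the function names (``estimated_utilization``, ``should_scale_up``, ``is_removable``) are hypothetical, and the numbers reuse the ``max(100/200, 15/25)`` example from step 1.

.. code-block:: python

    def estimated_utilization(used, total):
        """Step 1: the highest used/total fraction across all resource types."""
        return max(
            used.get(name, 0) / amount
            for name, amount in total.items()
            if amount > 0
        )


    def should_scale_up(used, total, target_utilization_fraction=0.8):
        """Step 2: add nodes only when utilization exceeds the target."""
        return estimated_utilization(used, total) > target_utilization_fraction


    def is_removable(idle_minutes, idle_timeout_minutes=5):
        """Step 3: a node may be removed once it has been idle for the timeout."""
        return idle_minutes >= idle_timeout_minutes


    used = {"CPU": 100, "GPU": 15}
    total = {"CPU": 200, "GPU": 25}
    print(f"utilization = {estimated_utilization(used, total):.0%}")
    print("scale up:", should_scale_up(used, total))

With these inputs the GPU fraction dominates, so the estimate is 60%, which is below the default 80% target and does not trigger a scale-up.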

The basic autoscaling config settings are as follows:

.. code-block:: yaml

    # A unique identifier for the head node and workers of this cluster.
    cluster_name: default

    # The minimum number of worker nodes to launch in addition to the head
    # node. This number should be >= 0.
    min_workers: 0

    # The autoscaler will scale up the cluster to this target fraction of resource
    # usage. For example, if a cluster of 10 nodes is 100% busy and
    # target_utilization_fraction is 0.8, it would resize the cluster to 13
    # (10 / 0.8 = 12.5, rounded up). This fraction can be decreased to increase
    # the aggressiveness of upscaling. The max value allowed is 1.0, which is
    # the most conservative setting.
    target_utilization_fraction: 0.8

    # If a node is idle for this many minutes, it will be removed. A node is
    # considered idle if there are no tasks or actors running on it.
    idle_timeout_minutes: 5
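
The resize arithmetic described in the ``target_utilization_fraction`` comment can be checked with a short calculation. The helper below is a hypothetical sketch of that arithmetic only, assuming the result is rounded up to a whole node; it is not taken from the autoscaler source.

.. code-block:: python

    import math


    def target_node_count(busy_nodes, target_utilization_fraction):
        """Nodes needed so that `busy_nodes` worth of load sits at the target."""
        if not 0 < target_utilization_fraction <= 1.0:
            raise ValueError("target_utilization_fraction must be in (0, 1.0]")
        return math.ceil(busy_nodes / target_utilization_fraction)


    print(target_node_count(10, 0.8))  # the example from the config comment
    print(target_node_count(10, 1.0))  # most conservative setting
    print(target_node_count(10, 0.5))  # more aggressive upscaling

Lower fractions leave more headroom and therefore request more nodes for the same load.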


Manually Adding Nodes without Resources (Unmanaged Nodes)
---------------------------------------------------------

In some cases, it may be desirable to add special nodes without any resources (i.e., ``num_cpus=0``). Such a node can be used as a driver that connects to the cluster to launch jobs.

To manually add a node to an autoscaled cluster, set the ``ray-cluster-name`` tag on the node and set its ``ray-node-type`` tag to ``unmanaged``.

Unmanaged nodes **must have 0 resources**.

If you are using the ``available_node_types`` field, you should create a custom node type with ``resources: {}`` and ``max_workers: 0`` when configuring the autoscaler.

The autoscaler will not attempt to start, stop, or update unmanaged nodes. The user is responsible for properly setting up and cleaning up unmanaged nodes.


Multiple Node Type Autoscaling
------------------------------

Ray supports multiple node types in a single cluster. In this mode of operation, the scheduler looks at the queue of resource shape demands from the cluster (e.g., there might be 10 tasks queued, each requesting ``{"GPU": 4, "CPU": 16}``) and tries to add the minimum set of nodes that can fulfill these demands. This enables more precise and rapid scale-up than looking only at resource utilization, because the autoscaler also has visibility into the queue of resource demands.
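
To make the idea of fulfilling resource shape demands concrete, here is a minimal greedy first-fit sketch. It is an illustration under simplifying assumptions, not Ray's resource demand scheduler: the node type names and resource amounts are hypothetical, "smallest node type" is approximated by the sum of its resource quantities, and ``min_workers``/``max_workers`` limits are ignored.

.. code-block:: python

    def fits(demand, available):
        """True if every requested resource is available in sufficient amount."""
        return all(available.get(name, 0) >= amount for name, amount in demand.items())


    def nodes_to_launch(demands, node_types):
        """Greedy first-fit packing; returns {node_type_name: count}."""
        planned = []  # (type name, remaining resources) for each planned node
        for demand in demands:
            # Reuse spare capacity on a node that is already planned, if any.
            target = next((rem for _, rem in planned if fits(demand, rem)), None)
            if target is None:
                # Otherwise plan the smallest node type that can hold the demand.
                candidates = [
                    (sum(resources.values()), name)
                    for name, resources in node_types.items()
                    if fits(demand, resources)
                ]
                if not candidates:
                    continue  # no node type can satisfy this demand
                _, chosen = min(candidates)
                target = dict(node_types[chosen])
                planned.append((chosen, target))
            for name, amount in demand.items():
                target[name] -= amount
        counts = {}
        for type_name, _ in planned:
            counts[type_name] = counts.get(type_name, 0) + 1
        return counts


    # Hypothetical node types and the queued demand from the paragraph above.
    node_types = {"cpu_16": {"CPU": 16}, "gpu_4": {"CPU": 32, "GPU": 4}}
    queued = [{"GPU": 4, "CPU": 16}] * 10
    print(nodes_to_launch(queued, node_types))

Because each queued task needs all 4 GPUs of a ``gpu_4`` node, the sketch plans one such node per task, something a utilization-only heuristic could not infer directly.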

The concept of a cluster node type encompasses both the physical instance type (e.g., AWS p3.8xlarge GPU nodes vs. m4.16xlarge CPU nodes) and other attributes (e.g., the IAM role or the machine image). `Custom resources <configure.html>`__ can be specified for each node type so that Ray is aware of the demand for specific node types at the application level (e.g., a task may request to be placed on a machine with a specific role or machine image via a custom resource).

Multi-node type autoscaling operates in conjunction with the basic autoscaler. You may want to configure the basic autoscaler accordingly to act conservatively (i.e., set ``target_utilization_fraction: 1.0``).

An example of configuring multiple node types is as follows `(full example) <https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/aws/example-multi-node-type.yaml>`__:

.. code-block:: yaml

    # Specify the allowed node types and the resources they provide.
    # The key is the name of the node type, which is just for debugging purposes.
    # The node config specifies the launch config and physical instance type.
    available_node_types:
        cpu_4_ondemand:
            node_config:
                InstanceType: m4.xlarge
            # For AWS instances, autoscaler will automatically add the available
            # CPUs/GPUs/accelerator_type ({"CPU": 4} for m4.xlarge) in "resources".
            # resources: {"CPU": 4}
            min_workers: 1
            max_workers: 5
        cpu_16_spot:
            node_config:
                InstanceType: m4.4xlarge
                InstanceMarketOptions:
                    MarketType: spot
            # Autoscaler will auto fill the CPU resources below.
            resources: {"Custom1": 1, "is_spot": 1}
            max_workers: 10
        gpu_1_ondemand:
            node_config:
                InstanceType: p2.xlarge
            # Autoscaler will auto fill the CPU/GPU resources below.
            resources: {"Custom2": 2}
            max_workers: 4
            worker_setup_commands:
                - pip install tensorflow-gpu  # Example command.
        gpu_8_ondemand:
            node_config:
                InstanceType: p3.8xlarge
            # Autoscaler autofills the "resources" below.
            # resources: {"CPU": 32, "GPU": 4, "accelerator_type:V100": 1}
            max_workers: 2
            worker_setup_commands:
                - pip install tensorflow-gpu  # Example command.

    # Specify the node type of the head node (as configured above).
    head_node_type: cpu_4_ondemand

    # Specify the default type of the worker node (as configured above).
    worker_default_node_type: cpu_16_spot


The above config defines two CPU node types (``cpu_4_ondemand`` and ``cpu_16_spot``), and two GPU types (``gpu_1_ondemand`` and ``gpu_8_ondemand``). Each node type has a name (e.g., ``cpu_4_ondemand``), which has no semantic meaning and is only for debugging. Let's look at the inner fields of the ``gpu_1_ondemand`` node type:

The node config tells the underlying cloud provider how to launch a node of this type. This node config is merged with the top-level node config of the YAML and can override its fields (e.g., to specify the p2.xlarge instance type here):

.. code-block:: yaml

    node_config:
        InstanceType: p2.xlarge
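
The effect of merging a node type's ``node_config`` with the top-level node config can be pictured as a dictionary update. The sketch below assumes a shallow merge in which the node type's fields win; the text above only states that the configs are merged and that fields can be overridden, so treat the exact merge semantics, the ``merged_node_config`` name, and the placeholder ``ImageId`` value as assumptions.

.. code-block:: python

    def merged_node_config(top_level_config, node_type_config):
        """Shallow merge: fields set on the node type override top-level fields."""
        merged = dict(top_level_config)
        merged.update(node_type_config)
        return merged


    # Hypothetical top-level node config shared by all node types.
    top_level = {"InstanceType": "m4.xlarge", "ImageId": "ami-placeholder"}
    gpu_1_ondemand = {"InstanceType": "p2.xlarge"}

    print(merged_node_config(top_level, gpu_1_ondemand))

Fields that the node type does not mention (here ``ImageId``) are inherited unchanged from the top-level config.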

The ``resources`` field tells the autoscaler what kinds of resources this node provides. This can include custom resources as well (e.g., ``"Custom2"``). This field enables the autoscaler to automatically select the right kind of nodes to launch given the resource demands of the application. The resources specified here are automatically passed to the ``ray start`` command for the node via an environment variable. For more information, see the `resource demand scheduler <https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/resource_demand_scheduler.py>`__. For ``gpu_1_ondemand``, the resources after the autoscaler fills in the CPU and GPU counts are:

.. code-block:: yaml

    resources: {"CPU": 4, "GPU": 1, "Custom2": 2}

The ``min_workers`` and ``max_workers`` fields constrain the minimum and maximum number of nodes of this type to launch, respectively:

.. code-block:: yaml

    min_workers: 1
    max_workers: 4

The ``worker_setup_commands`` field (and also the ``initialization_commands`` field, not shown) can be used to override the setup and initialization commands for a node type. Note that you can only override the setup for worker nodes. The head node's setup commands are always configured via the top-level field in the cluster YAML:

.. code-block:: yaml

    worker_setup_commands:
        - pip install tensorflow-gpu  # Example command.

Docker Support for Multi-Type Clusters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For each node type, you can specify ``worker_image`` and ``pull_before_run`` fields. These override any top-level ``docker`` section values (see :ref:`autoscaler-docker`). The ``worker_run_options`` field is combined with the top-level ``docker: run_options`` field to produce the ``docker run`` command for the given node type. Ray will automatically select the NVIDIA Docker runtime if it is available.

The following configuration is for a GPU enabled node type:

.. code-block:: yaml

    available_node_types:
        gpu_1_ondemand:
            max_workers: 2
            worker_setup_commands:
                - pip install tensorflow-gpu  # Example command.

            # Docker specific commands for gpu_1_ondemand
            pull_before_run: True
            worker_image:
                - rayproject/ray-ml:latest-gpu
            worker_run_options:  # Appended to top-level docker field.
                - "-v /home:/home"
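
The override and combine rules described in this section can be sketched as follows. This is a hypothetical illustration, not Ray's implementation: the function name, the returned dictionary layout, and the sample top-level ``docker`` values are assumptions, while the per-node-type values mirror the YAML above.

.. code-block:: python

    def effective_docker_settings(top_level_docker, node_type):
        """Apply per-node-type Docker fields on top of the top-level section."""
        return {
            # Node type values override the top-level values when present.
            "worker_image": node_type.get(
                "worker_image", top_level_docker.get("worker_image")
            ),
            "pull_before_run": node_type.get(
                "pull_before_run", top_level_docker.get("pull_before_run", False)
            ),
            # Run options are combined: top-level first, then node type options.
            "run_options": list(top_level_docker.get("run_options", []))
            + list(node_type.get("worker_run_options", [])),
        }


    # Hypothetical top-level docker section.
    top_level_docker = {
        "worker_image": ["rayproject/ray:latest"],
        "pull_before_run": False,
        "run_options": ["--shm-size=2g"],
    }
    gpu_1_ondemand = {
        "pull_before_run": True,
        "worker_image": ["rayproject/ray-ml:latest-gpu"],
        "worker_run_options": ["-v /home:/home"],
    }

    print(effective_docker_settings(top_level_docker, gpu_1_ondemand))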