mirror of
https://github.com/vale981/ray
synced 2025-03-06 10:31:39 -05:00
[Serve] Add Instructions for GPU (#8495)
This commit is contained in:
parent
1163ddbe45
commit
c9c84c87f4
3 changed files with 27 additions and 3 deletions
@ -64,6 +64,7 @@ Any method of the actor can return multiple object IDs with the ``ray.method`` decorator:

    assert ray.get(obj_id1) == 1
    assert ray.get(obj_id2) == 2

.. _actor-resource-guide:

Resources with Actors
---------------------
@ -50,14 +50,16 @@ If using the command line, connect to the Ray cluster as follows:

    # Connect to Ray. Note that if you connect to an existing cluster, you don't specify resources.
    ray.init(address=<address>)

.. _omp-num-thread-note:

.. note::
    Ray sets the environment variable ``OMP_NUM_THREADS=1`` by default. This is done
    to avoid performance degradation with many workers (issue #6998). You can
    override this by explicitly setting ``OMP_NUM_THREADS``. ``OMP_NUM_THREADS`` is commonly
    used in NumPy, PyTorch, and TensorFlow to perform multi-threaded linear algebra.
    In a multi-worker setting, we want one thread per worker instead of many threads
    per worker to avoid contention.

Logging and Debugging
---------------------
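The override mentioned in the note can be sketched in plain Python. The key point is that ``OMP_NUM_THREADS`` must be set before the linear-algebra library is imported, since NumPy, PyTorch, and TensorFlow read it at import time (a minimal sketch, not Ray-specific; the value ``4`` is an arbitrary example):

```python
import os

# Override Ray's default of OMP_NUM_THREADS=1. This must happen *before*
# importing numpy/PyTorch/TensorFlow, which read the variable at import time.
os.environ["OMP_NUM_THREADS"] = "4"

# Worker code imported after this point may use up to 4 threads
# for multi-threaded linear algebra.
print(os.environ["OMP_NUM_THREADS"])  # → 4
```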
@ -192,6 +192,27 @@ To scale out a backend to multiple workers, simply configure the number of replicas.

This will scale out the number of workers that can accept requests.

Using Resources (CPUs, GPUs)
++++++++++++++++++++++++++++

To assign hardware resources per worker, you can pass resource requirements to
``ray_actor_options``. To learn about the options you can pass in, take a look at the
:ref:`Resources with Actors<actor-resource-guide>` guide.

For example, to create a backend where each replica uses a single GPU, you can do the
following:

.. code-block:: python

  options = {"num_gpus": 1}
  serve.create_backend("my_gpu_backend", handle_request, ray_actor_options=options)
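``ray_actor_options`` takes the same resource fields as any Ray actor (for example ``num_cpus``, ``num_gpus``, and custom ``resources``), and GPU values may be fractional so replicas can share a device. A sketch of an options dictionary for a replica that reserves one CPU and half a GPU (the backend name in the comment is hypothetical):

```python
# Resource requirements are forwarded verbatim to each replica's Ray actor.
# Fractional GPU values let multiple replicas share one physical GPU.
options = {"num_cpus": 1, "num_gpus": 0.5}

# Passed to Serve as:
#   serve.create_backend("my_shared_gpu_backend", handle_request,
#                        ray_actor_options=options)
print(options)
```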

.. note::
  Deep learning frameworks like PyTorch and TensorFlow often use all the CPUs when
  performing inference. Ray sets the environment variable ``OMP_NUM_THREADS=1`` to
  :ref:`avoid contention<omp-num-thread-note>`. This means each worker will only
  use one CPU instead of all of them.

Splitting Traffic
+++++++++++++++++