mirror of
https://github.com/vale981/ray
synced 2025-03-06 10:31:39 -05:00
[Serve] Add Instructions for GPU (#8495)
This commit is contained in:
parent
1163ddbe45
commit
c9c84c87f4
3 changed files with 27 additions and 3 deletions
@@ -64,6 +64,7 @@ Any method of the actor can return multiple object IDs with the ``ray.method`` decorator:

    assert ray.get(obj_id1) == 1
    assert ray.get(obj_id2) == 2

.. _actor-resource-guide:

Resources with Actors
---------------------
@@ -50,6 +50,8 @@ If using the command line, connect to the Ray cluster as follows:

    # Connect to Ray. Note that if you are connecting to an existing cluster,
    # you don't need to specify resources.
    ray.init(address=<address>)

.. _omp-num-thread-note:

.. note::
    Ray sets the environment variable ``OMP_NUM_THREADS=1`` by default. This is done
    to avoid performance degradation with many workers (issue #6998). You can
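The note above is cut off by the hunk boundary, but it points at overriding the default. A minimal sketch, assuming the standard mechanism of exporting the environment variable before calling ``ray.init()`` (the value ``"4"`` is an arbitrary example, not a recommendation):

```python
import os

# Sketch: override Ray's OMP_NUM_THREADS=1 default by setting the variable
# in the environment before ray.init() runs. The value "4" is an arbitrary example.
os.environ["OMP_NUM_THREADS"] = "4"

# A subsequent ray.init() would inherit this setting (not executed here).
threads = os.environ["OMP_NUM_THREADS"]
```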
@@ -192,6 +192,27 @@ To scale out a backend to multiple workers, simply configure the number of replicas:

This will scale out the number of workers that can accept requests.

Using Resources (CPUs, GPUs)
++++++++++++++++++++++++++++

To assign hardware resources per worker, you can pass resource requirements to
``ray_actor_options``. To learn about the options you can pass in, take a look at the
:ref:`Resources with Actors <actor-resource-guide>` guide.

For example, to create a backend where each replica uses a single GPU, you can do the
following:

.. code-block:: python

    options = {"num_gpus": 1}
    serve.create_backend("my_gpu_backend", handle_request, ray_actor_options=options)

.. note::

    Deep learning frameworks such as PyTorch and TensorFlow often use all available
    CPUs when performing inference. Ray sets the environment variable
    ``OMP_NUM_THREADS=1`` to :ref:`avoid contention <omp-num-thread-note>`. This means
    each worker will only use one CPU instead of all of them.

Splitting Traffic
+++++++++++++++++
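The ``num_gpus`` value in ``ray_actor_options`` can also be fractional, which Ray's general fractional-resource support uses to pack several replicas onto one device (assumption: the same semantics apply through Serve). A minimal sketch of the option dictionaries; the ``serve.create_backend`` call is commented out because it needs a running Serve instance, and ``handle_request`` is the handler from the section above:

```python
# One full GPU per replica, as in the diff above.
gpu_options = {"num_gpus": 1}

# Fractional GPUs let two replicas share a single device (assumption:
# Ray's standard fractional-resource semantics apply to ray_actor_options).
shared_gpu_options = {"num_gpus": 0.5}

# With a running Serve instance this would create the backend (not executed here):
# serve.create_backend("my_gpu_backend", handle_request, ray_actor_options=gpu_options)
```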