[Serve] [Docs] Replace references to dag.execute() with handle.predict.remote() (#27784)

shrekris-anyscale 2022-08-12 17:09:28 -07:00 committed by GitHub
parent 8cb09a9fc5
commit 0a3c1de08b
8 changed files with 121 additions and 67 deletions

View file

@ -3,6 +3,8 @@
import ray
from ray import serve
from ray.serve.drivers import DAGDriver
from ray.serve.http_adapters import json_request
from ray.serve.deployment_graph import InputNode
@ -28,7 +30,11 @@ with InputNode() as user_input:
output2 = model2.forward.bind(user_input)
combine_output = combine.bind([output1, output2])
sum = ray.get(combine_output.execute(1))
graph = DAGDriver.bind(combine_output, http_adapter=json_request)
handle = serve.run(graph)
sum = ray.get(handle.predict.remote(1))
print(sum)
# __graph_end__

View file

@ -3,7 +3,9 @@
import ray
from ray import serve
from ray.dag.input_node import InputNode
from ray.serve.drivers import DAGDriver
from ray.serve.http_adapters import json_request
from ray.serve.deployment_graph import InputNode
@serve.deployment
@ -32,10 +34,14 @@ with InputNode() as user_input:
output2 = model2.forward.bind(input_number)
combine_output = combine.bind(output1, output2, input_operation)
max_output = ray.get(combine_output.execute(1, "max"))
graph = DAGDriver.bind(combine_output, http_adapter=json_request)
handle = serve.run(graph)
max_output = ray.get(handle.predict.remote(1, "max"))
print(max_output)
sum_output = ray.get(combine_output.execute(1, "sum"))
sum_output = ray.get(handle.predict.remote(1, "sum"))
print(sum_output)
# __graph_end__

View file

@ -3,6 +3,8 @@
import ray
from ray import serve
from ray.serve.drivers import DAGDriver
from ray.serve.http_adapters import json_request
from ray.serve.deployment_graph import InputNode
@ -24,9 +26,11 @@ with InputNode() as graph_input:
for i in range(1, len(nodes)):
outputs[i] = nodes[i].forward.bind(outputs[i - 1])
last_output_node = outputs[-1]
graph = DAGDriver.bind(outputs[-1], http_adapter=json_request)
sum = ray.get(last_output_node.execute(0))
handle = serve.run(graph)
sum = ray.get(handle.predict.remote(1))
print(sum)
# __graph_end__

View file

@ -55,3 +55,45 @@ print(output)
# __graph_client_end__
assert output == 9
# __adapter_graph_start__
# This import can go to the top of the file.
from ray.serve.http_adapters import json_request
add_2 = AddCls.bind(2)
add_3 = AddCls.bind(3)
with InputNode() as request_number:
add_2_output = add_2.add.bind(request_number)
subtract_1_output = subtract_one_fn.bind(add_2_output)
add_3_output = add_3.add.bind(subtract_1_output)
graph = DAGDriver.bind(add_3_output, http_adapter=json_request)
# __adapter_graph_end__
serve.run(graph)
assert requests.post("http://localhost:8000/", json=5).json() == 9
# __test_graph_start__
# These imports can go to the top of the file.
import ray
from ray.serve.http_adapters import json_request
add_2 = AddCls.bind(2)
add_3 = AddCls.bind(3)
with InputNode() as request_number:
add_2_output = add_2.add.bind(request_number)
subtract_1_output = subtract_one_fn.bind(add_2_output)
add_3_output = add_3.add.bind(subtract_1_output)
graph = DAGDriver.bind(add_3_output, http_adapter=json_request)
handle = serve.run(graph)
ref = handle.predict.remote(5)
result = ray.get(ref)
print(result)
# __test_graph_end__
assert result == 9

View file

@ -37,7 +37,7 @@ This call has a few parts:
* `remote` indicates that this is a `ServeHandle` call to another deployment. This is required when invoking a deployment's method through another deployment. It needs to be added to the method name.
* `name` is the argument for `say_hello`. You can pass any number of arguments or keyword arguments here.
This call returns a reference to the result not the result itself. This pattern allows the call to execute asynchronously. To get the actual result, `await` the reference. `await` blocks until the asynchronous call executes, and then it returns the result. In this example, line 23 calls `await ref` and returns the resulting string. **Note that we need two `await` statements in total**. First, we `await` the `ServeHandle` call itself to retrieve a reference. Then we `await` the reference to get the final result.
This call returns a reference to the result, not the result itself. This pattern allows the call to execute asynchronously. To get the actual result, `await` the reference. `await` blocks until the asynchronous call executes, and then it returns the result. In this example, line 23 calls `await ref` and returns the resulting string. **Note that getting the result needs two `await` statements in total**. First, the script must `await` the `ServeHandle` call itself to retrieve a reference. Then it must `await` the reference to get the final result.
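The two-step `await` can be mimicked with plain `asyncio`, without Ray. This is only an analogy: `call_remote` is a hypothetical helper standing in for the `ServeHandle` call, an `asyncio.Task` stands in for the reference, and the body of `say_hello` is assumed.

```python
import asyncio


async def say_hello(name: str) -> str:
    # Assumed stand-in for the downstream deployment's method.
    await asyncio.sleep(0)
    return f"Hello {name}!"


async def call_remote(name: str) -> asyncio.Task:
    # Hypothetical stand-in for the handle call: it schedules the work
    # and returns a reference (here, a Task), not the result.
    return asyncio.create_task(say_hello(name))


async def main() -> str:
    ref = await call_remote("Ray")  # first await: get the reference
    result = await ref              # second await: get the result
    return result


greeting = asyncio.run(main())
print(greeting)
```

Dropping the second `await` would leave `result` holding the task object rather than the string, which is the mistake the two-`await` rule guards against.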
(serve-model-composition-await-warning)=
:::{warning}
@ -72,9 +72,9 @@ Composition lets you break apart your application and independently scale each p
With composition, you can avoid application-level bottlenecks when serving models and business logic steps that use different types and amounts of resources.
:::
```{note}
For a deep dive in to the architecture of ServeHandle and its usage, take a look at [this user guide](serve-handle-explainer).
```
:::{note}
For a deep dive into the architecture of `ServeHandle` and its usage, take a look at [this user guide](serve-handle-explainer).
:::
(serve-model-composition-deployment-graph)=
## Deployment Graph API
@ -156,7 +156,7 @@ Here's the graph:
:linenos: true
```
In lines 29 and 30, we bind two `ClassNodes` from the `AddCls` deployment. In line 32, we start our call graph:
Lines 29 and 30 bind two `ClassNodes` from the `AddCls` deployment. Line 32 starts the call graph:
```python
with InputNode() as http_request:
@ -166,13 +166,14 @@ with InputNode() as http_request:
add_3_output = add_3.add.bind(subtract_1_output)
```
The `with` statement (known as a "context manager" in Python) initializes a special Ray Serve-provided object called an `InputNode`. This isn't a `DeploymentNode` like `ClassNodes`, `MethodNodes`, or `FunctionNodes`. Rather, it represents the input of our graph. In this case, that input represents an HTTP request. In [a future section](deployment-graph-drivers-http-adapters), we'll show how you can change this input type using another Ray Serve-provided object called the driver.
The `with` statement (known as a "context manager" in Python) initializes a special Ray Serve-provided object called an `InputNode`. This isn't a `DeploymentNode` like `ClassNodes`, `MethodNodes`, or `FunctionNodes`. Rather, it's the input of the graph. In this case, that input is an HTTP request. In a [later section](deployment-graph-drivers-http-adapters), you'll learn how to change this input using another Ray Serve-provided object called the `DAGDriver`.
(deployment-graph-call-graph-input-node-note)=
:::{note}
`InputNode` is merely a representation of the future graph input. In this example, for instance, `http_request`'s type is `InputNode`, not an actual HTTP request. When the graph is deployed, incoming HTTP requests are passed into the same functions and methods that `http_request` is passed into.
The `InputNode` tells Ray Serve where to send the graph input at runtime. In this example, for instance, `http_request` is an `InputNode` object, so you can't call `request` methods like `.json()` on it directly in the context manager. However, during runtime, Ray Serve passes incoming HTTP requests directly into the same functions and methods that `http_request` is passed into, so those functions and methods can call `request` methods like `.json()` on the `request` object that gets passed in.
:::
We use the `InputNode` to indicate which node(s) the graph input should be passed to by passing the `InputNode` into `bind` calls within the context manager. In this case, the `http_request` is passed to only one node, `unpack_request`. The output of that bind call, `request_number` is a `FunctionNode`. `FunctionNodes` are produced when deployments containing functions are bound to arguments for that function using `bind`. In this case `request_number` represents the output of `unpack_request` when called on incoming HTTP requests. `unpack_request`, which is defined on line 26, processes the HTTP request's JSON body and returns a number that can be passed into arithmetic operations.
You can use the `InputNode` to indicate which node(s) the graph input should be passed into by passing the `InputNode` into `bind` calls within the context manager. In this example, the `http_request` is passed to only one node, `unpack_request`. The output of that bind call, `request_number`, is a `FunctionNode`. `FunctionNodes` are produced when deployments containing functions are bound to arguments for that function using `bind`. `request_number` represents the output of `unpack_request` when called on incoming HTTP requests. `unpack_request`, which is defined on line 26, processes the HTTP request's JSON body and returns a number that can be passed into arithmetic operations.
:::{tip}
If you don't want to manually unpack HTTP requests, check out this guide's section on [HTTP adapters](deployment-graph-drivers-http-adapters), which can handle unpacking for you.
@ -180,7 +181,7 @@ If you don't want to manually unpack HTTP requests, check out this guide's secti
The graph then passes `request_number` into a `bind` call on `add_2`'s `add` method. The output of this call, `add_2_output`, is a `MethodNode`. `MethodNodes` are produced when `ClassNode` methods are bound to arguments using `bind`. In this case, `add_2_output` represents the result of adding 2 to the number in the request.
The rest of the call graph uses another `FunctionNode` and `MethodNode` to finish the chain of arithmetic. `add_2_output` is bound to the `subtract_one_fn` deployment, producing the `subtract_1_output` `FunctionNode`. Then, the `subtract_1_output` is bound to the `add_3.add` method, producing the `add_3_output` `MethodNode`. This `add_3_output` `MethodNode` represents the final output from our chain of arithmetic operations.
The rest of the call graph uses another `FunctionNode` and `MethodNode` to finish the chain of arithmetic. `add_2_output` is bound to the `subtract_one_fn` deployment, producing the `subtract_1_output` `FunctionNode`. Then, the `subtract_1_output` is bound to the `add_3.add` method, producing the `add_3_output` `MethodNode`. This `add_3_output` `MethodNode` represents the final output from the chain of arithmetic operations.
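As a sanity check on the arithmetic, the same chain can be written as ordinary Python with no Ray Serve involved. The `AddCls` and `subtract_one_fn` bodies below are assumptions inferred from their names and from the expected result of 9 for an input of 5; the real definitions live in `arithmetic.py`.

```python
class AddCls:
    # Assumed behavior: store an addend and add it to the input.
    def __init__(self, addend: float):
        self.addend = addend

    def add(self, number: float) -> float:
        return number + self.addend


def subtract_one_fn(number: float) -> float:
    # Assumed behavior: subtract 1 from the input.
    return number - 1


add_2 = AddCls(2)
add_3 = AddCls(3)

request_number = 5
add_2_output = add_2.add(request_number)           # 5 + 2 = 7
subtract_1_output = subtract_one_fn(add_2_output)  # 7 - 1 = 6
add_3_output = add_3.add(subtract_1_output)        # 6 + 3 = 9
print(add_3_output)
```

In the deployment graph, each of these direct calls becomes a `bind` call, so the names hold nodes describing the computation instead of the computed numbers.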
To run the call graph, you need to use a driver. Drivers are deployments that process the call graph that you've written and route incoming requests through your deployments based on that graph. Ray Serve provides a driver called `DAGDriver` used on line 38:
@ -188,7 +189,7 @@ To run the call graph, you need to use a driver. Drivers are deployments that pr
deployment_graph = DAGDriver.bind(add_3_output)
```
Generally, the `DAGDriver` needs to be bound to the `FunctionNode` or `MethodNode` representing the final output of our graph. This `bind` call returns a `ClassNode` that you can run in `serve.run` or `serve run`. Running this `ClassNode` also deploys the rest of the graph's deployments.
Generally, the `DAGDriver` needs to be bound to the `FunctionNode` or `MethodNode` representing the final output of a graph. This `bind` call returns a `ClassNode` that you can run in `serve.run` or `serve run`. Running this `ClassNode` also deploys the rest of the graph's deployments.
:::{note}
The `DAGDriver` can also be bound to `ClassNodes`. This is useful if you construct a deployment graph where `ClassNodes` invoke other `ClassNodes`' methods. In this case, you should pass in the "root" `ClassNode` to `DAGDriver` (i.e. the one that you would otherwise pass into `serve.run`). Check out the [Calling Deployments using ServeHandles](serve-model-composition-serve-handles) section for more info.
@ -216,42 +217,6 @@ $ python arithmetic_client.py
9
```
(deployment-graph-call-graph-testing)=
### Testing the Call Graph with the Python API
All `MethodNodes` and `FunctionNodes` have an `execute` method. You can use this method to test your graph in Python, without using HTTP requests.
To test your graph,
1. Call `execute` on the `MethodNode` or `FunctionNode` that you would pass into the `DAGDriver`.
2. Pass in the input to the graph as the argument. **This argument becomes the input represented by `InputNode`**. Make sure to refactor your call graph accordingly, since it takes in this input directly, instead of an HTTP request.
3. `execute` returns a reference to the result, so the graph can execute asynchronously. Call `ray.get` on this reference to get the final result.
As an example, we can rewrite the [arithmetic call graph example](deployment-graph-arithmetic-graph) from above to use `execute`:
```python
with InputNode() as request_number:
add_2_output = add_2.add.bind(request_number)
subtract_1_output = subtract_one_fn.bind(add_2_output)
add_3_output = add_3.add.bind(subtract_1_output)
ref = add_3_output.execute(5)
result = ray.get(ref)
print(result)
```
Then we can run the script directly:
```
$ python arithmetic.py
9
```
:::{note}
The `execute` method deploys your deployment code inside Ray tasks and actors instead of Ray Serve deployments. It's useful for testing because you don't need to launch entire deployments and ping them with HTTP requests, but it's not suitable for production.
:::
(deployment-graph-drivers-http-adapters)=
### Drivers and HTTP Adapters
@ -259,23 +224,54 @@ Ray Serve provides the `DAGDriver`, which routes HTTP requests through your call
The `DAGDriver` also has an optional keyword argument: `http_adapter`. [HTTP adapters](serve-http-adapters) are functions that get run on the HTTP request before it's passed into the graph. Ray Serve provides a handful of these adapters, so you can rely on them to conveniently handle the HTTP parsing while focusing your attention on the graph itself.
For instance, we can use the Ray Serve-provided `json_request` adapter to simplify our [arithmetic call graph](deployment-graph-arithmetic-graph) by eliminating the `unpack_request` function. Here's the revised call graph and driver:
For instance, you can use the Ray Serve-provided `json_request` adapter to simplify the [arithmetic call graph](deployment-graph-arithmetic-graph) by eliminating the `unpack_request` function. You can replace lines 29 through 38 with this graph:
```python
from ray.serve.http_adapters import json_request
with InputNode() as request_number:
add_2_output = add_2.add.bind(request_number)
subtract_1_output = subtract_one_fn.bind(add_2_output)
add_3_output = add_3.add.bind(subtract_1_output)
graph = DAGDriver.bind(add_3_output, http_adapter=json_request)
(http-adapter-arithmetic-example)=
```{literalinclude} doc_code/model_composition/arithmetic.py
:start-after: __adapter_graph_start__
:end-before: __adapter_graph_end__
:language: python
```
Note that the `http_adapter`'s output type becomes what the `InputNode` represents. Without the `json_request` adapter, the `InputNode` represented an HTTP request. With the adapter, it now represents the number packaged inside the request's JSON body. You can work directly with that body's contents in the graph instead of first processing it.
Without an `http_adapter`, an `InputNode` [represents an HTTP request](deployment-graph-call-graph-input-node-note), and at runtime, incoming HTTP `request` objects are passed into the same functions and methods that the `InputNode` is passed into. When you set an `http_adapter`, the `InputNode` represents the `http_adapter`'s output.
At runtime:
1. Ray Serve sends each HTTP `request` object to the `DAGDriver`.
2. The `DAGDriver` calls the `http_adapter` function on each request.
3. The `DAGDriver` passes the `http_adapter` output to the same functions and methods that the `InputNode` is passed into, kicking off the request's journey through the call graph.
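The three runtime steps can be sketched in plain Python. Nothing here is Ray Serve's implementation: `FakeRequest`, `fake_json_adapter`, and `fake_driver` are hypothetical stand-ins for the HTTP request, the `http_adapter`, and the `DAGDriver`, and `graph` collapses the arithmetic call graph into a single function.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FakeRequest:
    # Hypothetical stand-in for an incoming HTTP request.
    body: bytes


def fake_json_adapter(request: FakeRequest) -> Any:
    # Stand-in for an HTTP adapter: turn the request into graph input.
    return json.loads(request.body)


def graph(number: float) -> float:
    # Stand-in for the call graph: add 2, subtract 1, add 3.
    return (number + 2) - 1 + 3


def fake_driver(
    request: FakeRequest, http_adapter: Callable[[FakeRequest], Any]
) -> Any:
    graph_input = http_adapter(request)  # step 2: run the adapter
    return graph(graph_input)            # step 3: feed the graph


# Step 1: a request arrives at the driver.
output = fake_driver(FakeRequest(body=b"5"), fake_json_adapter)
print(output)
```

Because the adapter runs before the graph, `graph` only ever sees the parsed number and never the request object.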
In the example above, the `InputNode` represents the number packaged inside the request's JSON body instead of the HTTP request itself. You can pass the JSON directly into the graph instead of first unpacking it from the request.
See [the guide](serve-http-adapters) on `http_adapters` to learn more.
(deployment-graph-call-graph-testing)=
### Testing the Graph with the Python API
The `serve.run` function returns a handle that you can use to test your graph in Python, without using HTTP requests.
To test your graph:
1. Call `serve.run` on your graph and store the returned handle.
2. Call `handle.predict.remote(input)`. **The `input` argument becomes the input represented by `InputNode`**. Make sure to refactor your call graph accordingly, since it takes in this input directly, instead of an HTTP request. You can use an [HTTP adapter](deployment-graph-drivers-http-adapters) to make sure the graph you're testing matches the one you ultimately deploy.
3. `predict.remote` returns a reference to the result, so the graph can execute asynchronously. Call `ray.get` on this reference to get the final result.
As an example, you can continue rewriting the [arithmetic graph example](http-adapter-arithmetic-example) from above to use `predict.remote`. You can add testing code to the example:
```{literalinclude} doc_code/model_composition/arithmetic.py
:start-after: __test_graph_start__
:end-before: __test_graph_end__
:language: python
```
Note that the graph itself is still the same. The only change is the testing code added after it. You can run this Python script directly now to test the graph:
```
$ python arithmetic.py
9
```
### Visualizing the Graph
You can render an illustration of your deployment graph to see its nodes and their connections.

View file

@ -8,7 +8,7 @@ This [deployment graph pattern](serve-deployment-graph-patterns-overview) lets y
## Code
```{literalinclude} ../../doc_code/branching_input.py
```{literalinclude} ../../doc_code/deployment_graph_patterns/branching_input.py
:language: python
:start-after: __graph_start__
:end-before: __graph_end__

View file

@ -6,7 +6,7 @@ This [deployment graph pattern](serve-deployment-graph-patterns-overview) allows
## Code
```{literalinclude} ../../doc_code/conditional.py
```{literalinclude} ../../doc_code/deployment_graph_patterns/conditional.py
:language: python
:start-after: __graph_start__
:end-before: __graph_end__
@ -32,7 +32,7 @@ async def combine(value_refs, combine_type):
The graph creates two `Model` nodes, with `weights` of 0 and 1. It then takes the `user_input` and unpacks it into two parts: a number and an operation.
:::{note}
`dag.execute()` can take an arbitrary number of arguments. These arguments can be unpacked by indexing into the `InputNode`. For example,
`handle.predict.remote()` can take an arbitrary number of arguments. These arguments can be unpacked by indexing into the `InputNode`. For example,
```python
with InputNode() as user_input:

View file

@ -6,7 +6,7 @@ This [deployment graph pattern](serve-deployment-graph-patterns-overview) is a l
## Code
```{literalinclude} ../../doc_code/linear_pipeline.py
```{literalinclude} ../../doc_code/deployment_graph_patterns/linear_pipeline.py
:language: python
:start-after: __graph_start__
:end-before: __graph_end__