# The workflow can also be executed asynchronously.
# print(ray.get(output.run_async()))
Here we created the workflow:
..image:: basic.png
:width:500px
:align:center
Workflow steps behave similarly to Ray tasks. They are declared via the ``@workflow.step`` annotation, and can take in either concrete values or the output of another workflow step as an argument.
Unlike Ray tasks, which return an ``ObjectRef[T]`` when called and are executed eagerly, workflow steps return ``Workflow[T]`` and are not executed until ``.run()`` is called on a composed workflow DAG.
Composing workflow steps
------------------------
As seen in the example above, workflow steps can be composed by passing ``Workflow[T]`` outputs to other steps. Just like a Ray task, a step can take multiple inputs:
Here we can see though ``get_val1.step()`` returns a ``Workflow[int]``, when passed to the ``add`` step, the ``add`` function will see its resolved value.
Workflow results can be retrieved with ``workflow.get_output(workflow_id) -> ObjectRef[T]``. If the workflow has not yet completed, calling ``ray.get()`` on the returned reference will block until the result is computed. For example:
Once a step is given a name, the result of the step will be retrievable via ``workflow.get_output(workflow_id, name="step_name")``. If the step with the given name hasn't been executed yet, an exception will be thrown. Here are some examples:
- If ``max_retries`` is given, the step will be retried for the given number of times if an exception is raised. It will only retry for the application level error. For system errors, it's controlled by ray. By default, ``max_retries`` is set to be 3.
- If ``catch_exceptions`` is True, the return value of the function will be converted to ``Tuple[Optional[T], Optional[Exception]]``. This can be combined with ``max_retries`` to try a given number of times before returning the result tuple.
Workflow steps provide *exactly-once* execution semantics. What this means is that once the result of a workflow step is logged to durable storage, Ray guarantees the step will never be re-executed. A step that receives the output of another workflow step can be assured that its inputs steps will never be re-executed.
Failure model
~~~~~~~~~~~~~
- If the cluster fails, any workflows running on the cluster enter RESUMABLE state. The workflows can be resumed on another cluster (see the management API section).
- The lifetime of the workflow is not coupled with the driver. If the driver exits, the workflow will continue running in the background of the cluster.
Note that steps that have side-effects still need to be idempotent. This is because the step could always fail prior to its result being logged.
Additional steps can be dynamically created and inserted into the workflow DAG during execution. The following example shows how to implement the recursive ``factorial`` program using dynamically generated steps:
..code-block:: python
@workflow.step
def factorial(n: int) -> int:
if n == 1:
return 1
else:
return multiply.step(n, factorial.step(n - 1))
@workflow.step
def multiply(a: int, b: int) -> int:
return a * b
ret = factorial.step(10).run()
assert ret.run() == 3628800
The key behavior to note is that when a step returns a ``Workflow`` output instead of a concrete value, that workflow's output will be substituted for the step's return. To better understand dynamic workflows, let's look at a more realistic example of booking a trip:
fut = book_trip.step("OAK", "SAN", ["6/12", "7/5"])
fut.run() # returns Receipt(...)
Here the workflow initially just consists of the ``book_trip`` step. Once executed, ``book_trip`` generates steps to book flights and hotels in parallel, which feeds into a step to decide whether to cancel the trip or finalize it. The DAG can be visualized as follows (note the dynamically generated nested workflows within ``book_trip``):
..image:: trip.png
:width:500px
:align:center
The execution order here will be:
1. Run the ``book_trip`` step.
2. Run the two ``book_flight`` steps and the ``book_hotel`` step in parallel.
3. Once all three booking steps finish, ``finalize_or_cancel`` will be executed and its return will be the output of the workflow.
Ray Integration
---------------
Mixing steps with Ray tasks and actors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Workflows are compatible with Ray tasks and actors. There are two methods of using them together:
1. Workflows can be launched from within a Ray task or actor. For example, you can launch a long-running workflow from Ray serve in response to a user request. This is no different from launching a workflow from the driver program.
2. Workflow steps can use Ray tasks or actors within a single step. For example, a step could use Ray Train internally to train a model. No durability guarantees apply to the tasks or actors used within the step; if the step fails, it will be re-executed from scratch.
Unlike Ray tasks, when you pass a list of ``Workflow`` outputs to a step, the values are fully resolved. This ensures that all a step's ancestors are fully executed prior to the step starting:
ret = add.step([get_val.step() for _ in range(3)])
assert ret.run() == 30
Passing object references between steps
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ray object references and data structures composed of them (e.g., ``ray.Dataset``) can be passed into and returned from workflow steps. To ensure recoverability, their contents will be logged to durable storage. However, an object will not be checkpointed more than once, even if it is passed to many different steps.