Advanced Topics
===============

Inplace Execution
-----------------

When a workflow task is executed inside another workflow task, it usually runs in a different Ray worker process. This is good for resource and performance isolation, but it comes at the cost of lower efficiency due to non-locality, scheduling overhead, and data transfer.

For example, this recursive workflow computes ``k * 2 ** n``. We write it as a workflow so that we can recover from any task. However, it is inefficient to schedule each task in a different worker.

.. code-block:: python
    :caption: Workflow without inplace execution:

    import ray
    from ray import workflow

    @ray.remote
    def exp_remote(k, n):
        if n == 0:
            return k
        return workflow.continuation(exp_remote.bind(2 * k, n - 1))

We can optimize it with the ``allow_inplace`` option:

.. code-block:: python
    :caption: Workflow with inplace execution:

    import ray
    from ray import workflow

    @ray.remote
    def exp_inplace(k, n):
        if n == 0:
            return k
        return workflow.continuation(
            exp_inplace.options(**workflow.options(allow_inplace=True)).bind(2 * k, n - 1))

    assert workflow.create(exp_inplace.bind(3, 7)).run() == 3 * 2 ** 7


With ``allow_inplace=True``, the task on which ``.bind()`` was called executes inside the calling function's process instead of being scheduled remotely. Ray options are ignored because they only apply to remote execution. Also, you cannot retrieve the output of an inplace task using ``workflow.get_output()`` before it finishes execution.
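
As a sanity check on the assertion in the inplace example, the chain of continuations can be unrolled into an ordinary loop. The sketch below is plain Python with no Ray involved, and ``exp_unrolled`` is a hypothetical helper name introduced only for illustration. Each iteration stands for one workflow task that doubles ``k`` and decrements ``n``, so the result is ``k * 2 ** n``.

.. code-block:: python

    def exp_unrolled(k, n):
        """Plain-Python equivalent of the recursion: returns k * 2 ** n."""
        # Each loop iteration plays the role of one continuation task.
        while n > 0:
            k, n = 2 * k, n - 1
        return k

    # Same arguments as the workflow example.
    assert exp_unrolled(3, 7) == 3 * 2 ** 7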

Inplace execution is also useful when you need to pass something that is only valid in the current process or on the current physical machine to another task. For example:

.. code-block:: python

    @ray.remote
    def foo():
        x = "<something that is only valid in the current process>"
        return workflow.continuation(bar.options(**workflow.options(allow_inplace=True)).bind(x))


Wait for Partial Results
------------------------

By default, a workflow task will only execute after the completion of all of its dependencies. This blocking behavior prevents certain types of workflows from being expressed (e.g., wait for two of the three tasks to finish).

Analogous to ``ray.wait()``, Ray Workflow provides ``workflow.wait(*tasks: List[Workflow[T]], num_returns: int = 1, timeout: float = None) -> (List[T], List[Workflow[T]])``. Calling ``workflow.wait`` generates a logical task. The output of the logical task is a tuple of the workflow results that are ready and the workflow results that have not yet been computed. For example, you can use it to print out workflow results as they are computed in the following dynamic workflow:

.. code-block:: python

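    import random
    import time

    import ray
    from ray import workflow

    @ray.remote
    def do_task(i):
        # Simulate work that takes a random amount of time.
        time.sleep(random.random())
        return "task {}".format(i)

    # The annotation is quoted so that it is not evaluated at definition time.
    @ray.remote
    def report_results(wait_result: "Tuple[List[str], List[Workflow[str]]]"):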
        ready, remaining = wait_result
        for result in ready:
            print("Completed", result)
        if not remaining:
            return "All done"
        else:
            return workflow.continuation(report_results.bind(workflow.wait(remaining)))

    tasks = [do_task.bind(i) for i in range(100)]
    workflow.create(report_results.bind(workflow.wait(tasks))).run()
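
The ready/remaining loop can also be illustrated with the standard library alone. The sketch below is only an analogy built on ``concurrent.futures.wait`` and does not use the Ray Workflow API. ``collect_as_completed`` is a hypothetical helper name, and the shortened sleep is an assumption made to keep the example fast.

.. code-block:: python

    import random
    import time
    from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


    def do_task(i):
        # Simulate work of varying duration.
        time.sleep(random.random() * 0.05)
        return "task {}".format(i)


    def collect_as_completed(num_tasks):
        """Return results in completion order using a ready/remaining loop."""
        completed = []
        with ThreadPoolExecutor(max_workers=4) as pool:
            remaining = {pool.submit(do_task, i) for i in range(num_tasks)}
            while remaining:
                # Block until at least one future finishes, then split the set.
                ready, remaining = wait(remaining, return_when=FIRST_COMPLETED)
                completed.extend(future.result() for future in ready)
        return completed


    results = collect_as_completed(10)
    assert sorted(results) == sorted("task {}".format(i) for i in range(10))

As in the workflow version, each pass handles whatever has finished and then waits again on the rest, so no result is held back by a slower task.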


Workflow Task Checkpointing
---------------------------

Ray Workflows provides strong fault tolerance and exactly-once execution semantics through checkpointing. However, checkpointing can be time-consuming, especially when workflow tasks have large inputs and outputs. When exactly-once execution semantics are not required, you can skip some checkpoints to speed up your workflow.


You control checkpointing by specifying the ``checkpoint`` option like this:

.. code-block:: python

    data = read_data.options(**workflow.options(checkpoint=False)).bind(10)

This example skips checkpointing the output of ``read_data``. During recovery, ``read_data`` is executed again if its output is required.

If not specified, ``checkpoint`` defaults to ``True``.

If the output of a task is another task (i.e., a dynamic workflow), we skip checkpointing the entire task.
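
The trade-off between checkpointing and recomputation can be seen in a toy model. The sketch below is plain Python and is not how Ray implements checkpointing. ``ToyRunner``, its in-memory ``storage`` dict, and the task names are all hypothetical, and it assumes that each recovery pass requests every task's output.

.. code-block:: python

    class ToyRunner:
        """Toy model of checkpoint-or-recompute; not Ray's implementation."""

        def __init__(self):
            self.storage = {}  # stands in for durable checkpoint storage
            self.calls = {}    # how many times each task body actually ran

        def run(self, name, func, checkpoint=True):
            # Reuse a stored output if one exists (the recovery fast path).
            if name in self.storage:
                return self.storage[name]
            self.calls[name] = self.calls.get(name, 0) + 1
            result = func()
            if checkpoint:
                self.storage[name] = result
            return result


    runner = ToyRunner()
    for _ in range(2):  # the second pass models recovery after a failure
        runner.run("read_data", lambda: list(range(10)), checkpoint=False)
        runner.run("summarize", lambda: 45)

    # The uncheckpointed task ran twice; the checkpointed one ran once.
    assert runner.calls == {"read_data": 2, "summarize": 1}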