[workflow][doc] Doc for workflow checkpointing (#23510)

Siyuan (Ryans) Zhuang 2022-03-27 12:18:14 -07:00 committed by GitHub
parent 65d72dbd91
commit 6b1b25168f

@ -66,3 +66,23 @@ Analogous to ``ray.wait()``, in Ray Workflow we have ``workflow.wait(*steps: Lis
tasks = [do_task.step(i) for i in range(100)]
report_results.step(workflow.wait(tasks)).run()
Workflow Step Checkpointing
---------------------------
Ray Workflows provides strong fault tolerance and exactly-once execution semantics through checkpointing. However, checkpointing can be time-consuming, especially when workflow steps have large inputs and outputs. When exactly-once execution semantics are not required, you can skip some checkpoints to speed up your workflow.
You control checkpointing by specifying the ``checkpoint`` option on a step, like this:
.. code-block:: python

    data = read_data.options(checkpoint=False).step(10)

This example skips checkpointing the output of ``read_data``. During recovery, ``read_data`` is executed again if its output is needed.
If the option is not specified, it defaults to ``checkpoint=True``.
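
The trade-off described above can be illustrated with a small toy model. This is a sketch only and does not use the Ray Workflows API: ``ToyWorkflow``, ``run_step``, ``total``, and ``pipeline`` are hypothetical names invented for this example, and a plain dict stands in for durable storage. It shows that a step whose checkpoint is skipped runs again during recovery, while a checkpointed step would be loaded from storage instead.

.. code-block:: python

    # Toy model of step checkpointing. NOT the Ray Workflows API;
    # every name here is hypothetical and exists only for illustration.
    class ToyWorkflow:
        def __init__(self):
            self.storage = {}     # stands in for durable checkpoint storage
            self.run_counts = {}  # how many times each step body executed

        def run_step(self, name, func, *args, checkpoint=True):
            # A checkpointed output is loaded instead of being recomputed.
            if name in self.storage:
                return self.storage[name]
            self.run_counts[name] = self.run_counts.get(name, 0) + 1
            result = func(*args)
            if checkpoint:
                self.storage[name] = result
            return result


    def read_data(n):
        return list(range(n))


    def total(data):
        return sum(data)


    def pipeline(wf, fail=False):
        # read_data skips its checkpoint, mirroring checkpoint=False above.
        data = wf.run_step("read_data", read_data, 10, checkpoint=False)
        if fail:
            raise RuntimeError("simulated crash before total finishes")
        return wf.run_step("total", total, data)


    wf = ToyWorkflow()
    try:
        pipeline(wf, fail=True)   # first attempt crashes after read_data
    except RuntimeError:
        pass
    result = pipeline(wf)         # "recovery": rerun against the same storage

After the simulated recovery, ``wf.run_counts["read_data"]`` is 2 because its output was never stored, whereas ``total`` ran once and its result is in ``wf.storage``. Passing ``checkpoint=True`` for ``read_data`` in this toy would bring its count down to 1, at the cost of writing its output to storage.
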
If the output of a step is another step (i.e., a dynamic workflow), we skip checkpointing the entire step.