Each workflow has a unique ``workflow_id``. By default, a random id is generated when you call ``.run()`` or ``.run_async()``. It is recommended that you explicitly assign each workflow an id via ``.run(workflow_id="id")``.
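For example, the following minimal sketch (assuming a simple ``@workflow.step`` function) assigns an explicit id so the run can later be referenced and resumed:

.. code-block:: python

    from ray import workflow

    @workflow.step
    def add(a: int, b: int) -> int:
        return a + b

    workflow.init()
    # Assign an explicit id instead of relying on a randomly generated one.
    result = add.step(1, 2).run(workflow_id="add_example")
    assert result == 3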
Ray workflows currently has no built-in job scheduler. You can, however, easily use any external job scheduler to interact with your Ray cluster (via :ref:`job submission <jobs-overview>` or :ref:`client connection <ray-client>`) to trigger workflow runs.
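For example, a job launched by an external scheduler could connect via Ray Client and trigger a run; this is a rough sketch, and the cluster address, step, and workflow id below are illustrative assumptions:

.. code-block:: python

    import ray
    from ray import workflow

    @workflow.step
    def nightly_etl() -> str:
        return "etl finished"

    # Connect to an existing cluster via Ray Client (address is an example).
    ray.init("ray://head-node-host:10001")
    workflow.init()
    # Use a unique id per scheduled run so runs do not collide.
    nightly_etl.step().run(workflow_id="nightly_etl_2021_09_01")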
Workflows supports two types of storage backends out of the box:
* Local file system: the data is stored locally. This is only for single-node testing; the path needs to be on an NFS mount to work with multi-node clusters. To use local storage, specify ``workflow.init(storage="/path/to/storage_dir")``.
* S3: Production users should use S3 as the storage backend. Enable S3 storage with ``workflow.init(storage="s3://bucket/path")``.
Additional storage backends can be written by subclassing the ``Storage`` class and passing a storage instance to ``workflow.init()``. Note that the ``Storage`` API is not currently stable.
Besides ``workflow.init()``, the storage URI can also be set via the ``RAY_WORKFLOW_STORAGE`` environment variable:
.. code-block:: python

    import os
    from ray import workflow

    # Option 1: pass the storage URI through ``workflow.init``.
    workflow.init("/local/path")

    # Option 2: set the environment variable ``RAY_WORKFLOW_STORAGE``
    # before calling ``workflow.init()``.
    os.environ["RAY_WORKFLOW_STORAGE"] = "s3://bucket/path"
    workflow.init()
If left unspecified, ``/tmp/ray/workflow_data`` will be used for temporary storage. This default setting *will only work for single-node Ray clusters*.
Ray logs the runtime environment (code and dependencies) of the workflow to storage at submission time. This ensures that the workflow can be resumed at a future time on a different Ray cluster.
You can also explicitly set the runtime environment for a particular step (e.g., to specify a conda environment, container image, etc.).
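As a rough sketch, assuming the step's ``.options()`` call forwards standard Ray options such as ``runtime_env``, a per-step environment could be pinned like this (the package and version are illustrative):

.. code-block:: python

    from ray import workflow

    @workflow.step
    def compute_report() -> int:
        import pandas as pd  # provided by the runtime_env below
        return len(pd.DataFrame({"x": [1, 2, 3]}))

    workflow.init()
    # Pin dependencies for this step only; other steps keep the cluster default.
    report_step = compute_report.options(runtime_env={"pip": ["pandas==1.3.5"]}).step()
    print(report_step.run(workflow_id="report_with_pinned_deps"))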
For virtual actors, the runtime environment of the actor can be upgraded via the virtual actor management API.