[runtime env] [Doc] Runtime env doc and messaging improvements (#17547)

architkulkarni 2021-08-04 12:28:42 -07:00 committed by GitHub
parent e3c09b0af1
commit 63708468df
4 changed files with 34 additions and 25 deletions

View file

@ -464,6 +464,7 @@ The ``runtime_env`` is a Python dictionary including one or more of the followin
- ``working_dir`` (Path): Specifies the working directory for your job. This must be an existing local directory.
It will be cached on the cluster, so the next time you connect with Ray Client you will be able to skip uploading the directory contents.
Furthermore, if you locally make a small change to your directory, the next time you connect only the updated part will be uploaded.
All Ray workers for your job will be started in their node's copy of this working directory.
- Examples
@ -473,7 +474,14 @@ The ``runtime_env`` is a Python dictionary including one or more of the followin
Note: Setting this option per-task or per-actor is currently unsupported.
- ``pip`` (List[str] | str): Either a list of pip packages, or a string containing the path to a pip
Note: If your working directory contains a `.gitignore` file, the files and paths specified therein will not be uploaded to the cluster.
- ``excludes`` (List[str]): When used with ``working_dir``, specifies a list of files or paths to exclude from being uploaded to the cluster.
This field also supports the pattern-matching syntax used by ``.gitignore`` files: see `<https://git-scm.com/docs/gitignore>`_ for details.
- Example: ``["my_file.txt", "path/to/dir", "*.log"]``
- ``pip`` (List[str] | str): Either a list of pip packages, or a string containing the path to a pip
`“requirements.txt” <https://pip.pypa.io/en/stable/user_guide/#requirements-files>`_ file. The path may be an absolute path or a relative path. (Note: A relative path will be interpreted relative to ``working_dir`` if ``working_dir`` is specified.)
This will be dynamically installed in the ``runtime_env``.
To use a library like Ray Serve or Ray Tune, you will need to include ``"ray[serve]"`` or ``"ray[tune]"`` here.
@ -482,10 +490,10 @@ The ``runtime_env`` is a Python dictionary including one or more of the followin
- Example: ``"./requirements.txt"``
- ``conda`` (dict | str): Either (1) a dict representing the conda environment YAML, (2) a string containing the path to a
- ``conda`` (dict | str): Either (1) a dict representing the conda environment YAML, (2) a string containing the path to a
`conda “environment.yml” <https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#create-env-file-manually>`_ file,
or (3) the name of a local conda env already installed on each node in your cluster (e.g., ``"pytorch_p36"``).
In the first two cases, the Ray and Python dependencies will be automatically injected into the environment to ensure compatibility, so there is no need to manually include them.
In the first two cases, the Ray and Python dependencies will be automatically injected into the environment to ensure compatibility, so there is no need to manually include them.
Note that the ``conda`` and ``pip`` keys of ``runtime_env`` cannot both be specified at the same time. To use them together, use ``conda`` and add your pip dependencies in the ``"pip"`` field of your conda ``environment.yml``.
- Example: ``{"conda": {"dependencies": ["pytorch", "torchvision", "pip", {"pip": ["pendulum"]}]}}``
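As a rough illustration of the rule above (``conda`` and ``pip`` cannot both be set at the top level of ``runtime_env``), the sketch below nests the pip dependencies inside the conda spec instead. The helper ``check_conda_pip_exclusive`` is hypothetical and is not part of Ray's API; it only mirrors the documented constraint.

```python
def check_conda_pip_exclusive(runtime_env: dict) -> None:
    """Hypothetical check mirroring the documented rule; not a Ray function."""
    if "conda" in runtime_env and "pip" in runtime_env:
        raise ValueError(
            "The 'conda' and 'pip' keys cannot both be specified; "
            "put pip dependencies in the 'pip' field of the conda spec.")


# pip packages are nested inside the conda dependencies list,
# so only the top-level 'conda' key is used.
runtime_env = {
    "conda": {
        "dependencies": ["pytorch", "torchvision", "pip", {"pip": ["pendulum"]}]
    }
}

check_conda_pip_exclusive(runtime_env)  # does not raise: only 'conda' is set
```

How Ray itself validates and reports this conflict is not shown in this diff, so the error type and message here are assumptions.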
@ -502,9 +510,9 @@ The ``runtime_env`` is a Python dictionary including one or more of the followin
The runtime env is inheritable, so it will apply to all tasks/actors within a job and all child tasks/actors of a task or actor, once set.
If a child actor or task specifies a new ``runtime_env``, it will be merged with the parent's ``runtime_env`` via a simple dict update.
If a child actor or task specifies a new ``runtime_env``, it will be merged with the parent's ``runtime_env`` via a simple dict update.
For example, if ``runtime_env["pip"]`` is specified, it will override the ``runtime_env["pip"]`` field of the parent.
The one exception is the field ``runtime_env["env_vars"]``. This field will be `merged` with the ``runtime_env["env_vars"]`` dict of the parent.
The one exception is the field ``runtime_env["env_vars"]``. This field will be `merged` with the ``runtime_env["env_vars"]`` dict of the parent.
This allows environment variables set in the parent's runtime environment to be automatically propagated to the child, even if new environment variables are set in the child's runtime environment.
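The inheritance rule described above can be sketched in plain Python. This is a minimal illustration, not Ray's implementation: ``merge_runtime_env`` is a hypothetical helper, and the choice that a child's value wins when both sides set the same environment variable is an assumption.

```python
def merge_runtime_env(parent: dict, child: dict) -> dict:
    """Hypothetical helper sketching the documented merge rule; not Ray's API."""
    # Simple dict update: any field set by the child replaces the parent's field.
    merged = dict(parent)
    merged.update(child)

    # Exception: env_vars is merged key by key rather than replaced wholesale.
    # Assumption: on a name clash, the child's value takes precedence.
    env_vars = {**parent.get("env_vars", {}), **child.get("env_vars", {})}
    if env_vars:
        merged["env_vars"] = env_vars
    return merged


parent_env = {"pip": ["requests"], "env_vars": {"A": "1"}}
child_env = {"pip": ["pendulum"], "env_vars": {"B": "2"}}

merged = merge_runtime_env(parent_env, child_env)
# The child's "pip" list replaces the parent's list; both env vars are kept.
```

Note that ``"pip"`` is overridden as a whole, so the child would need to list ``"requests"`` again if it still depends on it.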
Here are some examples of runtime envs combining multiple options:
@ -513,7 +521,7 @@ Here are some examples of runtime envs combining multiple options:
TODO(architkulkarni): run working_dir doc example in CI
.. code-block:: python
runtime_env = {"working_dir": "/code/my_project", "pip": ["pendulum=2.1.2"]}
.. literalinclude:: ../examples/doc_code/runtime_env_example.py

View file

@ -443,7 +443,7 @@ class DataServicerProxy(ray_client_pb2_grpc.RayletDataStreamerServicer):
f"using JobConfig: {job_config}!")
raise RuntimeError(
"Starting up Server Failed! Check "
"`ray_client_server.err` on the cluster.")
"`ray_client_server_[port].err` on the cluster.")
channel = self.proxy_manager.get_channel(client_id)
if channel is None:
logger.error(f"Channel not found for {client_id}")

View file

@ -231,9 +231,8 @@ def current_ray_pip_specifier() -> Optional[str]:
built from source locally (likely if you are developing Ray).
Examples:
Returns "ray[all]==1.4.0" if running the stable release
Returns "https://s3-us-west-2.amazonaws.com/ray-wheels/master/[..].whl"
if running the nightly or a specific commit
Returns "https://s3-us-west-2.amazonaws.com/ray-wheels/[..].whl"
if running a stable release, a nightly or a specific commit
"""
logger = get_hook_logger()
if os.environ.get("RAY_CI_POST_WHEEL_TESTS"):
@ -245,12 +244,12 @@ def current_ray_pip_specifier() -> Optional[str]:
Path(__file__).resolve().parents[3], ".whl", get_wheel_filename())
elif ray.__commit__ == "{{RAY_COMMIT_SHA}}":
# Running on a version built from source locally.
logger.warning(
"Current Ray version could not be detected, most likely "
"because you are using a version of Ray "
"built from source. If you wish to use runtime_env, "
"you can try building a wheel and including the wheel "
"explicitly as a pip dependency.")
if os.environ.get("RAY_RUNTIME_ENV_LOCAL_DEV_MODE") != "1":
logger.warning(
"Current Ray version could not be detected, most likely "
"because you have manually built Ray from source. To use "
"runtime_env in this case, set the environment variable "
"RAY_RUNTIME_ENV_LOCAL_DEV_MODE=1.")
return None
elif "dev" in ray.__version__:
# Running on a nightly wheel.

View file

@ -757,16 +757,18 @@ void NodeManager::WarnResourceDeadlock() {
std::ostringstream error_message;
error_message
<< "The actor or task with ID " << exemplar.GetTaskSpecification().TaskId()
<< " cannot be scheduled right now. It requires "
<< " cannot be scheduled right now. You can ignore this message if this "
<< "Ray cluster is expected to auto-scale or if you specified a "
<< "runtime_env for this actor or task, which may take time to install. "
<< "Otherwise, this is likely due to all cluster resources being claimed "
<< "by actors. To resolve the issue, consider creating fewer actors or "
<< "increasing the resources available to this Ray cluster.\n"
<< "Required resources for this actor or task: "
<< exemplar.GetTaskSpecification().GetRequiredPlacementResources().ToString()
<< " for placement, but this node only has remaining " << available_resources
<< ". In total there are " << pending_tasks << " pending tasks and "
<< pending_actor_creations << " pending actors on this node. "
<< "This is likely due to all cluster resources being claimed by actors. "
<< "To resolve the issue, consider creating fewer actors or increase the "
<< "resources available to this Ray cluster. You can ignore this message "
<< "if this Ray cluster is expected to auto-scale or if you specified a "
<< "runtime_env for this task or actor because it takes time to install.";
<< "\n"
<< "Available resources on this node: " << available_resources
<< "In total there are " << pending_tasks << " pending tasks and "
<< pending_actor_creations << " pending actors on this node.";
std::string error_message_str = error_message.str();
RAY_LOG(WARNING) << error_message_str;