I am surprised by the fact that `GetTimeoutError` is not a subclass of `TimeoutError`, which is counter-intuitive and may discourage users from trying the timeout feature in `ray.get`, because you have to "guess" the correct error type. For most people, I believe the first error type in their mind would be `TimeoutError`.
This PR fixes this.
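A minimal sketch of why the subclass relationship matters. The classes below are stand-ins defined only for illustration (they are not imported from Ray), and `fetch` / `fetch_or_default` are hypothetical helpers:

```python
# Stand-in classes for illustration only; not imported from Ray.
class RayError(Exception):
    """Hypothetical base class for framework errors."""


class GetTimeoutError(RayError, TimeoutError):
    """Raised when a blocking get exceeds its timeout.

    Inheriting from the built-in TimeoutError lets callers catch it
    without knowing the framework-specific name.
    """


def fetch(timeout: float) -> str:
    """Hypothetical blocking call that always times out."""
    raise GetTimeoutError(f"result not ready after {timeout}s")


def fetch_or_default(timeout: float, default: str) -> str:
    """Return the fetched value, or `default` if the call times out."""
    try:
        return fetch(timeout)
    except TimeoutError:  # also catches GetTimeoutError
        return default
```

With this hierarchy, a user who guesses `except TimeoutError` gets the behavior they expect, while code that catches the framework-specific type keeps working.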
We added namespace support to the C++ worker in https://github.com/ray-project/ray/pull/26327. This PR documents its usage and also improves the Java and Python docs, for example by explaining how to specify a namespace when creating named actors.
- [x] add doc for basic c++ worker namespace usage
- [x] add explanation for specifying namespace while creating named actors, in Python, Java and C++
This PR adds support for specifying an exception allowlist (List[Exception]) as the retry_exceptions argument, such that an application-level exception will only be retried if it is in the allowlist.
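The allowlist semantics can be sketched in plain Python. This is not Ray's implementation; `run_with_retries` and `flaky` are hypothetical names used only to illustrate "retry only if the exception type is in the allowlist":

```python
from typing import Callable, Sequence, Type


def run_with_retries(
    fn: Callable[[], object],
    max_retries: int,
    retry_exceptions: Sequence[Type[Exception]],
) -> object:
    """Call fn, retrying only when the raised exception is allowlisted."""
    allowlist = tuple(retry_exceptions)
    attempt = 0
    while True:
        try:
            return fn()
        except Exception as exc:
            # Re-raise if the exception is not allowlisted or retries ran out.
            if not isinstance(exc, allowlist) or attempt >= max_retries:
                raise
            attempt += 1


calls = {"n": 0}


def flaky() -> str:
    """Hypothetical task body that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = run_with_retries(flaky, max_retries=3, retry_exceptions=[ConnectionError])
```

An exception type outside the allowlist, such as `ValueError` here, would propagate on the first failure instead of being retried.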
The contents of the two docs were switched.
The Unnecessary Ray Get images were already in the correct file, `unnecessary-ray-get.rst`, which made the mismatch noticeable beyond the URL.
Users' intuition might lead them to fill out `excludes` with absolute paths, e.g. `/Users/working_dir/subdir/`. However, the `excludes` field uses `gitignore` syntax. In `gitignore` syntax, paths that start with `/` are interpreted relative to the level of the directory where the `gitignore` file resides, and in our case this is the `working_dir` directory (morally speaking, since there's no actual `.gitignore` file.) So the correct thing to put in `excludes` would be `/subdir/`. As long as we support `gitignore` syntax, we should have a note in the docs for this. This PR adds the note.
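As a rough illustration of the note, the helper below converts an absolute directory path into a pattern anchored at `working_dir`. `to_exclude_pattern` is a hypothetical helper, not part of Ray, and the paths are the ones from the example above:

```python
from pathlib import PurePosixPath


def to_exclude_pattern(working_dir: str, absolute_path: str) -> str:
    """Turn an absolute directory path under working_dir into a
    gitignore-style pattern anchored at working_dir."""
    # relative_to raises ValueError if the path is not under working_dir.
    relative = PurePosixPath(absolute_path).relative_to(PurePosixPath(working_dir))
    return f"/{relative.as_posix()}/"


runtime_env = {
    "working_dir": "/Users/working_dir",
    # Anchored at working_dir, so this is "/subdir/" rather than the
    # absolute path "/Users/working_dir/subdir/".
    "excludes": [to_exclude_pattern("/Users/working_dir", "/Users/working_dir/subdir/")],
}
```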
Duplicate for #25247.
Adds a fix for Dask-on-Ray. Previously, for tasks with multiple return values, we implicitly allowed returning a dict with the return index as the key. This was used by Dask-on-Ray, but this is not documented behavior, and we now require task returns to be iterable instead.
This PR allows the user to override the global default for max_retries for non-actor tasks. It adds an OS environment variable called RAY_task_max_retries, which can be passed to the driver or set with runtime envs. Any future tasks submitted by that worker will default to this value instead of 3, the hard-coded default.
It would be nicer if we could have a standard way of setting these defaults, but I think this is fine as a one-off for now (not a clear need for overriding defaults of other @ray.remote options yet).
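The lookup described above can be sketched as follows. This is an assumption about the general shape, not the actual Ray code; `default_task_max_retries` is a hypothetical helper, and only the variable name and the default of 3 come from the description:

```python
import os
from typing import Mapping, Optional

HARD_CODED_DEFAULT_MAX_RETRIES = 3  # the default named in the description


def default_task_max_retries(environ: Optional[Mapping[str, str]] = None) -> int:
    """Return the default max_retries, honoring the env override if set."""
    env = os.environ if environ is None else environ
    raw = env.get("RAY_task_max_retries")
    return HARD_CODED_DEFAULT_MAX_RETRIES if raw is None else int(raw)
```

Passing an explicit mapping, such as `default_task_max_retries({"RAY_task_max_retries": "5"})`, makes the helper easy to exercise without touching the real process environment.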
Related issue number
Closes #24854.
Adds support for Python generators instead of just normal return functions when a task has multiple return values. This will allow developers to cut down on total memory usage for tasks, as they can free previous return values before allocating the next one on the heap.
The semantics for num_returns are about the same as usual tasks - the function will throw an error if the number of values returned by the generator does not match the number of return values specified by the user. The one difference is that if num_returns=1, the task will throw the usual Python exception that the generator cannot be pickled.
As an example, this feature will allow us to reduce memory usage in Datasets shuffle operations (see #25200 for a prototype).
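A plain-Python sketch of the count check described above. `collect_returns` and `split_range` are hypothetical names, and this does not show how Ray stores or frees values; it only illustrates consuming a generator one value at a time and enforcing `num_returns`:

```python
from typing import Callable, Iterator, List


def collect_returns(task: Callable[[], Iterator[object]], num_returns: int) -> List[object]:
    """Consume a generator task one value at a time and enforce num_returns."""
    results: List[object] = []
    for value in task():
        if len(results) == num_returns:
            raise ValueError(f"task yielded more than {num_returns} values")
        # A real system would store each value here, letting the task
        # release it before producing the next one.
        results.append(value)
    if len(results) != num_returns:
        raise ValueError(f"task yielded {len(results)} values, expected {num_returns}")
    return results


def split_range() -> Iterator[List[int]]:
    """Hypothetical task that yields three chunks instead of returning a tuple."""
    for start in (0, 3, 6):
        yield list(range(start, start + 3))


chunks = collect_returns(split_range, num_returns=3)
```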
* [runtime env] runtime env inheritance refactor (#22244)
Runtime Environments has been GA since Ray 1.6.0. The latest doc is [here](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#runtime-environments). We currently support an [inheritance](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#inheritance) behavior as follows (copied from the doc):
- The runtime_env["env_vars"] field will be merged with the runtime_env["env_vars"] field of the parent. This allows for environment variables set in the parent’s runtime environment to be automatically propagated to the child, even if new environment variables are set in the child’s runtime environment.
- Every other field in the runtime_env will be overridden by the child, not merged. For example, if runtime_env["py_modules"] is specified, it will replace the runtime_env["py_modules"] field of the parent.
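The two documented rules can be sketched as a small merge function. `merge_runtime_env` is a hypothetical helper rather than Ray's implementation, and it assumes that the child's value wins when both sides set the same environment variable:

```python
def merge_runtime_env(parent: dict, child: dict) -> dict:
    """Merge env_vars key by key; let the child replace every other field."""
    merged = dict(parent)
    merged.update(child)  # non-env_vars fields: the child overrides the parent
    # env_vars: union of both dicts (assumption: child wins on conflicts).
    env_vars = {**parent.get("env_vars", {}), **child.get("env_vars", {})}
    if env_vars:
        merged["env_vars"] = env_vars
    return merged


parent_env = {"env_vars": {"A": "1"}, "py_modules": ["parent_mod"]}
child_env = {"env_vars": {"B": "2"}, "py_modules": ["child_mod"]}
effective_env = merge_runtime_env(parent_env, child_env)
```

Here `effective_env` carries both `A` and `B` in `env_vars`, but only the child's `py_modules`.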
We think this merging logic is complex and confusing, because users cannot know the final runtime env before their jobs run.
This PR refactors Runtime Environments inheritance and changes its behavior. Here is the new behavior:
- **If there is no runtime env option when we create actor, inherit the parent runtime env.**
- **Otherwise, use the optional runtime env directly and don't do the merging.**
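The two rules above amount to the following sketch. `resolve_runtime_env` is a hypothetical name used for illustration, and "no runtime env option" is modeled here as `None`:

```python
from typing import Optional


def resolve_runtime_env(parent: dict, child: Optional[dict]) -> dict:
    """Inherit the parent's env only when the child specifies none."""
    if child is None:
        return dict(parent)  # no option given: inherit the parent env
    return dict(child)       # option given: use it as-is, no merging


parent_env = {"env_vars": {"A": "1"}, "py_modules": ["parent_mod"]}
inherited = resolve_runtime_env(parent_env, None)
explicit = resolve_runtime_env(parent_env, {"env_vars": {"B": "2"}})
```

Under these rules `explicit` contains only `B`; nothing from the parent leaks in, so the final env is known from the call site alone.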
Add a new API named `ray.runtime_env.get_current_runtime_env()` that returns the parent runtime env as a dict, which you can then modify yourself. For example:
```Actor.options(runtime_env={**ray.runtime_env.get_current_runtime_env(), "X": "Y"})```
This new API can also be used in Ray Client.
This PR moves all exception classes from the runtime module to the api module. It aims to eliminate confusion about Ray exceptions: after this PR, Ray users no longer need to touch the runtime module when programming against the API.
Note that this should be merged onto 2.0.
This example simply doesn't run as is. We can bring it back later if it makes sense, but it's not clear what the variables used there, like `actor`, are. Fixes #21328
Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>
Why are these changes needed?
The current documentation code in "Message passing using Ray Queue" can be enhanced to better demonstrate the message queue.
It creates 10 tasks but only 2 consumers, and each consumer consumes one task and then exits. Therefore, the output is a bit vague:
(consumer pid=1022727) got work 0
(consumer pid=1022595) got work 1
So I made each consumer keep working until the queue is empty. The output now shows consumers 0 and 1 working in parallel:
(consumer pid=1030876) consumer 0 got work 0
(consumer pid=1030876) consumer 0 got work 1
(consumer pid=1030876) consumer 0 got work 3
(consumer pid=1030876) consumer 0 got work 5
(consumer pid=1030876) consumer 0 got work 7
(consumer pid=1030876) consumer 0 got work 9
(consumer pid=1030949) consumer 1 got work 2
(consumer pid=1030949) consumer 1 got work 4
(consumer pid=1030949) consumer 1 got work 6
(consumer pid=1030949) consumer 1 got work 8
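The "consume until the queue is empty" pattern can be sketched with the standard library. This is a stand-in using threads and `queue.Queue`, not the Ray Queue code from the docs; `consumer` and `run` are hypothetical names:

```python
import queue
import threading
from typing import List, Tuple


def consumer(consumer_id: int, work_queue: queue.Queue, log: List[Tuple[int, int]]) -> None:
    """Keep taking work until the queue is empty, then exit."""
    while True:
        try:
            item = work_queue.get(block=False)
        except queue.Empty:
            return
        log.append((consumer_id, item))


def run(num_items: int = 10, num_consumers: int = 2) -> List[Tuple[int, int]]:
    """Fill the queue, start the consumers, and return (consumer_id, item) pairs."""
    work_queue: queue.Queue = queue.Queue()
    for i in range(num_items):
        work_queue.put(i)
    log: List[Tuple[int, int]] = []
    threads = [
        threading.Thread(target=consumer, args=(cid, work_queue, log))
        for cid in range(num_consumers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Every item is consumed exactly once, but how the items are split between the consumers depends on scheduling and can differ from run to run.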
P.S. Also fix a typo in doc.