The TrialExecutor base class was a stub and has been deprecated long ago; direct inheritance was disabled. This PR removes the base class and moves the remaining functionality into the RayTrialExecutor.
The Datasets UX assessment showed that users had difficulties in writing UDFs: what's input/output types, how to write the function etc.
Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-136.us-west-2.compute.internal>
Duplicate for #25247.
Adds a fix for Dask-on-Ray. Previously, for tasks with multiple return values, we implicitly allowed returning a dict with the return index as the key. This was used by Dask-on-Ray, but this is not documented behavior, and we now require task returns to be iterable instead.
This PR allows the user to override the global default for max_retries for non-actor tasks. It adds an OS env called RAY_task_max_retries which can be passed to the driver or set with runtime envs. Any future tasks submitted by that worker will default to this value instead of 3, the hard-coded default.
It would be nicer if we could have a standard way of setting these defaults, but I think this is fine as a one-off for now (not a clear need for overriding defaults of other @ray.remote options yet).
Related issue number
Closes#24854.
Adds support for Python generators instead of just normal return functions when a task has multiple return values. This will allow developers to cut down on total memory usage for tasks, as they can free previous return values before allocating the next one on the heap.
The semantics for num_returns are about the same as usual tasks - the function will throw an error if the number of values returned by the generator does not match the number of return values specified by the user. The one difference is that if num_returns=1, the task will throw the usual Python exception that the generator cannot be pickled.
As an example, this feature will allow us to reduce memory usage in Datasets shuffle operations (see #25200 for a prototype).
Pointing to the latest documentation for contributor is important as the workflow is always evolving. E.g. the installation instructions for bazel are not representatives of the current state on release vs master. Hence, I propose to update contribution links in the documentation to point to the latest state on master.
As described in the related issue, using model_weight as the key throws an error.
This update points the user to use model as the key instead.
Co-authored-by: tamilflix <tamilflix30@gmail.com>