## Why are these changes needed?
Add metadata to workflow. Currently there is no option for user to attach any metadata to a step or workflow run, and workflow running metrics (except status) are not captured nor checkpointed.
We are adding various of metadata including:
1. step-level user metadata. can be set with `step.options(metadata={})`
2. step-level pre-run metadata. this captures pre-run metadata such as step_start_time, more metrics can be added later.
3. step-level post-run metadata. this captures post-run metadata such as step_end_time, more metrics can be added later.
4. workflow-level user metadata. can be set with `workflow.run(metadata={})`
5. workflow-level pre-run metadata. this captures pre-run metadata such as workflow_start_time, more metrics can be added later.
6. workflow-level post-run metadata. this captures post-run metadata such as workflow_end_time, more metrics can be added later.
## Related issue number
https://github.com/ray-project/ray/issues/17090
Co-authored-by: Yi Cheng <chengyidna@gmail.com>
## Why are these changes needed?
Before this PR, there is a race condition where:
- job register starts
- driver start to launch actor
- gcs register actor ===> crash
- job register ends
Actor registration should be forced to be after driver registration. This PR enforces that.
## Related issue number
Closes#19172
* [core] nicer error message for unpickleable exceptions
I ran into a case where we had an exception that wasn't unpickleable:
```
pickle.loads(pickle.dumps(filelock.Timeout()))
```
When a filelock.Timeout is raised on a worker, it gets surfaced in a way
that makes ray look like it was responsible:
```
ray.exceptions.RaySystemError: System error: __init__() missing 1 required positional argument: 'lock_file'
```
This PR turns the following stacktrace:
```
return ray.get(refs, timeout=timeout)
File "/opt/conda/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 82, in wrapper
return func(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/ray/worker.py", line 1623, in get
raise value
ray.exceptions.RaySystemError: System error: __init__() missing 1 required positional argument: 'lock_file'
traceback: Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/ray/serialization.py", line 254, in deserialize_objects
obj = self._deserialize_object(data, metadata, object_ref)
File "/opt/conda/lib/python3.7/site-packages/ray/serialization.py", line 213, in _deserialize_object
return RayError.from_bytes(obj)
File "/opt/conda/lib/python3.7/site-packages/ray/exceptions.py", line 28, in from_bytes
return pickle.loads(ray_exception.serialized_exception)
TypeError: __init__() missing 1 required positional argument: 'lock_file'
```
into this:
```
...
return ray.get(refs, timeout=timeout)
File "/opt/conda/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 82, in wrapper
return func(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/ray/worker.py", line 1623, in get
raise value
ray.exceptions.RaySystemError: System error: Failed to unpickle serialized exception
traceback: Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/ray/exceptions.py", line 29, in from_bytes
return pickle.loads(ray_exception.serialized_exception)
TypeError: __init__() missing 1 required positional argument: 'lock_file'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/ray/serialization.py", line 254, in deserialize_objects
obj = self._deserialize_object(data, metadata, object_ref)
File "/opt/conda/lib/python3.7/site-packages/ray/serialization.py", line 213, in _deserialize_object
return RayError.from_bytes(obj)
File "/opt/conda/lib/python3.7/site-packages/ray/exceptions.py", line 31, in from_bytes
raise RuntimeError("Failed to unpickle serialized exception") from e
RuntimeError: Failed to unpickle serialized exception
```
* lint
* test_unpickleable_stacktrace
* lint
* .
* .
Co-authored-by: hauntsaninja <>
* Make binbacking prioritize nodes better
Make binpacking prefer nodes that match multiple
resource types.
* spelling
* order demands when binpacking, starting from complex ones
* add stability to resource demand ordering
* lint
* logging
* add comments
* +comment
* use set