This PR enables stage fusion for dataset pipelines. This also requires:
1. Removing the num_cpus=0.5 default for the read stage, to enable fusion of the read stage.
2. Removing spread_resource_prefix (not supported for now).
We should just encourage people to use the existing `get_runtime_context` API instead of introducing a new one here. Just removing the docs for now while we discuss this.
Why are these changes needed?
Data from PutRequests is chunked into 64MiB messages over the datastream, to avoid the 2GiB message size limit from gRPC. This will allow users to transfer objects larger than 2GiB over the network.
Proto changes
Put requests now have fields for chunk_id to identify which chunk data belongs to, total_chunks to identify the total number of chunks in the object, and total_size for total size in bytes of the object (useful for raising warnings).
PutObject is still unary-unary. The dataservicer handles reassembling the chunks before passing the result to the underlying RayletServicer.
Dataclient changes
If a put request is inserted into the request queue, self._requests will chunk it lazily. Doing this lazily is important since inserting all of the chunks onto the request queue immediately would double the amount of memory needed to handle a large request. This also guarantees that the chunks of a given putrequest will be contiguous
Dataservicer changes
The dataservicer now maintains some state to track received chunks. Once all chunks for a putrequest are received, the combined chunks are passed to the raylet servicer.
Ray DAG Changes
- Restructured and resolves circular imports in current dag_node.py.
- Moved `__str__` to each DAGNode subclass level with centralized utils imports
- Removed restrictions on binding `InputNode` to `FunctionNode` and `ClassMethodNode`
- Moved `_contain_input_node` to only `ClassNode` and `DeploymentNode`
Serve DAG Changes
- Added DeploymentNode
- Cannot be directly constructed
- Holds deployment func or class body as well as handle that trivially maps to `__call__` method (match current behavior)
- Upon accessing an attribute, it will spawn DeploymentMethodNode node with `other_args_to_resolve` passed in to differentiate sync handle type and others
- Added DeploymentMethodNode
- Holds arg and deployment handle
- Executing on it translate to deployment handle call on the method.
Opencenus symobls haven been exported in linux version of libcore_worker_library_java.so, but deleted from ray_exported_symbols.lds, which makes streaming macos test case failed.
This pr add this exporting record and rename *ray*streaming* stuff to *ray*internal* that's a united entry to ray cpp.
Co-authored-by: 林濯 <lingxuan.zlx@antgroup.com>
Improve observability for general objects and lineage reconstruction by adding a "Status" field to `ray memory`. The value of the field can be:
```
// The task is waiting for its dependencies to be created.
WAITING_FOR_DEPENDENCIES = 1;
// All dependencies have been created and the task is scheduled to execute.
SCHEDULED = 2;
// The task finished successfully.
FINISHED = 3;
```
In addition, tasks that failed or that needed to be re-executed due to lineage reconstruction will have a field listing the attempt number. Example output:
```
IP Address | PID | Type | Call Site | Status | Size | Reference Type | Object Ref
192.168.4.22 | 279475 | Driver | (task call) ... | Attempt #2: FINISHED | 10000254.0 B | LOCAL_REFERENCE | c2668a65bda616c1ffffffffffffffffffffffff0100000001000000
```
This is the second PR to implement usage stats on Ray. Please refer to the file usage_lib.py for more details.
The full specification is here https://docs.google.com/document/d/1ZT-l9YbGHh-iWRUC91jS-ssQ5Qe2UQ43Lsoc1edCalc/edit#heading=h.17dss3b9evbj.
This adds a dashboard module to enable usage stats. **Usage stats report is turned off by default** after this PR. We can control the report (enablement, report period, and URL. Note that URL is strictly for testing) using the env variable.
## NOTE
This requires us to add `requests` to the default library. `requests` must be okay to be included because
1. it is extremely lightweight. It is implemented only with built-in libs.
2. It is really stable. The project basically claims they are "deprecated", meaning no new features will be added there.
cc @edoakes @richardliaw for the approval
For the HTTP request, I was alternatively considered httpx, but it was not as lightweight as `requests`. So I decided to implement async requests using the thread pool.
Mostly cluster tests are enabled in this PR in the above mentioned files. Some non-cluster tests are also enabled. All of these pass on my machine without issues.
Runtime Environments is already GA in Ray 1.6.0. The latest doc is [here](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#runtime-environments). And now, we already supported a [inheritance](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#inheritance) behavior as follows (copied from the doc):
- The runtime_env["env_vars"] field will be merged with the runtime_env["env_vars"] field of the parent. This allows for environment variables set in the parent’s runtime environment to be automatically propagated to the child, even if new environment variables are set in the child’s runtime environment.
- Every other field in the runtime_env will be overridden by the child, not merged. For example, if runtime_env["py_modules"] is specified, it will replace the runtime_env["py_modules"] field of the parent.
We think this runtime env merging logic is so complex and confusing to users because users can't know the final runtime env before the jobs are run.
Current PR tries to do a refactor and change the behavior of Runtime Environments inheritance. Here is the new behavior:
- **If there is no runtime env option when we create actor, inherit the parent runtime env.**
- **Otherwise, use the optional runtime env directly and don't do the merging.**
Add a new API named `ray.runtime_env.get_current_runtime_env()` to get the parent runtime env and modify this dict by yourself. Like:
```Actor.options(runtime_env=ray.runtime_env.get_current_runtime_env().update({"X": "Y"}))```
This new API also can be used in ray client.
- Separate spread scheduling and default hydra scheduling (i.e. SpreadScheduling != HybridScheduling(threshold=0)): they are already separated in the API layer and they have the different end goals so it makes sense to separate their implementations and evolve them independently.
- Simple round robin for spread scheduling: this is just a starting implementation, can be optimized later.
- Prefer not to spill back tasks that are waiting for args since the pull is already in progress.
While investigating #22161, it is observed GIL is held for an extended amount of time (up to 1000s) with stack trace [1]. It is possible either there are many iterations within `Pickle5Writer.write_to()` calling `ray::parallel_memcopy()`, or a few `ray::parallel_memcopy()` taking a long time (less likely). Either way, `ray::parallel_memcopy()` or `std::memcpy()` should not hold GIL.
This flag has been turned on by default for almost 4 months. Delete the old code so that when refactoring, we don't need to take care of the legacy code path.
The existing Job info in the cluster snapshot uses the old definition of Job, which is a single Ray driver (a single `ray.init()` connection).
In the new Job Submission protocol, a Job just specifies an entrypoint which can be any shell command. As such a Job can have zero or multiple Ray drivers. This means we should add a new snapshot entry corresponding to new jobs. We'll leave the old snapshot in place for legacy jobs.
- Also fixes `get_all_jobs` by using the appropriate KV namespace, and stripping the job key KV prefix from the job ID. It wasn't working before.
- This PR also unifies the datatype used by the GET jobs/ endpoint to be the same as the one used by the new jobs cluster snapshot. For backwards compatibility, the `status` and `message` fields are preserved.