- Updates the jobs API
- Updates the snapshot API
- Updates the state API
- Increases the jobs API version to 2
Signed-off-by: Alan Guo aguo@anyscale.com
# Why are these changes needed?
follow-up for #25902 (comment)
# Why are these changes needed?
This PR does 3 things:
1. Add warnings for data truncation (a follow-up).
2. Improve some confusing warning messages.
3. Order columns as they are defined in StateSchema (so that we can customize the column order for better usability). I did this only for `list` because I thought it wasn't that important for `summary`, but I might be wrong (sketch below).
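A minimal sketch of the ordering idea; the `ActorSchema` dataclass and `order_columns` helper below are hypothetical stand-ins for the real StateSchema classes:

```python
from dataclasses import dataclass, fields
from typing import Dict

# Hypothetical stand-in for a StateSchema subclass; the real schemas define
# many more columns and live in Ray's state API module.
@dataclass
class ActorSchema:
    actor_id: str
    class_name: str
    state: str
    pid: int

def order_columns(row: Dict, schema=ActorSchema) -> Dict:
    """Reorder a result row so its keys follow the schema's field order."""
    ordered = [f.name for f in fields(schema)]
    return {key: row[key] for key in ordered if key in row}

# Keys come back in schema order regardless of the input order.
print(order_columns({"pid": 1234, "state": "ALIVE",
                     "actor_id": "abc", "class_name": "Worker"}))
```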
Signed-off-by: rickyyx rickyx@anyscale.com
# Why are these changes needed?
When we return fewer or incomplete results to users, there can be 3 reasons:
- Data being truncated at the data source (raylets -> API server)
- Data being filtered at the API server
- Data being limited at the API server

We currently do not distinguish these 3 scenarios, but we should; not doing so is why we thought data was being truncated when it was actually filtered or limited. This PR distinguishes these scenarios and prints warnings accordingly.
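A hedged sketch of the kind of decision logic this implies; the function name, counters, and messages are illustrative, not the actual implementation:

```python
def explain_missing_output(total_at_source: int, received: int,
                           after_filtering: int, returned: int,
                           limit: int) -> str:
    """Pick the right warning for why fewer entries were returned.

    All counters are illustrative; the real API server tracks its own stats.
    """
    if received < total_at_source:
        return (f"{total_at_source - received} entries were truncated at the "
                "data source (raylets -> API server).")
    if after_filtering < received:
        return (f"{received - after_filtering} entries were excluded by "
                "filters; this is filtering, not truncation.")
    if returned == limit and after_filtering > limit:
        return (f"Output is limited to {limit} entries; "
                f"{after_filtering - limit} entries were not returned.")
    return "All entries were returned."
```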
# Related issue number
Closes #26570, closes #26923
This PR does 3 things.
1. Warn if callsite collection is disabled when running `ray list objects` and `ray summary objects`.
2. Decode `owner_id` for `ray list actors`.
3. Support `raise_on_missing_output` (usage sketch below).
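For item 3, a hedged usage sketch, assuming the Python state API in `ray.experimental.state.api` exposes the flag on its list calls as described here:

```python
from ray.experimental.state.api import list_objects

# Assumed behavior per this PR's description: with raise_on_missing_output=True,
# the call raises instead of silently returning partial output when entries are
# truncated or some sources fail to respond in time.
objects = list_objects(raise_on_missing_output=True)

# Opting out: accept a possibly partial result and rely on the printed warning.
objects_maybe_partial = list_objects(raise_on_missing_output=False)
```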
NOTE: tabulate is copied/pasted to the codebase for table formatting.
This PR changes the default layout to be the table format for both summary and list APIs.
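For illustration, this is roughly what the table layout looks like when produced with the upstream `tabulate` package; the vendored copy in the codebase may be wrapped differently, and the rows here are made up:

```python
from tabulate import tabulate

rows = [
    {"ACTOR_ID": "abc123", "CLASS_NAME": "Worker", "STATE": "ALIVE"},
    {"ACTOR_ID": "def456", "CLASS_NAME": "Worker", "STATE": "DEAD"},
]
# headers="keys" renders the dict keys as the header row of the table.
print(tabulate(rows, headers="keys", tablefmt="plain"))
```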
This PR does 2 things:
1. Use `api_server_url` for the address, which is consistent with other submission APIs.
2. When the API does not respond in a timely manner, print a warning every 5 seconds. Below is an example. This is useful when the API responds slowly (e.g., when there are partial failures). Without this, users would see a hanging API call for 30 seconds, which is a pretty bad UX.
(0.12 / 10 seconds) Waiting for the response from the API server address http://127.0.0.1:8265/api/v0/delay/5.
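A minimal sketch of that periodic-warning behavior, assuming the client polls an HTTP request on a worker thread; the helper name, endpoint, and intervals are illustrative:

```python
import concurrent.futures
import time

import requests  # assumed: the CLI talks to the API server over HTTP

def get_with_progress_warnings(url: str, timeout: float = 10.0,
                               warn_interval: float = 5.0):
    """Issue a GET and print a waiting message periodically until it returns."""
    start = time.time()
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(requests.get, url, timeout=timeout)
        while True:
            try:
                return future.result(timeout=warn_interval)
            except concurrent.futures.TimeoutError:
                elapsed = time.time() - start
                if elapsed >= timeout:
                    raise
                print(f"({elapsed:.2f} / {timeout:.0f} seconds) Waiting for the "
                      f"response from the API server address {url}.")
```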
## Why are these changes needed?
This PR adds data truncation when there are more than N entries. The policy is as follows (sketched below):
- By default, we return at most 100 entries. Users can adjust this value, but we won't allow increasing it beyond 10K.
- By default, all internal RPCs truncate data if it exceeds 10K entries.
- For distributed sources, we query each source with a 10K limit and apply the limit again at the end.
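A minimal sketch of how these limits compose; the constants and function name are illustrative:

```python
from itertools import islice
from typing import Dict, Iterable, List

DEFAULT_LIMIT = 100  # default number of entries returned to the user
MAX_LIMIT = 10_000   # hard cap; also the per-source RPC truncation limit

def query_all_sources(sources: List[Iterable[Dict]],
                      user_limit: int = DEFAULT_LIMIT) -> List[Dict]:
    """Query each distributed source with the hard cap, then apply the user limit."""
    user_limit = min(user_limit, MAX_LIMIT)
    collected: List[Dict] = []
    for source in sources:
        # Each internal RPC truncates its reply at MAX_LIMIT entries.
        collected.extend(islice(source, MAX_LIMIT))
    # Apply the (possibly smaller) user-facing limit again at the end.
    return collected[:user_limit]
```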
## Related issue number
Closes https://github.com/ray-project/ray/issues/25984#issue-1279280673
Part of https://github.com/ray-project/ray/issues/25718#issue-1268968400
## Why are these changes needed?
This PR implements the `!=` predicate for filtering. As a result of this PR, two APIs change:
```
--filter key value -> --filter "key=val" or --filter "key!=val"
list_actors(filters=[(key, val), (key2, val2)]) -> list_actors(filters=[(key, "=", val), (key2, "=", val2)])
```
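A hedged usage example of the new predicate from Python; the filter key and values are illustrative:

```python
from ray.experimental.state.api import list_actors

# Equality filter, now with an explicit "=" predicate in the tuple.
alive = list_actors(filters=[("state", "=", "ALIVE")])

# The new "!=" predicate introduced by this PR.
not_dead = list_actors(filters=[("state", "!=", "DEAD")])
```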
Task/actor/object summary:
- Tasks: group by the function name. In the future, we will also allow grouping by task_group.
- Actors: group by the actor class name. In the future, we will also allow grouping by actor_group.
- Objects: group by callsite. In the future, we will allow grouping by reference type or task state.
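A minimal sketch of the grouping idea; the entry shape and field name are illustrative, and the real summary aggregates more per-group stats:

```python
from collections import Counter
from typing import Dict, List

def summarize_tasks(tasks: List[Dict]) -> Dict[str, int]:
    """Group task entries by function name and count them."""
    return dict(Counter(task["func_name"] for task in tasks))

# Rows shaped loosely like list-API output (illustrative field name).
tasks = [{"func_name": "train_step"},
         {"func_name": "train_step"},
         {"func_name": "preprocess"}]
print(summarize_tasks(tasks))  # {'train_step': 2, 'preprocess': 1}
```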
I'd like to propose a small change to the API. Currently, the list API returns a dict of ID -> value mappings, but I am thinking of changing this to a list, because sorting becomes ineffective if we return a dictionary. Using a list keeps the order, which is important for deterministic output.
Also, for some APIs, each entry doesn't have a unique ID. For example, `list objects` can return duplicated object IDs across entries, which does not work with a dict return type (e.g., there can be more than one entry for an object ID if the object is locally referenced, borrowed by a task, and/or pinned in memory).
Also, users can easily build a dict index on their own if necessary.
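For reference, a sketch of building such an index from the returned list; the key name is illustrative, and IDs that can repeat (e.g., object IDs) are grouped into lists:

```python
from collections import defaultdict
from typing import Dict, List

def index_by_id(entries: List[Dict],
                key: str = "object_id") -> Dict[str, List[Dict]]:
    """Build an ID -> entries index from the list-form result."""
    index: Dict[str, List[Dict]] = defaultdict(list)
    for entry in entries:
        index[entry[key]].append(entry)
    return dict(index)
```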
This PR implements the basic log APIs. Higher-level APIs (e.g., `ray logs actors`) will be implemented after the internal API review is done.

If there's only 1 match, print the file content. Otherwise, print all files that match the glob.

`ray logs [glob_filter] --node-id=[head node by default]`

Args:
- `--tail`: Tail the last X lines.
- `--follow`: Follow the new logs.
- `--actor-id`: The actor ID.
- `--pid` / `--node-ip`: For worker logs.
- `--node-id`: The node ID of the log.
- `--interval`: When `--follow` is specified, logs are printed at this interval. (Should we remove it?)
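A minimal sketch of the single-match behavior described above; the log directory, helper name, and return shape are illustrative:

```python
import fnmatch
from pathlib import Path
from typing import List, Union

def resolve_logs(glob_filter: str,
                 log_dir: str = "/tmp/ray/session_latest/logs") -> Union[str, List[str]]:
    """Return the file content if exactly one file matches; otherwise the matching names."""
    matches = [p for p in Path(log_dir).iterdir()
               if p.is_file() and fnmatch.fnmatch(p.name, glob_filter)]
    if len(matches) == 1:
        return matches[0].read_text()
    return sorted(p.name for p in matches)
```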
This PR adds filtering support. Filtering is done on the API server side (not on the source side). Writing an elegant solution for source-side filtering is complicated, so we will handle it in the future (no optimization for alpha APIs).
We will also support only a limited set of columns for each API.
The API is as follows:
`ray list [resources] --filter [key] [value]` => filters data where key == value.
In the future, we can also support more complicated filtering such as `!=`, `AND`, `OR`, etc.
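A minimal sketch of API-server-side equality filtering; the entry shape and function name are illustrative:

```python
from typing import Dict, List, Tuple

def apply_filters(entries: List[Dict],
                  filters: List[Tuple[str, str]]) -> List[Dict]:
    """Keep only entries where every (key, value) filter matches exactly.

    Filtering happens after data arrives at the API server, not at the source.
    """
    return [
        entry for entry in entries
        if all(str(entry.get(key)) == value for key, value in filters)
    ]

# Roughly equivalent to `ray list actors --filter state ALIVE`.
actors = [{"state": "ALIVE", "pid": 1}, {"state": "DEAD", "pid": 2}]
print(apply_filters(actors, [("state", "ALIVE")]))
```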
It fixes the mysterious error where every cluster env build fails when `pip uninstall` / `pip install` are written on 2 lines. The root cause will be fixed later.
This PR adds 2 more states to `TaskStatus`:
```
enum TaskStatus {
  // The task is scheduled properly and waiting for execution.
  // It includes time to deliver the task to the remote worker + queueing time
  // from the execution side.
  WAITING_FOR_EXECUTION = 5;
  // The task that is running.
  RUNNING = 6;
}
```
This PR implements the `ray list tasks` and `ray list objects` APIs.
NOTE: You can ignore the merge conflict for now. It is because the first PR was reverted. There's a fix PR open now.