Task/actor/object summary
Tasks: Group by the func name. In the future, we will also allow to group by task_group.
Actors: Group by actor class name. In the future, we will also allow to group by actor_group.
Object: Group by callsite. In the future, we will allow to group by reference type or task state.
I’d like to propose a bit changes to the API. Currently we are returning the dict of ID -> value mapping when the list API is returned. But I am thinking to change this to a list because the sort will become ineffective if we return the dictionary. So, it’s ideal we use the list to keep the order (it’s important for deterministic order)
Also, for some APIs, each entry doesn’t have a unique id. For example, list objects will have duplicated object IDs from their entries, which is not working with dict return type (e.g., there can be more than 1 Object ID entry if the object is locally referenced & borrowed by task/pinned in memory)
Also, users can easily build dict index on their own if it is necessary.
This PR implements the basic log APIs. For the better APIs (like higher level APIs like ray logs actors), it will be implemented after the internal API review is done.
# If there's only 1 match, print a file content. Otherwise, print all files that match glob.
ray logs [glob_filter] --node-id=[head node by default]
Args:
--tail: Tail the last X lines
--follow: Follow the new logs
--actor-id: The actor id
--pid --node-ip: For worker logs
--node-id: The node id of the log
--interval: When --follow is specified, logs are printed with this interval. (should we remove it?)
This is the PR to implement ray log to the server side. The PR is continued from #24068.
The PR supports two endpoints;
/api/v0/logs # list logs of the node id filtered by the given glob.
/api/v0/logs/{[file | stream]}?filename&pid&actor_id&task_id&interval&lines # Stream the requested file log. The filename can be inferred by pid/actor_id/task_id
Some tests need to be re-written, I will do it soon.
As a follow-up after this PR, there will be 2 PRs.
PR to add actual CLI
PR to remove in-memory cached logs and do on-demand query for actor/worker logs
This PR adds a filtering support. The filtering is done from the API server side (not from the source side). Source side filtering is a bit complicated to write an elegant solution, and we will handle it in the future (no optimization for alpha APIs).
We will also support limited types of columns for each API.
The API is as follows
ray list [resources] -- filter [key] [value] => filter data that's key==value.
In the future, we can also support more complicated filtering like !=, And, Or , or etc.