hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-04 17:41:43 -05:00

Author	SHA1	Message	Date
Alan Guo	326b5bd1ac	Convert job_manager to be async (#27123). Updates the jobs, snapshot, and state APIs, and increases the jobs API version to 2. Follow-up for #25902 (comment). Signed-off-by: Alan Guo aguo@anyscale.com	2022-08-05 19:33:49 -07:00
SangBin Cho	028684032b	[State Observability] Add warnings for data truncation + order columns as it is defined in `StateSchema` (#27018). This PR does 3 things: (1) adds warnings for data truncation (a follow-up); (2) improves some confusing warning messages; (3) orders columns as defined in StateSchema, so that the column order can be customized for better usability. I did this only for list because I thought it wasn't that important for summary, but I might be wrong.	2022-07-27 06:56:30 -07:00
SangBin Cho	39b9c44c8d	[State Observability] pre-alpha documentation (#26560). Adds documentation for the state APIs and an API reference.	2022-07-26 05:49:28 -07:00
Ricky Xu	259473c221	[Core][State Observability] Truncate warning message is incorrect when filter is used (#26801). When we return fewer or incomplete results to users, there can be 3 reasons: (1) data truncated at the data source (raylets -> API server); (2) data filtered at the API server; (3) data limited at the API server. We are not distinguishing those 3 scenarios, but we should. This is why we reported data as truncated when it was actually filtered or limited. This PR distinguishes these scenarios and prompts warnings accordingly. Related issues: closes #26570, closes #26923. Signed-off-by: rickyyx rickyx@anyscale.com	2022-07-25 23:31:49 -07:00
SangBin Cho	15b711ae6a	[State Observability] Warn if callsite is disabled when `ray list objects` + raise exception on missing output (#26880). This PR does 3 things: (1) warns if callsite is disabled for `ray list objects` and `ray summary objects`; (2) decodes owner_id for ray list actors; (3) supports raise_on_missing_output.	2022-07-24 19:55:36 -07:00
SangBin Cho	37f4692aa8	[State Observability] Fix "No result for get crashing the formatting" and "Filtering not handled properly when key missing in the datum" #26881. Fixes two issues: (1) no result for get crashing the formatting; (2) filtering not handled properly when a key is missing in the datum.	2022-07-23 21:33:07 -07:00
Ricky Xu	6ee37d4ad7	[Core][State Observability] Fix is_alive column with wrong column type that breaks filtering (#26739). The is_alive column of WorkerState has the wrong column type, which breaks filtering on is_alive.	2022-07-20 16:38:15 -07:00
SangBin Cho	adf24bfa97	[State Observability] Use a table format by default (#26159). This PR changes the default layout to the table format for both the summary and list APIs. NOTE: tabulate is copied/pasted into the codebase for table formatting.	2022-07-19 00:54:16 -07:00
SangBin Cho	e9f6ffc5a5	[Core][State Observability] Use address arg + print warning if API responds slowly (#26008). This PR does 2 things. (1) Replaces api_server_url with address, which is consistent with the other submission APIs. (2) When the API does not respond in time, prints a warning every 5 seconds. This is useful when the API responds slowly (e.g., when there are partial failures). Without this, users see a hanging API for 30 seconds, which is a pretty bad UX. Example: (0.12 / 10 seconds) Waiting for the response from the API server address http://127.0.0.1:8265/api/v0/delay/5.	2022-07-14 06:44:07 -07:00
SangBin Cho	8837a4593f	[State Observability] Truncate data when there are too many entries to return (#26124). This PR adds data truncation when there are more than N entries. The policy is as follows: (1) by default, we return at most 100 entries; users can adjust this value, but not above 10K. (2) By default, all internal RPCs truncate data if it's > 10K. (3) For distributed sources, we query each source with a 10K limit and apply the limit again at the end. Related issues: closes https://github.com/ray-project/ray/issues/25984#issue-1279280673; part of https://github.com/ray-project/ray/issues/25718#issue-1268968400	2022-06-28 18:33:57 -07:00
SangBin Cho	68336abf13	[State Observability] Support --detail flag. (#26071) This PR adds a --detail flag to the list APIs.	2022-06-28 07:56:44 -07:00
SangBin Cho	4b957e99b5	[State Observability] != predicate for filtering. (#26079) This PR implements the `!=` predicate for filtering. As a result of this PR, two APIs are changed. ``` --filter key value -> --filter "key=val" or --filter "key!=val" list_actors(filters=[(key, val), (key2, val2)]) -> list_actors(filters=[(key, "=", val), (key2, "=", val2)]) ```	2022-06-28 05:42:19 -07:00
SangBin Cho	6552e096e6	[State Observability] Summary APIs (#25672). Task/actor/object summary. Tasks: grouped by the func name; in the future, grouping by task_group will also be allowed. Actors: grouped by actor class name; in the future, grouping by actor_group will also be allowed. Objects: grouped by callsite; in the future, grouping by reference type or task state will be allowed.	2022-06-22 06:21:50 -07:00
SangBin Cho	411b1d8d2d	[State Observability] Return list instead of dict (#25888). I’d like to propose a few changes to the API. Currently the list API returns a dict of ID -> value mappings. I am thinking of changing this to a list, because sorting becomes ineffective if we return a dictionary; a list keeps the order, which is important for deterministic output. Also, for some APIs, entries don’t have a unique ID. For example, list objects can return duplicated object IDs (there can be more than 1 entry per object ID if the object is locally referenced & borrowed by a task/pinned in memory), which does not work with a dict return type. Users can also easily build a dict index on their own if necessary.	2022-06-20 22:49:29 -07:00
SangBin Cho	856bea31fb	[State Observability] Ray log CLI / API (#25481). This PR implements the basic log APIs. Better, higher-level APIs (like ray logs actors) will be implemented after the internal API review is done. Usage: ray logs [glob_filter] --node-id=[head node by default]. If there's only 1 match, print the file content; otherwise, print all files that match the glob. Args: --tail: tail the last X lines; --follow: follow the new logs; --actor-id: the actor id; --pid and --node-ip: for worker logs; --node-id: the node id of the log; --interval: when --follow is specified, logs are printed at this interval (should we remove it?).	2022-06-13 05:52:57 -07:00
SangBin Cho	54496d7705	[State Observability API] Support Filtering (#25281). This PR adds filtering support. Filtering is done on the API server side (not on the source side). Source-side filtering is complicated to implement elegantly, and we will handle it in the future (no optimization for alpha APIs). We will also support limited types of columns for each API. The API is as follows: ray list [resources] --filter [key] [value] => keep data where key==value. In the future, we can also support more complicated filtering like !=, And, Or, etc.	2022-06-03 17:17:30 -07:00
SangBin Cho	a7e759317b	[State Observability API] Error handling (#24413). This improves error handling per https://docs.google.com/document/d/1IeEsJOiurg-zctOcBjY-tQVbsCmURFSnUCTkx_4a7Cw/edit#heading=h.pdzl9cil9e8z (the RPC part). Semantics: if all queries to the source fail, raise a RayStateApiException. If some queries fail, warnings.warn the partial failure when print_api_stats=True. It is true for the CLI. It is false when used within the Python API or when json / yaml format is required.	2022-05-24 03:56:49 -07:00
SangBin Cho	ec653e3196	[Nightly test] Move two line downloads to one line. (#25061) It fixes the mysterious error where every cluster env build fails when pip uninstall / pip install is written in 2 lines. The root cause will be fixed later.	2022-05-22 00:07:03 -07:00
SangBin Cho	b9c30529d8	[Core/Observability 1/N] Add a "running" state to task status (#24651). This PR adds 2 more states to TaskStatus: enum TaskStatus { // The task is scheduled properly and waiting for execution. // It includes time to deliver the task to the remote worker + queueing time // from the execution side. WAITING_FOR_EXECUTION = 5; // The task that is running. RUNNING = 6; }	2022-05-16 05:39:05 -07:00
SangBin Cho	2bce07d4ce	[State API] List runtime env API (#24126). This PR adds support for the list runtime env API.	2022-05-02 14:01:00 -07:00
SangBin Cho	73ed67e9e6	[State API] State api limit + Removing unnecessary modules (#24098). This PR (1) moves all routes into the same module, state_head.py, and (2) supports a limit feature.	2022-04-22 15:59:46 -07:00
SangBin Cho	30ab5458a7	[State Observability] Tasks and Objects API (#23912). This PR implements the ray list tasks and ray list objects APIs. NOTE: You can ignore the merge conflict for now. It is because the first PR was reverted. There's a fix PR open now.	2022-04-21 18:45:03 -07:00
SangBin Cho	1c3329fa38	Revert "Revert "[State Observability] Basic functionality for centralized data (#23744)" (#23918)" (#23933). This reverts commit `fb14e82`.	2022-04-18 21:15:43 -07:00
Amog Kamsetty	fb14e82242	Revert "[State Observability] Basic functionality for centralized data (#23744)" (#23918). This reverts commit `51a4a1a802`, which was breaking the tune multinode tests and kuberay:test_autoscaling_e2e.	2022-04-14 14:28:42 -07:00
SangBin Cho	51a4a1a802	[State Observability] Basic functionality for centralized data (#23744). Supports listing actor/pg/job/node/workers. Design doc: https://docs.google.com/document/d/1IeEsJOiurg-zctOcBjY-tQVbsCmURFSnUCTkx_4a7Cw/edit#heading=h.9ub9e6yvu9p2 Note that this PR doesn't contain any output except ids. I will update them in the follow-up PRs.	2022-04-14 07:33:18 -07:00
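
The filter changes in 54496d7705, 4b957e99b5, and 37f4692aa8 describe server-side filtering with `(key, predicate, value)` tuples, where the predicate is `=` or `!=`. The following is a minimal sketch of that idea, not Ray's actual implementation: `parse_filter` and `apply_filters` are hypothetical names, values are compared as strings (an assumption, since CLI input arrives as text), and entries missing the filtered key are dropped (also an assumption).

```python
from typing import Any, Dict, List, Tuple

# (key, predicate, value), e.g. ("state", "!=", "DEAD")
Filter = Tuple[str, str, Any]
SUPPORTED_PREDICATES = ("=", "!=")


def parse_filter(expr: str) -> Filter:
    """Parse a CLI-style expression such as 'key=val' or 'key!=val'.

    Hypothetical helper. '!=' is checked first because it contains '='.
    """
    for predicate in ("!=", "="):
        if predicate in expr:
            key, _, value = expr.partition(predicate)
            return (key, predicate, value)
    raise ValueError(f"Unsupported filter expression: {expr!r}")


def apply_filters(
    entries: List[Dict[str, Any]], filters: List[Filter]
) -> List[Dict[str, Any]]:
    """Keep the entries that satisfy every filter, preserving input order."""
    for _, predicate, _ in filters:
        if predicate not in SUPPORTED_PREDICATES:
            raise ValueError(f"Unsupported predicate: {predicate!r}")

    result = []
    for entry in entries:
        keep = True
        for key, predicate, value in filters:
            if key not in entry:
                # Assumption: an entry without the key never matches.
                keep = False
                break
            equal = str(entry[key]) == str(value)
            if (predicate == "=") != equal:
                keep = False
                break
        if keep:
            result.append(entry)
    return result
```

For example, `apply_filters(actors, [parse_filter("state!=DEAD")])` would keep only the entries that have a `state` key whose value is not `DEAD`. Returning a list rather than a dict keeps the input order, in line with the reasoning in 411b1d8d2d.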

25 commits
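
The truncation policy described in 8837a4593f (default limit of 100, a hard cap of 10K, each source queried with a 10K limit, and the limit applied again at the end) can be sketched as below. This is an illustration under stated assumptions rather than the code from that PR: `merge_with_limit` and its return shape are hypothetical, and the two boolean flags only hint at the truncated-versus-limited distinction that 259473c221 introduces.

```python
from typing import Any, List, Sequence, Tuple

DEFAULT_LIMIT = 100   # default number of entries returned
MAX_LIMIT = 10_000    # users cannot raise the limit above this


def merge_with_limit(
    sources: Sequence[Sequence[Any]], limit: int = DEFAULT_LIMIT
) -> Tuple[List[Any], bool, bool]:
    """Merge per-source results, truncating per source and limiting at the end.

    Returns (entries, truncated_at_source, limited_at_server).
    Hypothetical helper; the names and return shape are assumptions.
    """
    if limit > MAX_LIMIT:
        raise ValueError(f"limit must be <= {MAX_LIMIT}, got {limit}")

    merged: List[Any] = []
    truncated_at_source = False
    for entries in sources:
        # Each source contributes at most MAX_LIMIT entries.
        if len(entries) > MAX_LIMIT:
            truncated_at_source = True
        merged.extend(entries[:MAX_LIMIT])

    # The user-facing limit is applied once more after merging.
    limited_at_server = len(merged) > limit
    return merged[:limit], truncated_at_source, limited_at_server
```

Keeping the two flags separate lets a caller print a different warning when data was cut at the source than when it was only capped by the requested limit.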