[ray client] enable ray.get with >2 sec timeout (#21883) (#22165)

Commit 2cf4c72 ("[ray client] Fix ctrl-c for ray.get() by setting a short-server side timeout") introduced a short server-side timeout so that a blocking get() would not hold up later operations. However, that fix implicitly assumes that get() completes within MAX_BLOCKING_OPERATION_TIME_S (two seconds). This becomes a problem when an app fetches large objects or has limited network I/O bandwidth, so that pushing all chunks takes more than two seconds. The current retry logic re-pushes from the first chunk on every retry, so the client blocks in an endless re-push loop. I updated the logic to pass the timeout directly when it is explicitly given. Without an explicit timeout, it still polls using MAX_BLOCKING_OPERATION_TIME_S as the short server-side timeout.
2025-03-06 02:21:39 -05:00 · 2022-04-26 05:06:52 +09:00 · 2022-04-26 05:06:52 +09:00 · e115545579
commit e115545579
parent c73f02ded5
1 changed file with 7 additions and 2 deletions
--- a/python/ray/util/client/worker.py
+++ b/python/ray/util/client/worker.py
@@ -421,14 +421,19 @@ class Worker:
        else:
            deadline = time.monotonic() + timeout

+        max_blocking_operation_time = MAX_BLOCKING_OPERATION_TIME_S
+        if "RAY_CLIENT_MAX_BLOCKING_OPERATION_TIME_S" in os.environ:
+            max_blocking_operation_time = float(
+                os.environ["RAY_CLIENT_MAX_BLOCKING_OPERATION_TIME_S"]
+            )
        while True:
            if deadline:
                op_timeout = min(
-                    MAX_BLOCKING_OPERATION_TIME_S,
+                    max_blocking_operation_time,
                    max(deadline - time.monotonic(), 0.001),
                )
            else:
-                op_timeout = MAX_BLOCKING_OPERATION_TIME_S
+                op_timeout = max_blocking_operation_time
            try:
                res = self._get(to_get, op_timeout)
                break
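
The per-call timeout computation in this hunk can be sketched in isolation as follows. This is a minimal illustration, not the code in worker.py: the helper names `resolve_max_blocking_time` and `next_op_timeout` are hypothetical, the default of 2.0 is assumed from the "two seconds" in the commit message, and the `env` and `now` parameters exist only to make the sketch easy to exercise.

```python
import os
import time
from typing import Mapping, Optional

# Assumed default, taken from the "two seconds" mentioned in the commit message.
MAX_BLOCKING_OPERATION_TIME_S = 2.0

ENV_VAR = "RAY_CLIENT_MAX_BLOCKING_OPERATION_TIME_S"


def resolve_max_blocking_time(env: Mapping[str, str] = os.environ) -> float:
    """Return the per-call cap, using the environment override when present.

    Hypothetical helper: the diff performs this lookup inline.
    """
    if ENV_VAR in env:
        return float(env[ENV_VAR])
    return MAX_BLOCKING_OPERATION_TIME_S


def next_op_timeout(
    deadline: Optional[float], cap: float, now: Optional[float] = None
) -> float:
    """Return the timeout for the next blocking call.

    With no deadline, every call waits up to `cap` seconds. With a deadline,
    the call waits for the remaining time, bounded above by `cap` and below
    by 1 ms so the timeout is never zero or negative.
    """
    if deadline is None:
        return cap
    if now is None:
        now = time.monotonic()
    return min(cap, max(deadline - now, 0.001))


if __name__ == "__main__":
    cap = resolve_max_blocking_time()
    deadline = time.monotonic() + 10.0  # caller asked for a 10 second timeout
    print(f"cap={cap}, next op timeout={next_op_timeout(deadline, cap):.3f}")
```

Under these assumptions, raising the cap through the environment variable lets a single blocking call run longer than two seconds, while the 1 ms floor keeps the last poll before the deadline from being issued with a non-positive timeout.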