[ray client] enable ray.get with >2 sec timeout (#21883) (#22165)

Commit 2cf4c72 ("[ray client] Fix ctrl-c for ray.get() by setting a
short-server side timeout") introduced a short server-side timeout so
that ray.get() does not block later operations.

However, that fix implicitly assumes that get() completes within
MAX_BLOCKING_OPERATION_TIME_S (two seconds). This becomes a problem
when an app fetches large objects or has limited network I/O bandwidth,
so that pushing all chunks takes more than two seconds. On every
timeout, the current retry logic re-pushes the chunks from the
beginning, so the client blocks forever in an endless re-push loop.
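
A back-of-the-envelope check makes the failure concrete. This is an illustrative sketch only: the helper names are hypothetical, and the chunk size and bandwidth are arbitrary example values, not Ray's actual chunking parameters. Because every retry starts again from the first chunk, progress never accumulates, so a get can finish only if a single bounded call carries every chunk.

```python
import math

MAX_BLOCKING_OPERATION_TIME_S = 2.0  # per-call server-side timeout from the commit message


def chunks_needed(object_bytes: int, chunk_bytes: int) -> int:
    """Number of chunks required to push the whole object."""
    return math.ceil(object_bytes / chunk_bytes)


def chunks_per_call(chunk_bytes: int, bandwidth_bytes_per_s: float, op_timeout_s: float) -> int:
    """Whole chunks that can be pushed before a single call times out."""
    return int(op_timeout_s * bandwidth_bytes_per_s // chunk_bytes)


def restarting_get_can_finish(
    object_bytes: int,
    chunk_bytes: int,
    bandwidth_bytes_per_s: float,
    op_timeout_s: float = MAX_BLOCKING_OPERATION_TIME_S,
) -> bool:
    """True only if one call can carry every chunk, since each retry restarts at chunk 0."""
    per_call = chunks_per_call(chunk_bytes, bandwidth_bytes_per_s, op_timeout_s)
    return per_call >= chunks_needed(object_bytes, chunk_bytes)


# Example values (assumptions): a 1 GiB object, 4 MiB chunks, 100 MiB/s of bandwidth.
GIB, MIB = 2**30, 2**20
print(chunks_needed(GIB, 4 * MIB))  # 256
print(chunks_per_call(4 * MIB, 100 * MIB, 2.0))  # 50
print(restarting_get_can_finish(GIB, 4 * MIB, 100 * MIB))  # False
```

At 100 MiB/s the whole object needs roughly 10 seconds, so at most 50 of its 256 chunks arrive before each 2-second call times out, and the retry loop never finishes.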

I updated the logic to pass the timeout through directly when it is
given explicitly. Without a timeout, get() still polls using
MAX_BLOCKING_OPERATION_TIME_S as the short server-side timeout.
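
The client-side polling behaviour can be sketched as a small retry loop. This is a minimal, hypothetical sketch, not Ray's implementation: `get_with_polling`, `fetch`, `cap_s`, `FakeClock`, `make_restarting_fetch`, and the local `GetTimeoutError` are illustrative names, and the fake server assumes (as the message describes) that a timed-out call discards its partial transfer. Each call is bounded by `cap_s` and, when a timeout is given, by the caller's remaining time, so the caller's timeout reaches the server directly whenever `cap_s` is at least that large.

```python
from typing import Callable, List, Optional, Tuple

MAX_BLOCKING_OPERATION_TIME_S = 2.0  # default per-call cap named in the commit message


class GetTimeoutError(Exception):
    """Local stand-in for the error one bounded server call raises on timeout."""


def get_with_polling(
    fetch: Callable[[float], object],
    timeout: Optional[float],
    clock: Callable[[], float],
    cap_s: float = MAX_BLOCKING_OPERATION_TIME_S,
) -> object:
    """Poll fetch() with bounded per-call timeouts until it succeeds or the deadline passes."""
    deadline = None if timeout is None else clock() + timeout
    while True:
        if deadline is not None:
            # Bound each call by the cap and by the caller's remaining time.
            op_timeout = min(cap_s, max(deadline - clock(), 0.001))
        else:
            # No caller timeout: keep polling with the short per-call timeout.
            op_timeout = cap_s
        try:
            return fetch(op_timeout)
        except GetTimeoutError:
            # Retry after a per-call timeout unless the caller's deadline has passed.
            if deadline is not None and clock() >= deadline:
                raise


class FakeClock:
    """Simulated monotonic clock so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


def make_restarting_fetch(
    clock: FakeClock, transfer_s: float
) -> Tuple[Callable[[float], object], List[float]]:
    """Hypothetical server call that must push every chunk within a single call.

    If op_timeout is shorter than transfer_s, the time is spent, the partial
    transfer is discarded, and GetTimeoutError is raised.
    """
    calls: List[float] = []

    def fetch(op_timeout: float) -> object:
        calls.append(op_timeout)
        if op_timeout >= transfer_s:
            clock.now += transfer_s
            return "object"
        clock.now += op_timeout
        raise GetTimeoutError()

    return fetch, calls


# A 5-second transfer with an explicit 10-second timeout and a 2-second cap.
clock = FakeClock()
fetch, calls = make_restarting_fetch(clock, transfer_s=5.0)
try:
    get_with_polling(fetch, timeout=10.0, clock=clock, cap_s=2.0)
except GetTimeoutError:
    print("timed out after", len(calls), "calls")  # every call was capped at 2 s

# The same transfer with the cap raised to 10 seconds.
clock = FakeClock()
fetch, calls = make_restarting_fetch(clock, transfer_s=5.0)
print(get_with_polling(fetch, timeout=10.0, clock=clock, cap_s=10.0))
```

With `cap_s=2.0`, every call times out and the 10-second deadline expires after five retries; with `cap_s=10.0`, the first call succeeds. With `timeout=None` and a transfer longer than `cap_s`, this loop would never return, which is the endless re-push described above.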
Takeshi Yoshimura 2022-04-26 05:06:52 +09:00 committed by GitHub
parent c73f02ded5
commit e115545579
GPG key ID: 4AEE18F83AFDEB23

@@ -421,14 +421,19 @@ class Worker:
         else:
             deadline = time.monotonic() + timeout
+        max_blocking_operation_time = MAX_BLOCKING_OPERATION_TIME_S
+        if "RAY_CLIENT_MAX_BLOCKING_OPERATION_TIME_S" in os.environ:
+            max_blocking_operation_time = float(
+                os.environ["RAY_CLIENT_MAX_BLOCKING_OPERATION_TIME_S"]
+            )
         while True:
             if deadline:
                 op_timeout = min(
-                    MAX_BLOCKING_OPERATION_TIME_S,
+                    max_blocking_operation_time,
                     max(deadline - time.monotonic(), 0.001),
                 )
             else:
-                op_timeout = MAX_BLOCKING_OPERATION_TIME_S
+                op_timeout = max_blocking_operation_time
             try:
                 res = self._get(to_get, op_timeout)
                 break
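
The five added lines let the per-call cap be raised without code changes. Below is a minimal standalone sketch of that lookup, assuming it is factored into a helper; the name `resolve_max_blocking_time` and its `environ` parameter are hypothetical, while the environment variable name and the 2-second default come from the diff and the commit message.

```python
import os
from typing import Mapping

MAX_BLOCKING_OPERATION_TIME_S = 2.0  # default from the commit message
ENV_VAR = "RAY_CLIENT_MAX_BLOCKING_OPERATION_TIME_S"


def resolve_max_blocking_time(environ: Mapping[str, str] = os.environ) -> float:
    """Per-call cap used by get(): the environment override if set, else the default."""
    if ENV_VAR in environ:
        # Like the diff, parse with float() and do no further validation.
        return float(environ[ENV_VAR])
    return MAX_BLOCKING_OPERATION_TIME_S


print(resolve_max_blocking_time({}))  # 2.0
print(resolve_max_blocking_time({ENV_VAR: "30"}))  # 30.0
```

Because the lookup sits inside get(), the override is read on every call. With the variable set above the caller's timeout, for example `RAY_CLIENT_MAX_BLOCKING_OPERATION_TIME_S=30` together with `ray.get(ref, timeout=10)`, `min()` picks the caller's remaining time, so that is the value passed to `self._get`; without a timeout, each poll waits up to 30 seconds instead of 2.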