The S3 read sometimes fails (#22214). I confirmed that the file actually exists in S3, so the failure is most likely a transient error. This PR adds a retry mechanism to work around the issue.

SangBin Cho 2022-02-12 11:58:58 +09:00 committed by GitHub
parent 531e215921
commit 640d92c385


@@ -53,7 +53,19 @@ def load_dataset(client, data_dir, s3_bucket, nbytes, npartitions):
f"s3://{s3_bucket}/df-{num_bytes_per_partition}-{i}.parquet.gzip"
for i in range(npartitions)
]
df = None
max_retry = 3
retry = 0
while not df and retry < max_retry:
try:
df = dd.read_parquet(filenames)
except FileNotFoundError as e:
print(f"Failed to load a file. {e}")
# Wait a little bit before retrying.
time.sleep(30)
retry += 1
return df
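
For reference, here is a self-contained sketch of the same retry pattern outside the diff context. `read_with_retry` and its parameters are illustrative names, not part of this PR; the 3-attempt cap and fixed 30-second wait mirror the change above. As in the diff, the helper returns None when every attempt fails, so callers need to handle that case.

import time


def read_with_retry(read, filenames, max_retry=3, delay_s=30):
    # Illustrative helper (not from this PR): retry a read that can
    # fail transiently, e.g. an S3 read that raises FileNotFoundError
    # even though the file exists.
    result = None
    retry = 0
    while result is None and retry < max_retry:
        try:
            result = read(filenames)
        except FileNotFoundError as e:
            print(f"Failed to load a file. {e}")
            # Wait a little bit before retrying, as in the diff above.
            time.sleep(delay_s)
            retry += 1
    # None means every attempt failed; the caller must check for it.
    return result

With Dask this would be invoked as read_with_retry(dd.read_parquet, filenames), which behaves the same way as the inlined loop in the diff.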