mirror of
https://github.com/vale981/ray
synced 2025-03-05 18:11:42 -05:00
![]() # Why are these changes needed? (map pid=516, ip=172.31.64.223) E0526 12:32:19.203322360 675 chttp2_transport.cc:1103] Received a GOAWAY with error code ENHANCE_YOUR_CALM and debug data equal to "too_many_pings". See [this](https://github.com/ray-project/ray/issues/25367#issuecomment-1189421372) for more details. We currently see this in many of the large nightly tests. # Root Cause The root cause (with pretty high confidence level) has been some misconfigs between gRPC server/clients. Essentially the client is pinging the server too frequently for keep-alive heartbeats. # Mitigation This PR is merely a mitigation step. I will keep looking into the exact client/server pair later, but probably don't have bandwidth for now largely because the test iteration takes quite a while and verbose logging with gRPC and ray backend have not revealed much useful info. This only kicks in at the end of a long running map phase, and verbose logging doesn't tell me which client is sending the pings. |
||
---|---|---|
.. | ||
mock | ||
ray | ||
shims/windows |