ray/src
Ricky Xu 68b5d4302c
[Core] Suppress gRPC server alerting on too many keep-alive pings (#27769)
# Why are these changes needed?
Many of the large nightly tests currently log errors like this one:

`(map pid=516, ip=172.31.64.223) E0526 12:32:19.203322360     675 chttp2_transport.cc:1103]   Received a GOAWAY with error code ENHANCE_YOUR_CALM and debug data equal to "too_many_pings".`

See [this comment](https://github.com/ray-project/ray/issues/25367#issuecomment-1189421372) for more details.

# Root Cause
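We believe, with fairly high confidence, that the root cause is a misconfiguration between a gRPC server and its clients: a client sends keep-alive pings more frequently than the server's ping policy allows, so the server closes the connection with a GOAWAY carrying `ENHANCE_YOUR_CALM` and `too_many_pings`.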
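To make the failure mode concrete, here is a simplified, hypothetical C++ model of the server-side check that produces this error. It assumes the semantics gRPC documents for the `GRPC_ARG_HTTP2_MIN_RECV_PING_INTERVAL_WITHOUT_DATA_MS` and `GRPC_ARG_HTTP2_MAX_PING_STRIKES` channel arguments (a ping that arrives too soon after the previous one, with no data sent in between, is a "strike"; too many strikes trigger the GOAWAY; a limit of 0 means unlimited). The class and function names are illustrative, this is not gRPC's actual implementation, and it omits details such as the longer interval gRPC applies to idle transports. It is only a sketch of one way the alert can be suppressed, not necessarily the change made in this PR.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical, simplified model of a gRPC server's keep-alive ping abuse policy.
class PingAbusePolicy {
 public:
  using Millis = std::chrono::milliseconds;

  // min_recv_ping_interval: pings closer together than this count as strikes.
  // max_ping_strikes: GOAWAY once strikes exceed this; 0 means unlimited.
  PingAbusePolicy(Millis min_recv_ping_interval, int max_ping_strikes)
      : min_interval_(min_recv_ping_interval), max_strikes_(max_ping_strikes) {}

  // Records a ping received at time `now`. Returns true if the server would
  // answer with GOAWAY(ENHANCE_YOUR_CALM, "too_many_pings").
  bool ReceivePing(Millis now) {
    // The very first ping can never be "too soon".
    const bool too_soon = has_last_ping_ && (now - last_ping_) < min_interval_;
    last_ping_ = now;
    has_last_ping_ = true;
    if (!too_soon) return false;
    ++strikes_;
    return max_strikes_ != 0 && strikes_ > max_strikes_;
  }

  // Sending a data or header frame resets the strike count.
  void OnDataSent() { strikes_ = 0; }

  int strikes() const { return strikes_; }

 private:
  Millis min_interval_;
  int max_strikes_;
  Millis last_ping_ = Millis::zero();
  bool has_last_ping_ = false;
  int strikes_ = 0;
};

// Simulates a client that pings every `client_interval`, `num_pings` times,
// with no data in between. Returns the 1-based index of the ping that
// triggers GOAWAY, or 0 if none does.
int FirstGoawayPing(PingAbusePolicy policy,
                    std::chrono::milliseconds client_interval, int num_pings) {
  for (int i = 0; i < num_pings; ++i) {
    if (policy.ReceivePing(client_interval * i)) return i + 1;
  }
  return 0;
}
```

With a 300000 ms (5 minute) minimum interval and a limit of 2 strikes, which mirror the defaults in gRPC's keepalive documentation, a client pinging every 10 seconds is cut off on its fourth ping: the first ping is free and the next three are strikes. Setting the strike limit to 0 in the same scenario never produces a GOAWAY, and neither does a client whose ping interval respects the server's minimum.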

# Mitigation
This PR is only a mitigation. I will keep looking for the exact client/server pair, but probably won't have the bandwidth for now: each test iteration takes a long time, and verbose gRPC and Ray backend logging has not revealed much useful information. The error only appears at the end of a long-running map phase, and verbose logging does not show which client is sending the pings.
2022-08-17 01:53:47 -07:00

| Name | Last commit | Date |
|---|---|---|
| mock | [State Observability] Summary APIs (#25672) | 2022-06-22 06:21:50 -07:00 |
| ray | [Core] Suppress gRPC server alerting on too many keep-alive pings (#27769) | 2022-08-17 01:53:47 -07:00 |
| shims/windows | Support copyright format for c++ files (#14348) | 2021-08-04 17:19:38 +08:00 |