[gcs] Fix internal kv as the bottleneck when worker starts (#20662)

## Why are these changes needed?

Before commit e54d3117a4, all internal kv traffic went to Redis, which was a dedicated service. After moving to GCS, internal kv requests compete with the rest of the GCS traffic, which sometimes makes internal kv a bottleneck.

Before this PR, the `many_actor` tests were failing. When a lot of actors start, GCS is under heavy load, and worker startup times out because its internal kv requests are not executed in time. When a worker fails to start, a new worker is started even though the original one is still pending, so in the end there are a lot of workers.

Several things here need to be fixed. This is a quick fix for the issue, and it also restores the behavior we had when using Redis.

## Related issue number

Closes #20602
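To illustrate why a per-handler cap on active RPCs can stall internal kv requests under load, here is a minimal sketch. It is not Ray's implementation: `ActiveRpcLimiter` and its methods are hypothetical names, and it assumes that a limit of `-1` means "no cap", which is how the value is used in this change.

```cpp
#include <cstdint>

// Hypothetical limiter, for illustration only (not Ray's actual code).
// A non-negative max_active caps the number of in-flight requests;
// -1 is assumed to mean "unlimited".
class ActiveRpcLimiter {
 public:
  explicit ActiveRpcLimiter(int64_t max_active) : max_active_(max_active) {}

  // Returns true if a new request may start now, false if it has to wait.
  bool TryAcquire() {
    if (max_active_ >= 0 && active_ >= max_active_) {
      return false;  // cap reached: the request is held back
    }
    ++active_;
    return true;
  }

  // Marks one in-flight request as finished.
  void Release() {
    if (active_ > 0) {
      --active_;
    }
  }

  int64_t active() const { return active_; }

 private:
  int64_t max_active_;  // -1 disables the cap in this sketch
  int64_t active_ = 0;
};
```

With a finite cap, requests that arrive while the handler is saturated have to wait, which is the kind of delay that can push worker startup past its timeout. With `-1`, every request is admitted immediately.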
2025-03-06 10:31:39 -05:00 · 2021-11-23 15:13:07 -08:00 · 40db73c2ff
commit 40db73c2ff
parent e3e9697bea
1 changed file with 2 additions and 3 deletions
--- a/src/ray/rpc/gcs_server/gcs_rpc_server.h
+++ b/src/ray/rpc/gcs_server/gcs_rpc_server.h
@@ -62,9 +62,8 @@ namespace rpc {
  RPC_SERVICE_HANDLER(PlacementGroupInfoGcsService, HANDLER, \
                      RayConfig::instance().gcs_max_active_rpcs_per_handler())

-#define INTERNAL_KV_SERVICE_RPC_HANDLER(HANDLER)     \
-  RPC_SERVICE_HANDLER(InternalKVGcsService, HANDLER, \
-                      RayConfig::instance().gcs_max_active_rpcs_per_handler())
+#define INTERNAL_KV_SERVICE_RPC_HANDLER(HANDLER) \
+  RPC_SERVICE_HANDLER(InternalKVGcsService, HANDLER, -1)

 // Unlimited max active RPCs, because of long poll.
 #define INTERNAL_PUBSUB_SERVICE_RPC_HANDLER(HANDLER) \