## Why are these changes needed?
Prior to this PR, we have:
```cpp
class XxxAccessor {};
class ServiceBasedXxxAccessor : public XxxAccessor {};
class GcsClient {};
class ServiceBasedGcsClient : public GcsClient {};
```
However, XxxAccessor has only one implementation, ServiceBasedXxxAccessor, and GcsClient has only one implementation, ServiceBasedGcsClient.
This abstraction is unnecessary and makes development harder, since every change has to touch two files.
This PR removes all ServiceBasedXxx classes and moves their implementations into the base classes.
Now we only have:
```cpp
class XxxAccessor {};
class GcsClient {};
```
This PR adds more infrastructure for subscribing to GCS via ray::pubsub instead of Redis.
The most important pieces of added logic are:
- GCS subscriber RPC interface in src/ray/protobuf/gcs_service.proto
- GCS subscriber handler in src/ray/gcs/gcs_server/pubsub_handler.{h,cc}
- GCS wrapper for the ray::pubsub subscriber in src/ray/gcs/pubsub/gcs_pub_sub.{h,cc}

Other files are modified to add boilerplate and plumbing, remove dead code, and clean up.
This PR can also be reviewed commit by commit. 418f065 and 3279430 are cleanups. 028939c is a pure refactoring of how GCS clients subscribe to GCS updates and should not change behavior yet, similar to "[Pubsub] Wrap Redis-based publisher in GCS to allow incrementally switching to the GCS-based publisher" (#19600). 286161f parameterizes gcs_server_test to test GCS pubsub. The remaining commits add new logic.
All new logic is behind the gcs_grpc_based_pubsub flag, so this PR should not affect Ray's default behavior.
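As a rough illustration of the flag gating, here is a minimal sketch. Only the flag name gcs_grpc_based_pubsub comes from this PR; `SketchConfig`, `SketchSubscriber`, and the two callback lists are hypothetical stand-ins, not the actual classes in gcs_pub_sub.{h,cc}.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical config holder; only the flag name is taken from the PR.
struct SketchConfig {
  bool gcs_grpc_based_pubsub = false;  // Off by default, so the old path is used.
};

using Callback = std::function<void(const std::string &)>;

// Hypothetical subscriber wrapper that picks a backend once, at construction.
class SketchSubscriber {
 public:
  explicit SketchSubscriber(const SketchConfig &config)
      : use_grpc_(config.gcs_grpc_based_pubsub) {}

  // Registers the callback with whichever backend the flag selects.
  void Subscribe(Callback callback) {
    auto &target = use_grpc_ ? grpc_callbacks_ : redis_callbacks_;
    target.push_back(std::move(callback));
  }

  std::size_t NumRedisSubscriptions() const { return redis_callbacks_.size(); }
  std::size_t NumGrpcSubscriptions() const { return grpc_callbacks_.size(); }

 private:
  bool use_grpc_;
  std::vector<Callback> redis_callbacks_;  // Stand-in for the Redis-based path.
  std::vector<Callback> grpc_callbacks_;   // Stand-in for the ray::pubsub path.
};
```

With a default-constructed `SketchConfig`, every subscription lands on the Redis stand-in, which mirrors the claim that default behavior is unchanged.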
The added subscriber logic was tested by enabling gcs_grpc_based_pubsub in service_based_gcs_client_test.cc and adding basic handling logic for TaskLease. Since TaskLease pubsub will be removed, that test change will not be checked in.
The next steps are to support SubscribeAll entities for a channel in ray::pubsub and to test migrating more channels.
## Why are these changes needed?
When Ray spills back a task, it checks through GCS whether the node exists. This check is racy, and the raylet sometimes crashes because of it.
This PR filters out unavailable nodes when selecting a node.
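The idea can be sketched as follows. This is a simplified illustration, not the raylet's actual scheduling code: `NodeId`, `NodeResources`, `SelectSpillbackNode`, and the `alive_nodes` set are all hypothetical names, and the "most available CPUs" policy is only an assumption to make the example concrete.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical, simplified types; not Ray's actual scheduler structures.
using NodeId = std::string;

struct NodeResources {
  double available_cpus = 0.0;
};

// Picks the feasible node with the most available CPUs, skipping any node
// that is not in `alive_nodes` (a stand-in for the raylet's liveness view).
std::optional<NodeId> SelectSpillbackNode(
    const std::unordered_map<NodeId, NodeResources> &candidates,
    const std::unordered_set<NodeId> &alive_nodes, double required_cpus) {
  std::optional<NodeId> best;
  double best_cpus = -1.0;
  for (const auto &[node_id, resources] : candidates) {
    if (alive_nodes.count(node_id) == 0) {
      continue;  // Filter out nodes that are no longer available.
    }
    if (resources.available_cpus < required_cpus) {
      continue;  // Not enough resources on this node.
    }
    if (resources.available_cpus > best_cpus) {
      best = node_id;
      best_cpus = resources.available_cpus;
    }
  }
  return best;  // Empty when no alive node can host the task.
}
```

Because dead nodes are dropped before selection, the caller never receives a node it would later fail to look up.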
## Related issue number
#19438
## Why are these changes needed?
The most significant change of the PR is the `GcsPublisher` wrapper added to `src/ray/gcs/pubsub/gcs_pub_sub.h`. It forwards publishing to the underlying `GcsPubSub` (Redis-based) or `pubsub::Publisher` (GCS-based) depending on the migration status, so it allows incremental migration by channel.
- Since it was decided that GCS-based publishing should use typed IDs and messages, each member function of `GcsPublisher` accepts a typed message.
Most of the modified files are from migrating publishing logic in GCS to use `GcsPublisher` instead of `GcsPubSub`.
Later on, `GcsPublisher` member functions will be migrated to use GCS-based publishing.
This change should not alter any functionality. If this looks ok, a similar change will be made for subscribers in the GCS client.
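The forwarding pattern described above can be sketched roughly as below. This is not the real `GcsPublisher` interface: `SketchPublisher`, `ActorMessage`, both backend structs, the `"ACTOR"` channel string, and the per-channel `actor_channel_migrated` switch are hypothetical, chosen only to show a typed wrapper that hides which backend is in use.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical typed message; the real messages are protobuf types.
struct ActorMessage {
  std::string actor_id;
  std::string state;
};

// Hypothetical stand-in for the Redis-based publisher: takes serialized strings.
struct RedisSketchBackend {
  std::vector<std::pair<std::string, std::string>> published;  // (key, payload)
  void Publish(const std::string &channel, const std::string &id,
               const std::string &payload) {
    published.emplace_back(channel + ":" + id, payload);
  }
};

// Hypothetical stand-in for the GCS-based publisher: takes typed messages.
struct GcsSketchBackend {
  std::vector<ActorMessage> published;
  void Publish(const ActorMessage &message) { published.push_back(message); }
};

// Wrapper with a typed interface; callers never see which backend is used.
class SketchPublisher {
 public:
  SketchPublisher(RedisSketchBackend *redis, GcsSketchBackend *gcs,
                  bool actor_channel_migrated)
      : redis_(redis), gcs_(gcs), actor_channel_migrated_(actor_channel_migrated) {}

  void PublishActor(const ActorMessage &message) {
    if (actor_channel_migrated_) {
      gcs_->Publish(message);
    } else {
      // The legacy path needs a serialized payload; this encoding is made up.
      redis_->Publish("ACTOR", message.actor_id, message.state);
    }
  }

 private:
  RedisSketchBackend *redis_;
  GcsSketchBackend *gcs_;
  bool actor_channel_migrated_;  // Flipped per channel as migration proceeds.
};
```

Since call sites depend only on the typed `PublishActor` signature, switching a channel to the new backend would touch the wrapper alone, which is what makes channel-by-channel migration practical.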
## Related issue number
## Why are these changes needed?
There are some issues left from previous PRs.
- Put gcs_actor_scheduler_mock_test back.
- Add a comment on named actor creation behavior.
- Fix the comments for some flags.
## Related issue number
* exp backoff
* up
* format
* up
* up
* up
* up
* up
* format
* fix
* up
* format
* adjust ordering
* up
* Revert "[tune] Cache unstaged placement groups for potential re-use (#18706)"
This reverts commit 2e99fb215f.
* up
* update
* format
* up
* format
* fix
* Revert "Revert "[tune] Cache unstaged placement groups for potential re-use (#18706)""
This reverts commit 93425fdb986059e53699623a0fc8590c062e139b.
* up
* format
* fix lint
* up
* up
* up
* up
* check
* add test1
* format
* up
* add test
* up
* up
* up
* fix
* up
* up
* up
* add test
* format
* up
* up
* fix lint
* format
* fix
* format
* fix
* up
* clang-tidy
* fix
* fix script
* test clang compiler
* fix clang-tidy rules
* Fix windows and other issues.
* Fix
* Improve information when running check-git-clang-tidy-output.sh on different OS
* Streaming support metric reporter
* fix lint
* fix bazel format lint
* fix lint
* metric deps lint
* lint
* and comments for runtime reporter
* unordered_map instead
* comments
* fix visibility flag
* deps local .so target
* make stats public visibility
* stats lib in public
* add antgroup team tag
* begin
* build
* add test
* add first test
* add test
* fix build
* lint bazel
* fix build
* fix build
* fix crash
* fix some comment
* revert shared_ptr ObjectLifecycleManager
* fix RemoveGetRequest lost
* no defer
* fix lots of comments
* fix build
* fix data race
* fix comments
* Revert "fix data race"
This reverts commit 8f58e3d70b73af864566e056211ff1b70cab870c.
* refine
* fix mac build
* fix unit test
* fix unit test
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* checkpoint
* up
* up
* up
* up
* fix
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* add comments
* up
* up
* up
* up
* add tests