This PR does two things.
(1) Change `api_server_url` to `address`, which is consistent with the other submission APIs.
(2) When the API does not respond in time, print a warning every 5 seconds. An example follows. This is useful when the API is slow to respond (e.g., when there are partial failures). Without this, users see the API hang for 30 seconds, which is a pretty bad UX.
(0.12 / 10 seconds) Waiting for the response from the API server address http://127.0.0.1:8265/api/v0/delay/5.
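The periodic-warning behavior can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `call_with_progress_warnings` and its parameters are hypothetical, and it assumes the slow request can be run in a worker thread while the caller polls for the result.

```python
import concurrent.futures
import time


def call_with_progress_warnings(fn, timeout_s, warn_every_s, address, emit=print):
    """Run fn() in a worker thread and emit a warning every warn_every_s
    seconds until it returns or timeout_s elapses (hypothetical helper)."""
    start = time.monotonic()
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        while True:
            remaining = timeout_s - (time.monotonic() - start)
            if remaining <= 0:
                raise TimeoutError(
                    f"No response from {address} within {timeout_s} seconds."
                )
            try:
                # Wait for at most one warning interval at a time.
                return future.result(timeout=min(warn_every_s, remaining))
            except concurrent.futures.TimeoutError:
                if future.done():
                    # fn itself raised a timeout error; propagate it.
                    raise
                elapsed = time.monotonic() - start
                emit(
                    f"({elapsed:.2f} / {timeout_s} seconds) Waiting for the "
                    f"response from the API server address {address}."
                )
    finally:
        # Do not block on the worker thread if we gave up waiting.
        executor.shutdown(wait=False)
```

Passing `emit` as a parameter keeps the sketch easy to test; a CLI would likely route the message to its own warning or logging function instead.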
In Ray 2.0, we want to make the API server highly available (HA).
Originally, the Serve endpoints lived on the head node.
This PR moves the Serve endpoints to the dashboard agents, so they become HA because there are multiple dashboard agent replicas.
## Why are these changes needed?
This refactors how the state CLI interacts with the API server, replacing a hard-coded request workflow with one based on `SubmissionClient`.
See #24956 for more details.
## Summary
- Created a `StateApiClient` that inherits from `SubmissionClient` and refactored the various listing commands into class methods.
## Related issue number
Closes #24956
Closes #25578
Some commands in the Serve CLI use Ray client and some commands ping the Ray dashboard; however, all commands read `RAY_ADDRESS` to get the address. This change raises a nice exception if the user accidentally passes a Ray client address as the Ray Dashboard address.
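A check of this kind might look like the sketch below. It assumes that Ray client addresses use the `ray://` scheme while dashboard addresses use `http://` or `https://`. The function name, exception type, and message are hypothetical and not taken from this PR.

```python
from urllib.parse import urlparse


def check_dashboard_address(address: str) -> str:
    """Return address unchanged, or raise if it looks like a Ray client
    address (hypothetical helper for illustration)."""
    scheme = urlparse(address).scheme
    if scheme == "ray":
        raise ValueError(
            f"Got Ray client address '{address}', but this command expects "
            "the Ray dashboard address (e.g. 'http://localhost:8265')."
        )
    return address
```

Failing early with a message that names both address types tells the user what to change, instead of surfacing an opaque connection error later.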
The `py_modules` field of `runtime_env` supports uploading local Python modules for use on the Ray cluster. One gap is the case where the local Python module is a wheel (a `.whl` file). This PR adds the missing support for uploading and installing the `.whl` file.
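For illustration, a `runtime_env` that points `py_modules` at a local wheel might look like this. The wheel path is a made-up placeholder, and the `ray.init` call is left commented out so the snippet does not require a running cluster.

```python
# Hypothetical local wheel path; replace with a real .whl file.
runtime_env = {
    "py_modules": ["./dist/my_package-0.1.0-py3-none-any.whl"],
}

# With Ray installed, the environment would be passed at startup, e.g.:
# import ray
# ray.init(runtime_env=runtime_env)
```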