Removes deprecated APIs:
- serve.start()
- get_handle()
Rewrites the ServeHandle doc snippet to use the recommended workflow for ServeHandles (only access them from other deployments, pass Deployments in as input args to `.bind()`, which get resolved to ServeHandles at runtime)
Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
First of all, sorry i messed up with the previous pr when sync with the master (#27374). This PR is the duplicate of previous pr until we update the changes (change: adding the version check for the ray_lightning for the compatibility). Also, apology for the massive review requests on the previous PR.
- Currently not all code under ray-core/doc_code is covered by CI.
- tf_example.py and torch_example.py are not used anywhere.
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
This PR
adds a page of guidance on GPU deployment with Ray/K8s. This page is a modified and slightly expanded version of the existing page https://docs.ray.io/en/latest/cluster/kubernetes-gpu.html
moves managed K8s service intro links to their own page
Support a GPU column for the new dashboard
Have first node be default expanded
Signed-off-by: Alan Guo aguo@anyscale.comfixes#13889
Addresses comment from #26996
Converting a Pandas DataFrame column to an ndarray (e.g. via df[col].values) can often result in a full copy of the column in order to construct the ndarray due to Pandas' 2D block management. This PR ports tensor extension type checking to checking the dtype, which is always an O(1) check.
Signed-off-by: Clark Zinzow <clarkzinzow@gmail.com>
We decided to allow escaping the parent pg via `PlacementGroupSchedulingStrategy(placement_group=None)` instead of using "DEFAULT". Our doc is updated with that but in the code it's still not allowed.
Add optional last_activity_at field to /api/component_activities to record end time of most recently finished activity
Signed-off-by: Nikita Vemuri <nikitavemuri@gmail.com>
1. Add doc for python SDK and docstrings on public SDK
2. Rename list -> ray_list and get -> ray_get for better naming
3. Fix some typos
4. Auto translate address to api server url.
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Ray automatically sets OMP_NUM_THREADS=1, potentially limiting multithreading in native pytorch/tensorflow. If this leads to performance differences, we should address this either in Ray Train or in Ray core.
Signed-off-by: Kai Fricke <kai@anyscale.com>
Why are these changes needed?
Editing pass over the tensor support docs for clarity:
Make heavy use of tabbed guides to condense the content
Rewrite examples to be more organized around creating vs reading tensors
Use doc_code for testing
Signed-off-by: Clarence Ng clarence.wyng@gmail.com
Why are these changes needed?
This PR adds a memory monitor in cpp that runs periodically to check if the node memory usage is above a certain threshold. The caller may provide a callback to the monitor to execute at each interval to determine whether an action should be taken.
This PR is a no-op since the monitor is disabled by default. Another PR based on this will implement the monitor to take action when memory is running low
Why are these changes needed?
The dashboard wasn't working (blank screen). See the linked issue for details. The cause is this exception in /tmp/ray/session_latest/logs/dashboard_agent.log:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/ray/dashboard/agent.py", line 391, in <module>
loop.run_until_complete(agent.run())
File "/usr/local/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/usr/local/lib/python3.9/site-packages/ray/dashboard/agent.py", line 178, in run
modules = self._load_modules()
File "/usr/local/lib/python3.9/site-packages/ray/dashboard/agent.py", line 120, in _load_modules
c = cls(self)
File "/usr/local/lib/python3.9/site-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 161, in __init__
self._metrics_agent = MetricsAgent(
File "/usr/local/lib/python3.9/site-packages/ray/_private/metrics_agent.py", line 75, in __init__
prometheus_exporter.new_stats_exporter(
File "/usr/local/lib/python3.9/site-packages/ray/_private/prometheus_exporter.py", line 332, in new_stats_exporter
exporter = PrometheusStatsExporter(
File "/usr/local/lib/python3.9/site-packages/ray/_private/prometheus_exporter.py", line 265, in __init__
self.serve_http()
File "/usr/local/lib/python3.9/site-packages/ray/_private/prometheus_exporter.py", line 319, in serve_http
start_http_server(
File "/usr/local/lib/python3.9/site-packages/prometheus_client/exposition.py", line 167, in start_wsgi_server
TmpServer.address_family, addr = _get_best_family(addr, port)
File "/usr/local/lib/python3.9/site-packages/prometheus_client/exposition.py", line 156, in _get_best_family
infos = socket.getaddrinfo(address, port)
File "/usr/local/lib/python3.9/socket.py", line 954, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
There was a recent change in prometheus-client which passes the address given to start_http_server to socket.getaddrinfo. This prevents passing in an empty string, but we can get the same effect by passing None.
Related issue number
Closes#23765