hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 02:21:39 -05:00

Author	SHA1	Message	Date
Archit Kulkarni	a12c04a2fe	[Serve] [Doc] Update key concepts for 2.0, remove deprecated APIs (#26965) Removes the deprecated APIs serve.start() and get_handle(). Rewrites the ServeHandle doc snippet to use the recommended workflow for ServeHandles (only access them from other deployments, and pass Deployments in as input args to `.bind()`, which get resolved to ServeHandles at runtime). Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>	2022-08-03 11:27:23 -05:00
Jimmy Yao	1c1cca2736	[release/ray-lightning] adjust the release test of ray lightning master. First of all, sorry I messed up the previous PR when syncing with master (#27374). This PR duplicates the previous PR, with one change: adding a version check on ray_lightning for compatibility. Also, apologies for the massive review requests on the previous PR.	2022-08-03 16:01:32 +01:00
Kai Fricke	20119c7022	[tune] Fix test_actor_reuse.py::ActorReuseMultiTest test (#27427) Increase time to allow for scheduling latency. Signed-off-by: Kai Fricke <kai@anyscale.com>	2022-08-03 13:54:11 +01:00
Kai Fricke	46ed3557ba	[tune] Fix test_resource_exhausted_info test (#27426) #27213 broke this test. Signed-off-by: Kai Fricke <kai@anyscale.com>	2022-08-03 13:53:46 +01:00
Simon Mo	4e07019b88	[Serve] Fix Graph Repeated Invocation (#27417)	2022-08-03 01:40:19 -07:00
shrekris-anyscale	adc7c4dc87	[Serve] Make `serve.run()` and `deployment.bind()` beta APIs (#27401)	2022-08-02 23:11:23 -07:00
Simon Mo	8ac6d02502	[Serve][Nightly] Environment for Nightly K8s Tests (#27126)	2022-08-02 23:05:47 -07:00
Jiajun Yao	8b7e4ac701	[Doc] Test ray core doc code (#27334) Currently, not all code under ray-core/doc_code is covered by CI, and tf_example.py and torch_example.py are not used anywhere. Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>	2022-08-02 20:51:47 -07:00
Simon Mo	6084eb6a9f	Revert "Revert "[Serve] ServeHandle detects ActorError and drop replicas from target group (#26685)" (#27283)" (#27348)	2022-08-02 20:04:03 -07:00
Rohan Potdar	5b6a58ed28	[RLlib] Add OPE Learning Tests (#27154)	2022-08-02 17:51:38 -07:00
Richard Liaw	6dc3dbdd37	[air] Update to beta (#27393) Update API references to beta. Needed as we are going to beta in 2.0. I left out RL/Scikit-Learn/HuggingFace.	2022-08-02 17:10:41 -07:00
Dmitri Gekhtman	4d87e8112a	[docs][kubernetes] GPU user guide (#27360) Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com> This PR adds a page of guidance on GPU deployment with Ray/K8s. This page is a modified and slightly expanded version of the existing page https://docs.ray.io/en/latest/cluster/kubernetes-gpu.html. It also moves the managed K8s service intro links to their own page.	2022-08-02 15:58:23 -07:00
Eric Liang	91a03026ef	[air] Fix BatchPredictor.predict_pipelined not working with GPU stage (#27232)	2022-08-02 15:36:40 -07:00
Alan Guo	c083ca5871	Add GPU info to new dashboard (#27074) Support a GPU column for the new dashboard. Have the first node expanded by default. Signed-off-by: Alan Guo aguo@anyscale.com Fixes #13889. Addresses comment from #26996.	2022-08-02 15:32:55 -07:00
Clark Zinzow	291a294208	[AIR - Serve] [Hotfix] Check for tensor extension via dtype rather than a NumPy conversion (#26891) Converting a Pandas DataFrame column to an ndarray (e.g. via df[col].values) can often result in a full copy of the column in order to construct the ndarray due to Pandas' 2D block management. This PR ports tensor extension type checking to checking the dtype, which is always an O(1) check. Signed-off-by: Clark Zinzow <clarkzinzow@gmail.com>	2022-08-02 14:52:46 -07:00
Avnish Narayan	00f9438101	[RLlib] Training step docs. (#27344)	2022-08-02 23:41:45 +02:00
Ricky Xu	122eda2757	[Core] Move test_state_api test back to large test groups (#27377) Why are these changes needed? python/tests/test_state_api.py runs for 5 minutes in a normal run.	2022-08-02 14:21:34 -07:00
Eric Liang	6384734071	[docs] Adjust the set of global doc owners to those responsible for copy-editing Signed-off-by: Eric Liang <ekhliang@gmail.com>	2022-08-02 14:09:21 -07:00
Archit Kulkarni	e02b072939	[Doc] [Serve] Edit grammar/usage/organization for HTTP adapters page (#26969) Moves FastAPI into its own section instead of appearing in a duplicated note. Co-authored-by: simon-mo <simon.mo@hey.com>	2022-08-02 15:08:05 -05:00
Simon Mo	a9d94f740c	[Serve] Remove the warning for async handles in 2.0 (#27346) Signed-off-by: simon-mo <simon.mo@hey.com>	2022-08-02 15:07:41 -05:00
Richard Liaw	c8561071f3	[air/train/docs] gbdt trainer user guide (#27362) Co-authored-by: Kai Fricke <kai@anyscale.com>	2022-08-02 13:02:42 -07:00
clarng	84674fa868	[docs] ray core namespace docs: edit pass & move python code into doc_code dir (#27341)	2022-08-02 12:52:30 -07:00
clarng	34385b8136	[docs] ray core cross-lang docs: edit pass & move python code into doc_code dir (#27350) Edit pass. Move code into doc_code dir. Code in doc_code is verified by CI.	2022-08-02 12:50:05 -07:00
Jiajun Yao	cd2e590567	Support placement_group=None in PlacementGroupSchedulingStrategy (#27370) We decided to allow escaping the parent pg via `PlacementGroupSchedulingStrategy(placement_group=None)` instead of using "DEFAULT". Our docs were already updated to reflect this, but the code still did not allow it.	2022-08-02 12:49:41 -07:00
Eric Liang	a1cb735035	Raise the (runtime_env max size) gRPC max message size to 500MiB Signed-off-by: Eric Liang <ekhliang@gmail.com>	2022-08-02 12:41:34 -07:00
Jun Gong	61add8ede6	[RLlib] Fix the last cartpole-crashing premerge test. (#27315)	2022-08-02 20:08:33 +02:00
Nikita Vemuri	9a0b9918e5	[dashboard] Add `last_activity_at` field to `/api/component_activities` (#27284) Add optional last_activity_at field to /api/component_activities to record end time of most recently finished activity. Signed-off-by: Nikita Vemuri <nikitavemuri@gmail.com>	2022-08-02 11:02:15 -07:00
kourosh hakhamaneshi	bda5026428	[RLlib] Fix A2C release tests (#27314)	2022-08-02 10:44:52 -07:00
kourosh hakhamaneshi	8d848890f1	[RLlib] Fix default view_requirement in policy.py (#27255)	2022-08-02 10:44:07 -07:00
Ricky Xu	82a24f9319	[Doc][Core][State Observability] Adding Python SDK doc and docstring (#26997) 1. Add doc for python SDK and docstrings on public SDK 2. Rename list -> ray_list and get -> ray_get for better naming 3. Fix some typos 4. Auto-translate address to API server URL. Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2022-08-02 11:24:59 -05:00
Kai Fricke	d527c7b335	[air/benchmarks] Drop OMP_NUM_THREADS in vanilla torch/tf training (#27256) Ray automatically sets OMP_NUM_THREADS=1, potentially limiting multithreading in native pytorch/tensorflow. If this leads to performance differences, we should address this either in Ray Train or in Ray core. Signed-off-by: Kai Fricke <kai@anyscale.com>	2022-08-02 13:38:01 +01:00
xwjiang2010	36cf1baa82	[air doc] checkpoint_freq --> checkpoint_frequency (#27325) Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>	2022-08-02 11:34:10 +01:00
Kai Fricke	149c031c4b	[tune/release] Do not use spot instances in k8s tests (#27250) Spot instances are not being booted up, so let's go without them. Signed-off-by: Kai Fricke <kai@anyscale.com>	2022-08-02 11:30:41 +01:00
Yi Cheng	a9697722cf	[workflow] Change `step` to `task` in workflow. (#27330) * change step to task Signed-off-by: Yi Cheng <74173148+iycheng@users.noreply.github.com> * fix comments Signed-off-by: Yi Cheng <74173148+iycheng@users.noreply.github.com> * fix comments Signed-off-by: Yi Cheng <74173148+iycheng@users.noreply.github.com> * fix comments Signed-off-by: Yi Cheng <74173148+iycheng@users.noreply.github.com>	2022-08-01 22:27:41 -07:00
Jules S. Damji	4045ba4841	[DOC Ray AIR] minor editorial tweaks for clarity and usage (#27128) Co-authored-by: Jules Damji <jules@anyscale.com>	2022-08-01 21:09:04 -07:00
Dmitri Gekhtman	6efca71c35	[docs][kubernetes] XGBoost ML example (#27313) Adds a guide on running an XGBoost-Ray workload using KubeRay. Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>	2022-08-01 19:30:41 -07:00
Yi Cheng	00d22b6c7c	[core] Fix the test_failure_3.py in win (#27332) Windows tests were broken because killing the child process also killed the parent. Change the signal that is sent so the tests work.	2022-08-01 18:55:07 -07:00
shrekris-anyscale	324d8e4bca	[Serve] Serialize `user_config` with JSON instead of Pickle (#26235)	2022-08-01 17:53:43 -07:00
Eric Liang	f7ae8923f6	[docs] Reorganize the tensor data support docs; general editing (#26952) Why are these changes needed? Editing pass over the tensor support docs for clarity: make heavy use of tabbed guides to condense the content; rewrite examples to be organized around creating vs. reading tensors; use doc_code for testing.	2022-08-01 17:31:41 -07:00
Jiajun Yao	c50faa126c	Replace boost::filesystem with std::filesystem (#27338) Redo of #27319. Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>	2022-08-01 17:12:23 -07:00
clarng	fffcae1cb4	[docs] ray core dag docs: edit pass & move code into separate dir (#27318)	2022-08-01 17:05:36 -07:00
shrekris-anyscale	cc84953da3	[Serve] [Docs] Update "Getting Started" documentation (#26745)	2022-08-01 16:31:48 -07:00
Jiajun Yao	36d5e5f99d	Revert "Replace boost::filesystem with std::filesystem (#27319)" (#27337) This reverts commit `8e5c51d7d7`.	2022-08-01 13:46:45 -07:00
Jiajun Yao	8e5c51d7d7	Replace boost::filesystem with std::filesystem (#27319) std::filesystem ships with C++17, so there is no need to depend on Boost for this. Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>	2022-08-01 11:44:39 -07:00
xwjiang2010	c9579fea1c	[air] update pytorch_training_e2e.py to use iter_torch_batches. (#27241) update pytorch_training_e2e.py to use iter_torch_batches. Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>	2022-08-01 19:23:01 +01:00
clarng	57adde3f7d	memory monitor (#27017) Signed-off-by: Clarence Ng clarence.wyng@gmail.com Why are these changes needed? This PR adds a memory monitor in C++ that runs periodically to check whether node memory usage is above a certain threshold. The caller may provide a callback to the monitor to execute at each interval to determine whether an action should be taken. This PR is a no-op since the monitor is disabled by default. Another PR building on this one will make the monitor take action when memory is running low.	2022-08-01 10:40:46 -07:00
jonathan-conder-sm	1d5fef2004	Fix dashboard with prometheus-client 0.14 (#23766) Why are these changes needed? The dashboard wasn't working (blank screen). See the linked issue for details. The cause is this exception in /tmp/ray/session_latest/logs/dashboard_agent.log: Traceback (most recent call last): File "/usr/local/lib/python3.9/site-packages/ray/dashboard/agent.py", line 391, in <module> loop.run_until_complete(agent.run()) File "/usr/local/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete return future.result() File "/usr/local/lib/python3.9/site-packages/ray/dashboard/agent.py", line 178, in run modules = self._load_modules() File "/usr/local/lib/python3.9/site-packages/ray/dashboard/agent.py", line 120, in _load_modules c = cls(self) File "/usr/local/lib/python3.9/site-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 161, in __init__ self._metrics_agent = MetricsAgent( File "/usr/local/lib/python3.9/site-packages/ray/_private/metrics_agent.py", line 75, in __init__ prometheus_exporter.new_stats_exporter( File "/usr/local/lib/python3.9/site-packages/ray/_private/prometheus_exporter.py", line 332, in new_stats_exporter exporter = PrometheusStatsExporter( File "/usr/local/lib/python3.9/site-packages/ray/_private/prometheus_exporter.py", line 265, in __init__ self.serve_http() File "/usr/local/lib/python3.9/site-packages/ray/_private/prometheus_exporter.py", line 319, in serve_http start_http_server( File "/usr/local/lib/python3.9/site-packages/prometheus_client/exposition.py", line 167, in start_wsgi_server TmpServer.address_family, addr = _get_best_family(addr, port) File "/usr/local/lib/python3.9/site-packages/prometheus_client/exposition.py", line 156, in _get_best_family infos = socket.getaddrinfo(address, port) File "/usr/local/lib/python3.9/socket.py", line 954, in getaddrinfo for res in _socket.getaddrinfo(host, port, family, type, proto, flags): socket.gaierror: [Errno -2] Name or service not known. There was a recent change in prometheus-client that passes the address given to start_http_server to socket.getaddrinfo. This prevents passing in an empty string, but we can get the same effect by passing None. Closes #23765.	2022-08-01 10:25:38 -07:00
Sihan Wang	410fe1b5ec	[Serve] Support Multiple DAG Entrypoints in DAGDriver (#26573)	2022-08-01 09:16:36 -07:00
Artur Niederfahrenhorst	a598458c46	[RLlib] Fix complex torch one-hot and flattened layers not being added to module list. (#27304)	2022-08-01 15:52:28 +02:00
Steven Morad	d0a8e3c36f	[RLlib] User-friendly RNN sequencing. (#27087)	2022-08-01 15:32:22 +02:00

13994 commits