Cherry-picks all docs changes for Serve in Ray 2.0.
I did this by overwriting the entire `doc/source/serve/` directory in addition to `doc/source/_toc.yml`. The changes should be isolated to Serve (manually verified).
Signed-off-by: Yi Cheng <74173148+iycheng@users.noreply.github.com>
Why are these changes needed?
This PR updates the workflow docs to reflect recent changes, focusing on position changes and other updates.
Documentation updates for the newly introduced HTTPEventProvider and HTTPListener in Ray 2.0.
Co-authored-by: Yuan-Chi Chang <84025022+yuanchi2807@users.noreply.github.com>
The test was written incorrectly. The root cause was that the trainer and worker each require 1 CPU, meaning the placement group requires {CPU: 1} * 2 resources.
And when the max fraction is 0.001, we only allow up to 1 CPU for the placement group, so the requested placement groups cannot be scheduled in any case.
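For context, a rough sketch of the kind of setup the test exercises, assuming the `_max_cpu_fraction_per_node` argument on `ray.util.placement_group` (names and numbers here are illustrative, not copied from the test):

```python
import ray
from ray.util.placement_group import placement_group

ray.init(num_cpus=2)

# The trainer and the worker each need 1 CPU, so the placement group asks
# for two {"CPU": 1} bundles. With _max_cpu_fraction_per_node=0.001 we only
# allow up to 1 CPU on the node to be reserved by placement groups, so this
# pg can never be scheduled and ready() never resolves.
pg = placement_group(
    bundles=[{"CPU": 1}, {"CPU": 1}],
    strategy="PACK",
    _max_cpu_fraction_per_node=0.001,
)
# ray.get(pg.ready(), timeout=5)  # would time out here
```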
Why are these changes needed?
This PR fixes edge cases when the max_cpu_fraction argument is used with a placement group. Specifically, there was an edge case where the placement group could not be scheduled when a task or actor was already scheduled and occupying resources.
The original logic to decide whether bundle scheduling exceeds the CPU fraction was as follows (a rough sketch follows the steps).
Calculate max_reservable_cpus of the node.
Calculate currently_used_cpus + bundle_cpu_request (per bundle) == total_allocation of the node.
Don't schedule if total_allocation > max_reservable_cpus for the node.
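A rough Python rendering of that original check (purely illustrative; the real logic lives in the C++ scheduler):

```python
from types import SimpleNamespace

def can_schedule_original(node, bundle_cpu_request, max_cpu_fraction):
    # Original (buggy) check: counts *all* CPUs in use on the node,
    # including CPUs occupied by plain tasks and actors.
    max_reservable_cpus = max_cpu_fraction * node.total_cpus
    total_allocation = node.currently_used_cpus + bundle_cpu_request
    return total_allocation <= max_reservable_cpus

# The scenario described below: 4 CPUs, an actor using 1 CPU,
# a 3-CPU placement group, and a 0.999 max fraction.
node = SimpleNamespace(total_cpus=4, currently_used_cpus=1)
print(can_schedule_original(node, bundle_cpu_request=3, max_cpu_fraction=0.999))  # False
```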
However, the following scenario caused issues because currently_used_cpus can include resources that are not allocated by placement groups (e.g., actors). As a result, when an actor was already occupying resources, the total_allocation was incorrect. For example:
4 CPUs
0.999 max fraction (so it can reserve up to 3 CPUs)
1 Actor already created (1 CPU)
PG with CPU: 3
Now the pg cannot be scheduled because total_allocation == 1 actor (1 CPU) + 3 bundles (3 CPUs) == 4 CPUs > 3 CPUs (max fraction CPUs).
However, this should work because the pg can use up to 3 CPUs, and we have enough resources (see the reproduction sketch below).
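A reproduction sketch of this scenario using the public API (assuming the `_max_cpu_fraction_per_node` argument; before this fix, `pg.ready()` below would never resolve even though 3 CPUs remain free):

```python
import ray
from ray.util.placement_group import placement_group

ray.init(num_cpus=4)

@ray.remote(num_cpus=1)
class Occupier:
    def ping(self):
        return "pong"

# An actor already occupies 1 of the 4 CPUs.
occupier = Occupier.remote()
ray.get(occupier.ping.remote())

# 0.999 max fraction on a 4-CPU node allows up to 3 CPUs for placement groups.
pg = placement_group(
    bundles=[{"CPU": 1}] * 3,
    strategy="PACK",
    _max_cpu_fraction_per_node=0.999,
)
# Before the fix, the actor's CPU counted toward the fraction (1 + 3 > 3),
# so the pg never became ready; after the fix this returns promptly.
ray.get(pg.ready())
```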
The root cause is that when we calculate the max fraction, we should only take into account resources allocated by bundles. To fix this, I changed the logic as follows (sketched in code after the steps).
Calculate max_reservable_cpus of the node.
Calculate **currently_used_cpus_by_pg_bundles** + **bundle_cpu_request (sum of all bundles)** == total_allocation_from_pgs_and_bundles of the node.
Don't schedule if total_allocation_from_pgs_and_bundles > max_reservable_cpus for the node.
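And the corresponding sketch of the corrected check (again illustrative only):

```python
from types import SimpleNamespace

def can_schedule_fixed(node, bundle_cpu_request, max_cpu_fraction):
    # Fixed check: only CPUs already reserved by placement group bundles
    # count toward the fraction; CPUs used by tasks and actors are ignored.
    max_reservable_cpus = max_cpu_fraction * node.total_cpus
    total_allocation_from_pgs_and_bundles = (
        node.currently_used_cpus_by_pg_bundles + bundle_cpu_request
    )
    return total_allocation_from_pgs_and_bundles <= max_reservable_cpus

# Same example as above: the actor's CPU no longer counts, so the 3-CPU pg fits.
node = SimpleNamespace(total_cpus=4, currently_used_cpus_by_pg_bundles=0)
print(can_schedule_fixed(node, bundle_cpu_request=3, max_cpu_fraction=0.999))  # True
```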
When the node the controller was on died, GCS would try to reschedule the controller to the same node. But GCS only marks the node as failed after 120s when GCS restarts (or 30s if only the raylet died).
This PR fixes it by unpinning the controller from the head node, so as long as GCS is alive, it will reschedule the controller immediately. But we can't turn this on by default, so we introduce an internal flag for it.
Objects freed by the manual, internal free call previously would not get reconstructed. This PR introduces the following semantics after a free call (see the sketch after the list):
If no failure occurs, and the object is needed by a downstream task, an ObjectFreedError will be thrown.
If a failure occurs, causing a downstream task to be re-executed, the freed object will get reconstructed as usual.
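A small sketch of the first case, assuming the internal free API lives at `ray._private.internal_api.free` and that the error surfaces when the downstream task runs:

```python
import ray
from ray._private.internal_api import free

ray.init()

@ray.remote
def produce():
    return list(range(100_000))

@ray.remote
def consume(x):
    return len(x)

ref = produce.remote()
ray.wait([ref])   # make sure the object exists before freeing it
free([ref])       # manual/internal free call

try:
    # No failure happened, but a downstream task needs the freed object,
    # so per the new semantics this should raise (ObjectFreedError).
    ray.get(consume.remote(ref))
except Exception as e:
    print("downstream task failed as expected:", type(e).__name__)
```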
Also fixes some incidental bugs:
Don't crash on failure to contact local raylet during object recovery. This will produce a nicer error message because we will instead throw an application-level error when someone tries to get an object.
Fix a circular lock dependency between task failure and task dependency resolution.
Related issue number
Closes #27265.
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
When we deserialize an actor handle via pickle, we register it with an outer object ref equal to itself, which is wrong. For out-of-band deserialization, there should be no outer object ref.
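For reference, "out-of-band" here means serializing the handle ourselves (e.g., with `ray.cloudpickle`) rather than passing it through a task or object; a rough sketch, assuming the deserializing code runs in a process attached to the same cluster:

```python
import ray
from ray import cloudpickle

ray.init()

@ray.remote
class Counter:
    def __init__(self):
        self.n = 0

    def incr(self):
        self.n += 1
        return self.n

handle = Counter.remote()

# Out-of-band round trip: we serialize and deserialize the handle ourselves.
# Previously, deserialization registered the handle with an outer object ref
# equal to itself; now no outer object ref is recorded on this path.
blob = cloudpickle.dumps(handle)
restored = cloudpickle.loads(blob)
print(ray.get(restored.incr.remote()))  # 1
```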
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
The cluster address is now written to a temp file. Previously we raised an error if `ray start --head` tried to reuse the old cluster address in the temp file, even if Ray was no longer running. This PR allows `ray start --head` to continue if it can't find any GCS process associated with the recorded cluster address.
Related issue number
Closes #27021.