This PR revamps and aligns the README and Ray intro doc page:
New "What is Ray" diagram that introduces AIR vs Ray core (diagram TBD finalized, this is the working placeholder)
Update the description of Ray
Link out to the user guides for key libraries and key concepts
Remove old / broken links, as well as the inline library descriptions from the README
Support a GPU column for the new dashboard
Have the first node be expanded by default
Signed-off-by: Alan Guo <aguo@anyscale.com>
Fixes #13889
Addresses comment from #26996
Co-authored-by: Sihan Wang <sihanwang41@gmail.com>
This is an important feature to prevent feature-set regressions when users migrate from 1.0 to 2.0.
Update API references to beta. Needed as we are going to beta in 2.0.
I left out RL/Scikit-Learn/HuggingFace.
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* [air] fix xgboost_benchmark script by passing in args (#27146)
* [tune/docs] Update custom syncer example (#27252)
There is a small bug in the docs example for custom command-based syncers. This PR fixes it and adds a test covering the change.
Signed-off-by: Kai Fricke <kai@anyscale.com>
* [tune/release] Do not use spot instances in k8s tests (#27250)
Spot instances are not being booted up, so let's go without them.
Signed-off-by: Kai Fricke <kai@anyscale.com>
Co-authored-by: matthewdeng <matt@anyscale.com>
Editing pass over the tensor support docs for clarity:
Make heavy use of tabbed guides to condense the content
Rewrite examples to be more organized around creating vs reading tensors
Use doc_code for testing
Replace more occurrences of tune.run() with Tuner.fit() in examples/docstrings (see the sketch below).
Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
Co-authored-by: Kai Fricke <kai@anyscale.com>
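A minimal sketch of the replacement pattern, assuming the Ray 2.0 Tune API; the toy objective and search space are illustrative only:

```python
from ray import tune
from ray.tune import Tuner

def objective(config):
    # Toy objective; report a single metric back to Tune.
    tune.report(score=config["x"] ** 2)

# Old style being phased out in examples/docstrings:
# analysis = tune.run(objective, config={"x": tune.grid_search([1, 2, 3])})

# New style:
tuner = Tuner(objective, param_space={"x": tune.grid_search([1, 2, 3])})
results = tuner.fit()
```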
We need to check the time after acquiring the lock to ensure correctness. Otherwise, a caller might wait on the lock while the heartbeat is updated in the meantime, and then act on a stale timestamp.
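A minimal Python sketch of the check-after-lock pattern; the class and method names here are hypothetical (the real code is C++ in the GCS):

```python
import threading
import time

class HeartbeatTracker:
    """Hypothetical illustration of reading the clock only under the lock."""

    def __init__(self, timeout_s: float):
        self._lock = threading.Lock()
        self._last_heartbeat = time.monotonic()
        self._timeout_s = timeout_s

    def report_heartbeat(self) -> None:
        with self._lock:
            self._last_heartbeat = time.monotonic()

    def is_dead(self) -> bool:
        with self._lock:
            # Read the clock only after the lock is held, so the timestamp and
            # the heartbeat state are observed consistently. A clock value
            # captured before waiting on the lock can be older than a heartbeat
            # recorded while we waited, giving an inconsistent comparison.
            now = time.monotonic()
            return (now - self._last_heartbeat) > self._timeout_s
```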
Why are these changes needed?
Also:
Add validation to make sure multi-GPU and micro-batching are not used together (see the sketch after this list).
Update A2C learning test to hit the microbatching branch.
Minor comment updates.
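A hypothetical sketch of the kind of validation described above, assuming config keys named microbatch_size and num_gpus; the actual check in RLlib's A2C config may differ:

```python
def validate_a2c_config(config: dict) -> None:
    # Micro-batching accumulates gradients on a single device, so it cannot
    # be combined with multi-GPU data-parallel updates in the same config.
    if config.get("microbatch_size") is not None and config.get("num_gpus", 0) > 1:
        raise ValueError(
            "A2C: `microbatch_size` cannot be used together with multi-GPU "
            "training; set `num_gpus <= 1` or disable micro-batching."
        )
```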
Currently running into an issue:
Cluster startup Failed. Error: RuntimeError: botocore.exceptions.ClientError: An error occurred (InvalidBlockDeviceMapping) when calling the RunInstances operation: Volume of size 202GB is smaller than snapshot 'snap-02c4e6a0ad06cf3d6', expect size >= 400GB
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Following up on #27098, this PR renames the baseworker mixin and declutters training output by logging only for rank 0 actors (see the sketch below).
Signed-off-by: Kai Fricke <kai@anyscale.com>
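A minimal sketch of the decluttering pattern, with a hypothetical helper name (the renamed mixin in Ray Train is not reproduced here):

```python
import logging

logger = logging.getLogger(__name__)

def log_on_rank_zero(world_rank: int, msg: str) -> None:
    # Hypothetical helper: only the rank-0 worker emits the message, so N
    # distributed workers do not print N copies of the same progress line.
    if world_rank == 0:
        logger.info(msg)
```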
The heartbeat manager starts its own thread to run its background task, and that task shares the same data structure used within HandleReportHeartbeat (heartbeats_). Both should therefore run in the same thread. This PR achieves that by running HandleReportHeartbeat within the io_service thread.
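A rough Python analog of the fix, using asyncio in place of the C++ io_service; names are illustrative only:

```python
import asyncio
import threading

class HeartbeatManager:
    """Illustrative analog: all access to `heartbeats` happens on one thread."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self.heartbeats = {}  # touched only from the loop's thread
        threading.Thread(target=self.loop.run_forever, daemon=True).start()

    def handle_report_heartbeat(self, node_id: str) -> None:
        # Called from RPC threads: post the update onto the manager's loop
        # instead of mutating `heartbeats` directly, so the background task
        # and this handler never touch the shared dict from different threads.
        self.loop.call_soon_threadsafe(self.heartbeats.__setitem__, node_id, True)
```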
This change cuts off support for deprecated schema fields. It intentionally breaks backwards compatibility with old configs that set a global min_workers or use the head_node, worker_nodes, autoscaling_mode, initial_workers, target_utilization_fraction, or default_worker_node_type fields (see the sketch below).
Co-authored-by: Alex <alex@anyscale.com>
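A hypothetical sketch of the stricter behavior as a simple top-level check; field names come from the list above, and the actual autoscaler validation may differ:

```python
REMOVED_TOP_LEVEL_FIELDS = {
    "min_workers",  # the global form; per-node-type min_workers remains valid
    "head_node",
    "worker_nodes",
    "autoscaling_mode",
    "initial_workers",
    "target_utilization_fraction",
    "default_worker_node_type",
}

def reject_deprecated_fields(config: dict) -> None:
    # Fail fast instead of silently migrating legacy fields.
    removed = REMOVED_TOP_LEVEL_FIELDS & config.keys()
    if removed:
        raise ValueError(
            f"Cluster config uses removed fields: {sorted(removed)}. "
            "Please migrate to the available_node_types-based schema."
        )
```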
Fix for an unintentional backwards-compatibility breakage from #25902.
The job submit API should still accept job_id as a parameter (see the sketch below).
Signed-off-by: Alan Guo <aguo@anyscale.com>
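A minimal sketch of the behavior being restored, assuming the Python JobSubmissionClient; the exact parameter set is an assumption, so check the 2.0 API reference:

```python
from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient("http://127.0.0.1:8265")

# `job_id` should still be accepted for backwards compatibility, even though
# newer code may prefer a different identifier parameter.
client.submit_job(
    entrypoint="python my_script.py",
    job_id="my-legacy-job-id",
)
```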
fb54679 introduced a bug by calling ray.put in the remote _split_single_block. This changes ownership from the driver to the worker that runs _split_single_block, which breaks the dataset's lineage requirement and failed the chaos test.
To fix the issue we need to ensure the split block refs are created by the driver, which we can achieve by creating the block_refs as part of the function's return values.
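A minimal sketch of the ownership pattern, using a toy split function rather than the actual _split_single_block implementation:

```python
import ray

@ray.remote(num_returns=2)
def split_single_block(block):
    mid = len(block) // 2
    # Do NOT call ray.put() here: refs created inside the task are owned by
    # the worker, which breaks lineage reconstruction if that worker dies.
    # Returning the halves instead makes the resulting ObjectRefs owned by
    # the caller (the driver).
    return block[:mid], block[mid:]

ray.init()
left_ref, right_ref = split_single_block.remote(list(range(10)))
print(ray.get([left_ref, right_ref]))
```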