hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 02:21:39 -05:00

Author	SHA1	Message	Date
mwtian	aad6f41593	[Tune] Remove unused autogluon requirement (#16587 ) `autogluon` does not support Python 3.9. And Ray seems to not import it anywhere.	2022-03-11 16:54:23 -08:00
SangBin Cho	97383e4c1b	[Nightly test] Fix a broken nightly test due to the wrong config (#23097 )	2022-03-11 16:47:06 -08:00
Amog Kamsetty	2294a7ed47	[ml] `TorchPredictor` interface (#22990 )	2022-03-11 16:00:53 -08:00
Siyuan (Ryans) Zhuang	be7ccb7dac	[core][serialization] Fix registering serializer before initializing Ray. (#23031 ) * Support registering serializer before initializing Ray. * add test	2022-03-11 15:13:18 -08:00
Yi Cheng	4f86b5b523	[gcs] Remove `use_gcs_for_bootstrap` in core (python) and autoscaler (#23050 ) This is part of the cleanup for Redisless Ray. This PR removes use_gcs_for_bootstrap in core and autoscaler.	2022-03-11 14:36:16 -08:00
Peng Yu	252ba6cecd	Correct documentation in ActorPoolStrategy (#23079 )	2022-03-11 13:27:55 -08:00
Simon Mo	2f2fc97bd1	Don't symlink Serve in setup-dev (#23092 )	2022-03-11 13:21:00 -08:00
Jeroen Bédorf	bc21a4593d	[RLlib] Fix crash when kl_coeff is set to 0 (#23063 ) Co-authored-by: Jeroen Bédorf <jeroen@minds.ai> Co-authored-by: Ishant Mrinal Haloi <mrinal.haloi11@gmail.com> Co-authored-by: Ishant Mrinal <33053278+n30111@users.noreply.github.com>	2022-03-11 12:24:52 -08:00
Jian Xiao	e9ae784e62	Make schema() read non-disruptive to iter_datasets() (#23032 ) Currently, reading schema of DatasetPipeline is disruptive and will invalidate the iter_datasets().	2022-03-11 12:01:24 -08:00
Patrick Ames	1d48c8dc75	[Datasets] Support dataset metadata provider callbacks in read APIs. (#22896 ) These changes add Dataset Read API support for (1) specifying custom block metadata provider callbacks, and (2) skipping path expansion. When paired with a custom block metadata provider that maintains an in-memory cache of BlockMetadata for each input file path, these changes reduced average S3-based dataset read times for production [Redshift Manifests](https://docs.aws.amazon.com/redshift/latest/dg/loading-data-files-using-manifest.html) stored in Amazon's internal data catalog by over 90%. A simple ParquetDatasource benchmark reading 144MM records across 100 ~70MiB (on-disk) Parquet files stored in S3 showed an ~75% reduction in read latency (from 4.62 seconds to 1.18 seconds on 2 r5n.8xlarge EC2 nodes).	2022-03-11 11:52:56 -08:00
xwjiang2010	5d776b00e6	[tuner] fix result_grid (#23078 )	2022-03-11 11:34:44 -08:00
xwjiang2010	f270d84094	[AIR] switch to a common RunConfig. (#23076 )	2022-03-11 10:55:36 -08:00
SangBin Cho	2b38fe89e2	[Nightly tests] Migrate rest of core tests (#23085 ) Migrate the rest of core tests	2022-03-11 10:41:14 -08:00
Kai Fricke	04ea180dfb	[ci/release] Add "tiny" concurrency group, change limits (#23065 ) E.g. long running tests run on small clusters (often 8 CPUs) but block other jobs for a long time. We should thus add more granularity to the concurrency groups. Additionally, limits have been slightly adjusted to make more sense (e.g. 8 GPUs are now small-gpu, 9+ GPUs large-gpu, instead of 7 for small-gpu and 8 for large-gpu).	2022-03-11 10:19:38 -08:00
Stephanie Wang	28d597e009	Revert "[workflow] Convert DAG to workflow (#22925 )" (#23081 ) This reverts commit `0a9f966e63`.	2022-03-11 09:49:08 -08:00
shrekris-anyscale	665bdbff47	[serve] Exclude unset fields from Ray actor options (#23059 ) The `schema_to_deployment()` function preserves unset fields with unexpected default argument types. This change excludes unset fields in that function and also changes the dictionaries' default values to empty dicts.	2022-03-11 10:45:21 -06:00
Kai Fricke	a8bed94ed6	[ci/release] Always use full cluster address (#23067 ) Not using the full cluster address is deprecated and breaks Job usage for uploads/downloads: https://buildkite.com/ray-project/release-tests-branch/builds/135#2a03e47b-6a9a-42ff-9346-905725eb8d09	2022-03-11 16:31:21 +00:00
Kenneth	07372927cc	Enable buffering and spilling to multiple remote storages (#22798 ) Buffering writes to AWS S3 is highly recommended to maximize throughput. Reducing the number of remote I/O requests can make spilling to remote storages as effective as spilling locally. In a test where 512GB of objects were created and spilled, varying just the buffer size while spilling to an S3 bucket resulted in the following runtimes (buffer size: runtime in seconds): Default: 3221.865916; 256KB: 1758.885839; 1MB: 748.226089; 10MB: 526.406466; 100MB: 494.830513. Based on these results, a default buffer size of 1MB has been added. This is the minimum buffer size used by AWS Kinesis Firehose, a streaming service for S3. On systems with larger availability, it is good to configure a larger buffer size. For processes that reach the throughput limits provided by S3, we can remove that bottleneck by supporting more prefixes/buckets. These impacts are less noticeable as the performance gains from using a large buffer prevent us from reaching a bottleneck. The following runtimes were achieved by spilling 512GB with a 1MB buffer and varying prefixes (prefixes: runtime in seconds): 1: 748.226089; 3: 527.658646; 10: 516.010742. Together these changes enable faster large-scale object spilling. Co-authored-by: Ubuntu <ubuntu@ip-172-31-54-240.us-west-2.compute.internal>	2022-03-11 11:27:02 -05:00
Kai Fricke	61295f8b58	[ml/checkpoint] Fix checkpoint location on remote node (#23068 ) Currently breaks tests where the checkpoint is stored on a remote node (e.g. via Ray client), e.g.: https://buildkite.com/ray-project/release-tests-branch/builds/132#6a4936a8-41dd-4fd2-9f02-976855cbd9b7 Instead, we can set the properties manually. In the future, we need a story on how to refer to checkpoints kept on remote nodes.	2022-03-11 15:38:21 +00:00
Jialing He	0cbbb8c1d0	[runtime env][core] Use Proto message `RuntimeEnvInfo` between user code and core_worker (#22856 )	2022-03-11 22:14:18 +08:00
SangBin Cho	965d609627	[Nightly test] Fix a minor syntax issue for core nightly tests (#23069 ) Add frequency to smoke tests Remove unnecessary alerts	2022-03-11 04:58:40 -08:00
Kai Fricke	5b2d58674b	[ci/release] Migrate horovod tests (#22951 ) Migrating horovod tests to new release package. https://buildkite.com/ray-project/release-tests-branch/builds/125	2022-03-11 09:53:29 +00:00
Kai Fricke	aed17dd346	Revert "Revert "[ml/tune] Expose new checkpoint interface to users (#22741 )" (#23006 )" (#23009 ) This reverts commit `85598d9d10`. Test breakage was unrelated.	2022-03-11 09:51:41 +00:00
Tao Wang	10c03cb126	Migrating to flat hash map [GCS&util&common] (#22932 ) Next step of #19220. This PR replaces unordered_map with flat_hash_map in most GCS code and some util & common modules. The placement group part, which exposes user interfaces in Java/Python, is excluded as it's a little bit complicated. Follow-up PRs will migrate the core worker, placement group and others.	2022-03-11 18:35:06 +09:00
Yi Cheng	ec88eb7d1d	[4][resource reporting] Remove ray syncer from gcs_resource_manager (#22832 ) This PR is part of resource reporting refactoring. In this PR ray syncer is moved from gcs_resource_manager to gcs_placement_group_scheduler. With this one, gcs_resource_manager is totally decoupled from resource broadcasting.	2022-03-11 01:15:25 -08:00
Jialing He	0c5440ee72	[runtime env] Deletes the proto cache on RuntimeEnv (#22944 ) Mainly the following things: - This PR deletes the proto cache on RuntimeEnv, ensuring that the user's modification of RuntimeEnv can take effect in the Proto message. - Validates the whole runtime env when serializing runtime_env. - Overrides the `__setitem__` method to parse and validate a field when it is modified.	2022-03-11 15:37:18 +08:00
matthewdeng	3a3a7b4be4	[test] add back deleted datasets train test file (#23051 )	2022-03-10 21:46:07 -08:00
Amog Kamsetty	f80602b7d2	[Datasets] Separate pandas to torch conversion in `to_torch` (#22939 ) Separate out the conversion of pandas dataframe to torch tensor in a utility function so that the same logic can be used in other places in Ray ML (for example during inference).	2022-03-10 20:40:01 -08:00
xwjiang2010	4b28bc3f09	[Tuner part1] Add Tuner interface. (#22975 )	2022-03-10 19:55:59 -08:00
Siyuan (Ryans) Zhuang	0a9f966e63	[workflow] Convert DAG to workflow (#22925 ) * convert DAG to a workflow * deduplicate * check duplication of steps * add test for object refs	2022-03-10 19:40:14 -08:00
Eric Liang	148eaeac2e	[minor] Leave a bit of wiggle room when calculating shared memory max (#23034 )	2022-03-10 17:37:26 -08:00
Amog Kamsetty	9bd00f3e1a	[ml/train] Remove `ConvertibleToTrainable` and move `Trainer` to `ray.ml.trainer` (#23030 ) As discussed, - Removes ConvertibleToTrainable interface and makes as_trainable part of the Trainer interface - Moves Trainer interface to ray.ml.trainer from ray.ml.train.trainer	2022-03-10 15:24:58 -08:00
SangBin Cho	ebac18d163	[Nightly test] Support Job based file manager + runner (#22860 ) This PR supports the job-based file manager and runner. It will be the backbone of k8s migration. The PR handles edge cases that originally existed in the old e2e.py job-based runners.	2022-03-10 15:03:50 -08:00
Edward Oakes	5a18802ad7	[serve] Remove runtime-env arg from serve start (#23017 )	2022-03-10 15:15:59 -06:00
Archit Kulkarni	52a722ffe7	[jobs] Make local pip/conda requirements files work with jobs (#22849 )	2022-03-10 15:15:16 -06:00
Amog Kamsetty	a5f41b2c9f	[ml/train] Training Interfaces [1/4]: Ray AIR `Trainer` interface (#22980 )	2022-03-10 13:12:44 -08:00
Guyang Song	3d9f214833	[runtime env] Fix import in subprocess when using pip in runtime_env (#22983 ) Fix the issue https://github.com/ray-project/ray/issues/22968	2022-03-10 15:11:41 -06:00
Max Pumperla	2b8faae40c	[docs] re/move old core examples (#22802 )	2022-03-10 12:17:00 -08:00
xwjiang2010	b1496d235f	[tune] fix error handling for fail_fast case. (#22982 )	2022-03-10 20:10:05 +00:00
Simon Mo	832354ce3f	[Serve] Compatibility bridge between model wrappers and pipeline (#22995 )	2022-03-10 11:52:03 -08:00
Chen Shen	3ebc4ae289	fix comments and typo (#23008 ) Fix comments and typos for scheduler code.	2022-03-10 11:40:31 -08:00
Max Pumperla	11c40e363d	[docs] external promo content (#22823 )	2022-03-10 11:39:44 -08:00
Yi Cheng	9f275c9bb8	[3][resource reporting] Use GCS to report the placement group creation information instead of reporting by raylet (#22597 )	2022-03-10 11:08:21 -08:00
qicosmos	e4a9517739	[C++ Worker]Python call cpp worker (#22820 )	2022-03-10 11:06:14 -08:00
Yi Cheng	bb5fa6b851	Remove redis in setup.py (#22979 )	2022-03-10 11:05:03 -08:00
Archit Kulkarni	c78bd809ce	[job submission] Support local py_modules in jobs (#22843 )	2022-03-10 11:42:25 -06:00
Stephanie Wang	85598d9d10	Revert "[ml/tune] Expose new checkpoint interface to users (#22741 )" (#23006 ) This reverts commit `e9692a2a80`.	2022-03-10 17:07:44 +00:00
SangBin Cho	92b50ff5da	Migrate multi nightly tests (#23005 )	2022-03-11 01:32:10 +09:00
shrekris-anyscale	1100c98222	[serve] Implement Serve Application object (#22917 ) The concept of a Serve Application, a data structure containing all information needed to deploy Serve on a Ray cluster, has surfaced during recent design discussions. This change introduces a formal Application data structure and refactors existing code to use it.	2022-03-10 10:28:29 -06:00
Max Pumperla	d8e862eaba	[docs] templates and contribution guide (fixes #21753 ) (#23003 ) Adding an explicit contributor guide and example templates for our users to help with docs. Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>	2022-03-10 15:28:07 +00:00

11649 commits