hiro/ray

Mirror of https://github.com/vale981/ray (synced 2025-03-06 10:31:39 -05:00)

Author	SHA1	Message	Date
Tao Wang	bc14512471	[Hotfix]Fix test_actor failure caused by interface change (#23000)	2022-03-10 19:34:12 +08:00
Kai Fricke	007cf03d7a	[ci/release] Migrate RLLib tests (#22967) Migrate to new release package. https://buildkite.com/ray-project/release-tests-branch/builds/111	2022-03-10 10:26:03 +00:00
Kai Fricke	fee4065daf	[ci/release] Migrate SGD tests (#22966) Migrate to new release package. https://buildkite.com/ray-project/release-tests-branch/builds/110	2022-03-10 10:23:50 +00:00
Kai Fricke	614dc6b511	[ci/release] Migrate Serve tests (#22965) Migrate to new release package. https://buildkite.com/ray-project/release-tests-branch/builds/109	2022-03-10 10:23:25 +00:00
Kai Fricke	ccda1555cc	[ci/release] Migrate Runtime Env tests (#22963) Migrating to new release test package. https://buildkite.com/ray-project/release-tests-branch/builds/108	2022-03-10 10:22:57 +00:00
Kai Fricke	e9692a2a80	[ml/tune] Expose new checkpoint interface to users (#22741) This PR exposes the new checkpoint interface, implemented in #22691, to end users. It does this by replacing the old external facing TrialCheckpoint class with a merged class that supports the old TrialCheckpoint API (upload, download, save) as well as the new Checkpoint API. With this PR, users can use the new Checkpoint interface for downstream processing of their Ray Tune results. In a follow-up PR, the new Checkpoint interface will be used internally within Ray Tune and Train for bookkeeping; however, that is not required to unblock the Ray ML use case.	2022-03-10 10:20:24 +00:00
kyle-chen-uber	592656ca28	[horovod] remove deprecated slot concept, use worker instead (#22708) Horovod updated the attributes of DistributedTrainableCreator and the args used to create the Horovod RayExecutor (horovod/horovod@a729ba7). The major change is that Horovod deprecated the "slot" concept in favor of "worker", which is more consistent with the generic Ray worker. The issue currently blocks Uber DL trainers from using Ray Tune. This commit updates the Horovod RayExecutor init args. Co-authored-by: Kai Fricke <kai@anyscale.com>	2022-03-10 08:16:42 +00:00
Kai Fricke	18d535f290	[ci/release] Migrate LightGBM tests (#22952) Note that LightGBM release tests were previously not enabled. https://buildkite.com/ray-project/release-tests-branch/builds/113 https://buildkite.com/ray-project/release-tests-branch/builds/114 Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2022-03-10 08:14:31 +00:00
Edward Oakes	22e698d0ff	[serve][release tests] Add smoke test to CI for remaining tests (#22962)	2022-03-09 23:36:32 -06:00
shrekris-anyscale	bc82e2d5c4	[serve] Restore "[serve] Support working_dir in serve run (#22760)" (#22971)	2022-03-09 21:31:23 -08:00
Dmitri Gekhtman	19b4281991	[KubeRay] Pin autoscaler image (#22987) Sets the autoscaler image to the one from this PR's commit. #22847	2022-03-09 20:38:37 -08:00
Dmitri Gekhtman	413fe08f87	Move KubeRay autoscaler files into Ray autoscaler directory, add an entry-point. (#22847) This PR consists of the following clean-up items for KubeRay autoscaler integration: Remove the docker/kuberay directory. Move the Python files formerly in docker/kuberay to the autoscaler directory. Use a rayproject/ray image for the autoscaler. Add an entry point for the kuberay autoscaler to scripts.py. Use the entry point in the example config. Slightly simplify the code that starts the autoscaler. Ray versions are updated to Ray 1.11.0, which will be officially released within the next couple of days. By default, Ray >= 1.11.0 runs without Redis. References to Redis are removed from the example config. Add the autoscaler configuration test to the CI. Update development documentation to reflect the changes in this PR.	2022-03-09 18:26:57 -08:00
Jiao	3546aabefd	[7/X][Pipeline] pipeline user facing build function (#22934)	2022-03-09 16:11:11 -08:00
Simon Mo	34ffc7e5cf	[Serve] [3/3 Wrappers] Add Model Wrapper with `ray.ml` (#22915)	2022-03-09 16:06:59 -08:00
Stephanie Wang	1b45582e43	[tests] Enable chaos testing for Dask-on-Ray (#22927) Turns on failures for Dask-on-Ray chaos tests.	2022-03-09 18:08:41 -05:00
Simon Mo	c844c706bf	[Serve] Use starlette public accessor for Request (#22957)	2022-03-09 13:25:03 -08:00
Edward Oakes	135cd121b9	[release tests] Fix minor bug in multi-deployment serve test (#22961)	2022-03-09 14:37:27 -06:00
mwtian	3ccc2aa17a	Revert "[Core] Update grpc to 1.44.0 (#22384)" (#22958) This reverts commit `5ebc32d7c2`.	2022-03-09 11:40:35 -08:00
Jiao	ea9069fef4	[6/X][Pipeline] Add HTTP ingress to serve pipeline (#22878)	2022-03-09 11:39:15 -08:00
Simon Mo	3c4827e0b2	[Serve] [2/3 Wrappers] Add Basic HTTP Adapters (#22914)	2022-03-09 11:36:46 -08:00
Antoni Baum	2ead945438	[datasets] Make `label_column` optional in `to_tf` (#22916) Makes the `label_column` argument in `Dataset.to_tf` optional so that it can be used for prediction.	2022-03-09 11:34:18 -08:00
shrekris-anyscale	61e132b478	[serve] Split `test_deploy` (#22908) `test_deploy` has become [flakey](https://flakey-tests.ray.io/#) due to timeout. Since `test_deploy` is already a "large" test, this change splits it into two testing files instead of simply increasing the timeout.	2022-03-09 12:22:51 -06:00
Kai Fricke	b267be4758	[ml] Add Ray ML / AIR checkpoint implementation (#22691) This PR splits up the changes in #22393 and introduces an implementation of the ML Checkpoint interface used by Ray Tune. This means, the TuneCheckpoint class implements the to/from_[bytes|dict|directory|object_ref|uri] conversion functions, as well as more high-level functions to transition between the different TuneCheckpoint classes. It also includes test cases for Tune's main conversion modes, i.e. dict - intermediate - dict and fs - intermediate - fs. These changes will be the basis for refactoring the tune interface to use TuneCheckpoint objects instead of TrialCheckpoints (externally) and instead of paths/objects (internally).	2022-03-09 10:02:59 -08:00
Eric Liang	79a3b56015	[ml] Improve the documentation of ml common classes; add kwargs to predictor (#22936)	2022-03-09 10:01:20 -08:00
Kai Fricke	ca87c37c61	[ci/release] Fix result output in Buildkite pipeline run (#22946) The new buildkite pipeline prints out faulty results due to a confusion of -ge/-gt and -le/-lt in the retry script. This is a cosmetic error (so behavior was still correct) that is resolved with this PR.	2022-03-09 17:29:31 +00:00
Simon Mo	77ead01b65	[Serve] [1/3 Wrappers] Allow `@serve.batch` to accept args and kwargs (#22913)	2022-03-09 09:15:57 -08:00
Kai Fricke	15601ed79b	Revert "[serve] Support `working_dir` in `serve run` (#22760)" (#22956) This reverts commit `ab2741d64b`. The PR breaks ray job submission for anyscale:// URLs	2022-03-09 17:04:46 +00:00
Jiajun Yao	069f5f467c	[Test] Fix and enable test_logging.py (#22904) Fix and enable test_logging.py	2022-03-09 09:01:38 -08:00
ZhuSenlin	a15890be58	[GCS] refactor the resource related data structures on the GCS (#22924) * refactor resource data structure in gcs * fix comment * fix lint error * fix * DISABLED_TestRejectedRequestWorkerLeaseReply as it depends on the update of normal task Co-authored-by: 黑驰 <senlin.zsl@antgroup.com>	2022-03-09 08:22:02 -08:00
simonsays1980	8627f44d7f	[RLlib] Remove duplicate code block: Config deprecation check for `metrics_smoothing_episodes` (#22152)	2022-03-09 16:51:42 +01:00
Edward Oakes	2cac49e4b0	[serve][release tests] Mark long-running failure test as non-stable (#22922)	2022-03-09 09:42:47 -06:00
Kai Fricke	ac654dbb9d	[ci/release] Fix schema validation for single tests / add `stable` field (#22947) This currently leads to failing builds for schema validation errors after #22901 was merged (the stable column was incorrectly not added to the schema before).	2022-03-09 15:22:49 +00:00
matthewdeng	6b0169b23d	[ml] enable CI tests (#22926) Follow-up to #22748, enabling tests in CI. Conditions: A new RAY_CI_ML_AFFECTED condition is added for this test suite. The package currently depends on Ray Data, and will be triggered accordingly. Dependencies: Adding DATA_PROCESSING_TESTING dependencies (set for install-dependencies.sh) for now.	2022-03-09 14:31:53 +00:00
Jialing He	795b5787dc	[runtime env][bug] Fix RuntimeEnv ignore eager_install when _validate is True (#22935) When _validate is True, RuntimeEnv will ignore field eager_install.	2022-03-09 20:16:55 +08:00
Kai Fricke	cac9d30909	[ci/release] Add schema validation for release test config (#22919) To avoid breakage like in #22905, this PR adds schema validation to the release test package. In a follow-up PR, we'll likely switch this to use pydantic instead.	2022-03-09 09:50:51 +00:00
Siyuan (Ryans) Zhuang	b621dc099b	[DAG] Update the example in the doc (#22930) * update doc	2022-03-08 20:09:45 -08:00
Guyang Song	56287d63e5	[runtime env] remove _rewrite_pip_list_ray_libraries (#22890) We don't need this logic after using virtualenv.	2022-03-09 11:41:33 +08:00
Edward Oakes	aa907987bf	[serve][release tests] Use m5.8xlarge instance types for 1k replica tests (#22918)	2022-03-08 21:34:01 -06:00
Stephanie Wang	bf09f5071a	[core] Deflake test_plasma_unlimited (#22911) test_plasma_unlimited::test_task_unlimited is flaky because one of the assertions is race-y and can trigger after the condition is no longer true (see #22883). This fixes the flake by: - adding an assertion in between two object allocations to force the object store queue to flush - keeping one of the ObjectRefs in scope to make sure that the object is still fallback-allocated by the time we reach the failing assertion	2022-03-08 22:00:04 -05:00
Chen Shen	bc3f7a7684	[scheduling policy 3/n][rfc] Refactor SchedulingPolicy into interface and implementations (#22907) * scheduling policy * update Co-authored-by: Gagandeep Singh <gdp.1807@gmail.com>	2022-03-08 18:47:56 -08:00
Junwen Yao	0395d0987e	[Train] Add support for automatic pipelining of host to device transfer (#22716) This PR adds the support for concurrently transferring the input from host to device.	2022-03-08 18:37:23 -08:00
Balaji Veeramani	48af260aaf	[Train] Clarify shuffle documentation in `prepare_data_loader` (#22876) We essentially use a hack to determine whether shuffling should be enabled in prepare_data_loader. I've clarified the documentation so the hack is easier to understand.	2022-03-08 18:13:29 -08:00
Alex Wu	b84aaef38a	Promote python 3.9 support to stable (#22923) Remove the experimental note from python 3.9 since it and its core dependencies have been stable for quite some time now. Co-authored-by: Alex Wu <alex@anyscale.com>	2022-03-08 17:24:54 -08:00
SangBin Cho	549527687f	Migrate scalability tests (#22901) This PR migrates scalability tests to the new infra. I had to copy the benchmarks folder to the release folder to make it work. I will remove some unnecessary files (e.g., benchmark.yaml or the wait_for_cluster file). Alternatively, we can support a different path than /release from the tool, but I think this way is cleaner. I am open to suggestions, though. cc @krfricke	2022-03-08 17:22:41 -08:00
Eric Liang	52491c87e2	Make a pass fixing Dataset API issues (#22886)	2022-03-08 13:07:55 -08:00
shrekris-anyscale	ab2741d64b	[serve] Support `working_dir` in `serve run` (#22760) #22714 added `serve run` to the Serve CLI. This change allows the user to specify a local or remote `working_dir` in `serve run`.	2022-03-08 13:18:41 -06:00
Junwen Yao	d1009c8489	[Train] Add support for metrics aggregation (#22099) This PR allows users to aggregate metrics returned from all workers.	2022-03-08 11:03:04 -08:00
Simon Mo	c8aa6cdf64	Fix Issue Severity Question to Bug Report Template (#22906)	2022-03-08 10:36:32 -08:00
Wendi-anyscale	dd8654fd85	Add Issue Severity Question to Bug Report Template (#22887)	2022-03-08 10:31:53 -08:00
Balaji Veeramani	37c6169027	[Train] Refactor and add `Accelerator` classes (#22009) To support mixed precision (see #20643), we need to store a GradScaler instance that is accessible by both prepare_optimizer and backward functions (these functions will be added later). This PR introduces the Accelerator, an object that implements methods to perform backend-specific training optimizations.	2022-03-08 10:26:00 -08:00

11694 commits