Commit graph

11765 commits

Author SHA1 Message Date
Archit Kulkarni
52a722ffe7
[jobs] Make local pip/conda requirements files work with jobs (#22849) 2022-03-10 15:15:16 -06:00
Amog Kamsetty
a5f41b2c9f
[ml/train] Training Interfaces [1/4]: Ray AIR Trainer interface (#22980) 2022-03-10 13:12:44 -08:00
Guyang Song
3d9f214833
[runtime env] Fix import in subprocess when using pip in runtime_env (#22983)
Fix the issue https://github.com/ray-project/ray/issues/22968
2022-03-10 15:11:41 -06:00
Max Pumperla
2b8faae40c
[docs] re/move old core examples (#22802) 2022-03-10 12:17:00 -08:00
xwjiang2010
b1496d235f
[tune] fix error handling for fail_fast case. (#22982) 2022-03-10 20:10:05 +00:00
Simon Mo
832354ce3f
[Serve] Compatibility bridge between model wrappers and pipeline (#22995) 2022-03-10 11:52:03 -08:00
Chen Shen
3ebc4ae289
fix comments and typo (#23008)
Fix comments and typos for scheduler code.
2022-03-10 11:40:31 -08:00
Max Pumperla
11c40e363d
[docs] external promo content (#22823) 2022-03-10 11:39:44 -08:00
Yi Cheng
9f275c9bb8
[3][resource reporting] Use GCS to report the placement group creation information instead of reporting by raylet (#22597) 2022-03-10 11:08:21 -08:00
qicosmos
e4a9517739
[C++ Worker]Python call cpp worker (#22820) 2022-03-10 11:06:14 -08:00
Yi Cheng
bb5fa6b851
Remove redis in setup.py (#22979) 2022-03-10 11:05:03 -08:00
Archit Kulkarni
c78bd809ce
[job submission] Support local py_modules in jobs (#22843) 2022-03-10 11:42:25 -06:00
Stephanie Wang
85598d9d10
Revert "[ml/tune] Expose new checkpoint interface to users (#22741)" (#23006)
This reverts commit e9692a2a80.
2022-03-10 17:07:44 +00:00
SangBin Cho
92b50ff5da
Migrate multi nightly tests (#23005) 2022-03-11 01:32:10 +09:00
shrekris-anyscale
1100c98222
[serve] Implement Serve Application object (#22917)
The concept of a Serve Application, a data structure containing all information needed to deploy Serve on a Ray cluster, has surfaced during recent design discussions. This change introduces a formal Application data structure and refactors existing code to use it.
2022-03-10 10:28:29 -06:00
Max Pumperla
d8e862eaba
[docs] templates and contribution guide (fixes #21753) (#23003)
Adding an explicit contributor guide and example templates for our users to help with docs.

Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>
2022-03-10 15:28:07 +00:00
Jiajun Yao
2e828cc9e1
Delete dead test_setup_worker.py (#22970)
The tested code is dead so we can remove the code and the test.
2022-03-10 07:20:41 -08:00
SangBin Cho
d192ec30fd
[Nightly Tests] Readjust the concurrency limit. (#23002)
This PR reduces the concurrency limit. Based on a back-of-the-envelope calculation, the current concurrency limit can easily exceed the service quota.

Given large == 2048 vCPUs, it will use about 20K vCPUs, which is slightly larger than the limit.
2022-03-10 07:19:38 -08:00
SangBin Cho
4fa294ca49
[Nightly tests] Stop running broken tests (#22993) 2022-03-10 06:59:51 -08:00
SangBin Cho
e88abe4c8e
[Nightly tests] migrated most of daily tests (#22960)
* migrated most of daily tests

* Addressed code review.
2022-03-10 05:49:16 -08:00
Antoni Baum
bf49d37176
[tune] Add Trainable.postprocess_checkpoint (#22973)
Adds postprocess_checkpoint method to Trainable to facilitate the checkpointing of preprocessors in AIR.
2022-03-10 12:14:39 +00:00
Tao Wang
bc14512471
[Hotfix]Fix test_actor failure caused by interface change (#23000) 2022-03-10 19:34:12 +08:00
Kai Fricke
007cf03d7a
[ci/release] Migrate RLLib tests (#22967)
Migrate to new release package.

https://buildkite.com/ray-project/release-tests-branch/builds/111
2022-03-10 10:26:03 +00:00
Kai Fricke
fee4065daf
[ci/release] Migrate SGD tests (#22966)
Migrate to new release package.

https://buildkite.com/ray-project/release-tests-branch/builds/110
2022-03-10 10:23:50 +00:00
Kai Fricke
614dc6b511
[ci/release] Migrate Serve tests (#22965)
Migrate to new release package.

https://buildkite.com/ray-project/release-tests-branch/builds/109
2022-03-10 10:23:25 +00:00
Kai Fricke
ccda1555cc
[ci/release] Migrate Runtime Env tests (#22963)
Migrating to new release test package.

https://buildkite.com/ray-project/release-tests-branch/builds/108
2022-03-10 10:22:57 +00:00
Kai Fricke
e9692a2a80
[ml/tune] Expose new checkpoint interface to users (#22741)
This PR exposes the new checkpoint interface, implemented in #22691, to end users. It does this by replacing the old external facing TrialCheckpoint class with a merged class that supports the old TrialCheckpoint API (upload, download, save) as well as the new Checkpoint API.

With this PR, users can use the new Checkpoint interface for downstream processing of their Ray Tune results. In a follow-up PR, the new Checkpoint interface will be used internally within Ray Tune and Train for bookkeeping; however, that is not required to unblock the Ray ML use case.
2022-03-10 10:20:24 +00:00
kyle-chen-uber
592656ca28
[horovod] remove deprecated slot concept, use worker instead (#22708)
Horovod updated the attributes of DistributedTrainableCreator and the arguments used to create the Horovod RayExecutor.
horovod/horovod@a729ba7

The main change is that Horovod deprecated the "slot" concept in favor of "worker", which is more consistent with the generic Ray worker. This currently blocks Uber DL trainers from using Ray Tune.

This commit updates the Horovod RayExecutor init args.

Co-authored-by: Kai Fricke <kai@anyscale.com>
2022-03-10 08:16:42 +00:00
Kai Fricke
18d535f290
[ci/release] Migrate LightGBM tests (#22952)
Note that LightGBM release tests were previously not enabled.
https://buildkite.com/ray-project/release-tests-branch/builds/113
https://buildkite.com/ray-project/release-tests-branch/builds/114

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2022-03-10 08:14:31 +00:00
Edward Oakes
22e698d0ff
[serve][release tests] Add smoke test to CI for remaining tests (#22962) 2022-03-09 23:36:32 -06:00
shrekris-anyscale
bc82e2d5c4
[serve] Restore "[serve] Support working_dir in serve run (#22760)" (#22971) 2022-03-09 21:31:23 -08:00
Dmitri Gekhtman
19b4281991
[KubeRay] Pin autoscaler image (#22987)
Sets the autoscaler image to the one from this PR's commit.
#22847
2022-03-09 20:38:37 -08:00
Dmitri Gekhtman
413fe08f87
Move KubeRay autoscaler files into Ray autoscaler directory, add an entry-point. (#22847)
This PR consists of the following clean-up items for KubeRay autoscaler integration:

* Remove the docker/kuberay directory.
* Move the Python files formerly in docker/kuberay to the autoscaler directory.
* Use a rayproject/ray image for the autoscaler.
* Add an entry point for the kuberay autoscaler to scripts.py. Use the entry point in the example config.
* Slightly simplify the code that starts the autoscaler.
* Ray versions are updated to Ray 1.11.0, which will be officially released within the next couple of days.
* By default, Ray >= 1.11.0 runs without Redis. References to Redis are removed from the example config.
* Add the autoscaler configuration test to the CI.
* Update development documentation to reflect the changes in this PR.
2022-03-09 18:26:57 -08:00
Jiao
3546aabefd
[7/X][Pipeline] pipeline user facing build function (#22934) 2022-03-09 16:11:11 -08:00
Simon Mo
34ffc7e5cf
[Serve] [3/3 Wrappers] Add Model Wrapper with ray.ml (#22915) 2022-03-09 16:06:59 -08:00
Stephanie Wang
1b45582e43
[tests] Enable chaos testing for Dask-on-Ray (#22927)
Turns on failures for Dask-on-Ray chaos tests.
2022-03-09 18:08:41 -05:00
Simon Mo
c844c706bf
[Serve] Use starlette public accessor for Request (#22957) 2022-03-09 13:25:03 -08:00
Edward Oakes
135cd121b9
[release tests] Fix minor bug in multi-deployment serve test (#22961) 2022-03-09 14:37:27 -06:00
mwtian
3ccc2aa17a
Revert "[Core] Update grpc to 1.44.0 (#22384)" (#22958)
This reverts commit 5ebc32d7c2.
2022-03-09 11:40:35 -08:00
Jiao
ea9069fef4
[6/X][Pipeline] Add HTTP ingress to serve pipeline (#22878) 2022-03-09 11:39:15 -08:00
Simon Mo
3c4827e0b2
[Serve] [2/3 Wrappers] Add Basic HTTP Adapters (#22914) 2022-03-09 11:36:46 -08:00
Antoni Baum
2ead945438
[datasets] Make label_column optional in to_tf (#22916)
Makes the `label_column` argument in `Dataset.to_tf` optional so that it can be used for prediction.
2022-03-09 11:34:18 -08:00
shrekris-anyscale
61e132b478
[serve] Split test_deploy (#22908)
`test_deploy` has become [flaky](https://flakey-tests.ray.io/#) due to timeouts. Since `test_deploy` is already a "large" test, this change splits it into two test files instead of simply increasing the timeout.
2022-03-09 12:22:51 -06:00
Kai Fricke
b267be4758
[ml] Add Ray ML / AIR checkpoint implementation (#22691)
This PR splits up the changes in #22393 and introduces an implementation of the ML Checkpoint interface used by Ray Tune.

This means the TuneCheckpoint class implements the to/from_[bytes|dict|directory|object_ref|uri] conversion functions, as well as higher-level functions to transition between the different TuneCheckpoint classes. It also includes test cases for Tune's main conversion modes, i.e. dict - intermediate - dict and fs - intermediate - fs.

These changes will be the basis for refactoring the tune interface to use TuneCheckpoint objects instead of TrialCheckpoints (externally) and instead of paths/objects (internally).
2022-03-09 10:02:59 -08:00
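
The "dict - intermediate - dict" and "fs - intermediate - fs" conversion modes named in the commit above can be illustrated with a small round-trip sketch. This is not Ray's TuneCheckpoint implementation: the class `SketchCheckpoint`, the file name `checkpoint.pkl`, and the use of `pickle` as the intermediate format are all assumptions made for illustration, and only three of the conversion pairs (dict, bytes, directory) are shown.

```python
import pickle
import tempfile
from pathlib import Path


class SketchCheckpoint:
    """Hypothetical stand-in for a checkpoint object; not Ray's API."""

    _FILENAME = "checkpoint.pkl"  # assumed file name, for illustration only

    def __init__(self, data):
        # Copy so later changes to the caller's dict do not leak in.
        self._data = dict(data)

    @classmethod
    def from_dict(cls, data):
        return cls(data)

    def to_dict(self):
        return dict(self._data)

    @classmethod
    def from_bytes(cls, blob):
        return cls(pickle.loads(blob))

    def to_bytes(self):
        return pickle.dumps(self._data)

    @classmethod
    def from_directory(cls, path):
        with open(Path(path) / cls._FILENAME, "rb") as f:
            return cls(pickle.load(f))

    def to_directory(self, path):
        target = Path(path)
        target.mkdir(parents=True, exist_ok=True)
        with open(target / self._FILENAME, "wb") as f:
            pickle.dump(self._data, f)
        return str(target)


original = {"step": 3, "weights": [0.1, 0.2]}

# dict -> bytes (intermediate) -> dict
via_bytes = SketchCheckpoint.from_bytes(
    SketchCheckpoint.from_dict(original).to_bytes()
).to_dict()

# dict -> directory (intermediate) -> dict
with tempfile.TemporaryDirectory() as tmp:
    saved_path = SketchCheckpoint.from_dict(original).to_directory(tmp)
    via_dir = SketchCheckpoint.from_directory(saved_path).to_dict()

print(via_bytes == original, via_dir == original)
```

A round-trip test of this shape checks that no information is lost when a checkpoint passes through an intermediate representation, which is presumably what the conversion-mode test cases mentioned in the commit are for.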
Eric Liang
79a3b56015
[ml] Improve the documentation of ml common classes; add kwargs to predictor (#22936) 2022-03-09 10:01:20 -08:00
Kai Fricke
ca87c37c61
[ci/release] Fix result output in Buildkite pipeline run (#22946)
The new buildkite pipeline prints out faulty results due to a confusion of -ge/-gt and -le/-lt in the retry script. This is a cosmetic error (so behavior was still correct) that is resolved with this PR.
2022-03-09 17:29:31 +00:00
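
The kind of -ge/-gt mix-up described in the commit above only shows up at the boundary value, which is why it can stay cosmetic for a long time. The sketch below is a Python analogue, not the actual Buildkite retry script: the function names and the `max_attempts` parameter are hypothetical and chosen only to show where the two comparisons disagree.

```python
def exhausted_strict(attempt, max_attempts):
    # Analogue of the shell test `[ "$attempt" -gt "$max_attempts" ]`.
    return attempt > max_attempts


def exhausted_inclusive(attempt, max_attempts):
    # Analogue of the shell test `[ "$attempt" -ge "$max_attempts" ]`.
    return attempt >= max_attempts


max_attempts = 3  # hypothetical limit
for attempt in range(1, 6):
    strict = exhausted_strict(attempt, max_attempts)
    inclusive = exhausted_inclusive(attempt, max_attempts)
    marker = "  <- the two tests disagree" if strict != inclusive else ""
    print(attempt, strict, inclusive, marker)
```

The two predicates differ only when `attempt == max_attempts`, so a message that depends on the wrong one is misreported in exactly that case.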
Simon Mo
77ead01b65
[Serve] [1/3 Wrappers] Allow @serve.batch to accept args and kwargs (#22913) 2022-03-09 09:15:57 -08:00
Kai Fricke
15601ed79b
Revert "[serve] Support working_dir in serve run (#22760)" (#22956)
This reverts commit ab2741d64b.

The PR breaks ray job submission for anyscale:// URLs
2022-03-09 17:04:46 +00:00
Jiajun Yao
069f5f467c
[Test] Fix and enable test_logging.py (#22904)
Fix and enable test_logging.py
2022-03-09 09:01:38 -08:00
ZhuSenlin
a15890be58
[GCS] refactor the resource related data structures on the GCS (#22924)
* refactor resource data structure in gcs

* fix comment

* fix lint error

* fix

* DISABLED_TestRejectedRequestWorkerLeaseReply as it depends on the update of normal task

Co-authored-by: 黑驰 <senlin.zsl@antgroup.com>
2022-03-09 08:22:02 -08:00