Commit graph

669 commits

Author SHA1 Message Date
SangBin Cho
e62c0052a0
[Dashboard] Agent in minimal ray installation (#21817)
This is the second part of https://docs.google.com/document/d/12qP3x5uaqZSKS-A_kK0ylPOp0E02_l-deAbmm8YtdFw/edit#. After this PR, dashboard agents will fully work with a minimal Ray installation.

Note that this PR requires adding "aioredis", "frozenlist", and "aiosignal" to the minimal installation. These dependencies are very small (or will be removed soon), and including them in the minimal installation makes things much easier. Please see below for the reasoning.
2022-01-26 04:03:54 -08:00
Alex Wu
7a45f60dbc
[autoscaler] Fix ray.autoscaler.sdk import issue (#21795)
This PR moves the sdk to its own folder, then includes everything in `import ray.autoscaler.sdk` in ray's import path. 

Note that doing this naively created circular dependencies, because the Ray core now uses constants that were defined in the autoscaler for internal kv operations (and the autoscaler similarly calls into the Ray core). The solution was to move those internal kv keys into Ray core constants so the imports flow (more) one way.

Co-authored-by: Alex Wu <alex@anyscale.com>
2022-01-25 14:43:24 -08:00
Matti Picus
d3d1e8559c
enable passing metric tests on windows (#21755)
Resubmitting #21705, which was merged and then reverted. It seems the Sphinx build somehow broke in the meantime; it is not clear how that is connected to this PR.

Here is the original description:
>As part of the effort to enable tests on Windows, this enables test_metrics and test_metric_agents, which pass locally.
2022-01-25 09:20:16 -08:00
Lingxuan Zuo
ec62d7f510
[Streaming] Farewell: remove all streaming-related code from the Ray repo. (#21770)
The new repo URL is https://github.com/ray-project/mobius

Co-authored-by: 林濯 <lingxuzn.zlx@antgroup.com>
2022-01-23 17:53:41 +08:00
SangBin Cho
b6d3e01e0b
Revert "WINDOWS: enable passing metric tests (#21705)" (#21738)
This reverts commit 8104fd5c76.
2022-01-20 07:27:49 -08:00
Matti Picus
8104fd5c76
WINDOWS: enable passing metric tests (#21705) 2022-01-19 17:09:34 -08:00
Kai Fricke
8fd5b7a5a8
Tune test autoscaler / fix stale node detection bug (#21516)
See #21458. Currently, Tune keeps its own list of alive node IPs, but this information is only updated every 10 seconds and is usually stale when a new node is added. Because of this, the first trial scheduled on this node is usually marked as failed. This PR adds a test confirming this behavior and gets rid of the unneeded code path.

Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>
2022-01-18 16:20:16 -08:00
Gagandeep Singh
970b7b2a4b
Unskip tests from ci.sh (#21483) 2022-01-17 15:22:57 -08:00
Archit Kulkarni
26057c433f
[CI] pin uvicorn to 0.16.0 to fix serve (#21612) 2022-01-14 16:00:51 -08:00
Matti Picus
f4da0410b3
WINDOWS: unskip actor, component_failure, failure tests (#21492)
Unskip windows tests that pass locally
2022-01-13 23:16:22 -08:00
mwtian
30968a9358
[GCS] support external Redis in GCS bootstrapping mode (#21436)
External Redis should still be supported with GCS bootstrapping, to avoid breaking users.
In GCS mode, some logic is removed for external Redis:
- Printing external Redis addresses to terminal: hard to implement across `ray start`, `ray.init()` and Ray cluster util.
- Starting local Redis if external Redis is unavailable: failing loudly here seems more appropriate.

Also, re-enable a few tests that restart GCS in GCS bootstrapping mode, by using external Redis for KV storage.
2022-01-13 16:01:11 -08:00
mwtian
cf6a54ca46
[CI] pin pytest-asyncio (#21579) 2022-01-13 11:35:30 -08:00
Kai Fricke
a3442df584
[ci/multinode] Build multinode image with OpenSSH before running tests (#21544)
Currently we install OpenSSH on the fly in fake multinode docker testing. Instead, we can speed testing up a fair bit by first building a Docker image that includes OpenSSH and then running the tests with this image.
2022-01-13 08:47:04 -08:00
Kai Fricke
5a7f6e4fdd
[rfc][ci] create fake docker-compose cluster environment (#20256)
Following #18987, this PR adds a docker-compose based local multi node cluster.

The fake multinode docker comprises two parts. The docker_monitor.py script is a watch script that calls docker compose up whenever the docker-compose.yaml changes. The node provider creates and updates the docker-compose.yaml according to the autoscaling requirements.

This mode fully supports autoscaling and comes with test utilities to start and connect to docker-compose autoscaling environments. There's also a sample test case showing how this can be used.
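
The watch-script idea described above can be sketched roughly as follows. This is a minimal illustration, not the contents of docker_monitor.py: the function names (`compose_up`, `watch_once`, `watch`), the mtime polling approach, and the `-d` flag are all assumptions made for the example.

```python
import os
import subprocess
import time
from typing import Callable, Optional


def compose_up(path: str) -> None:
    """Bring the cluster in line with the compose file (assumed invocation)."""
    subprocess.run(["docker", "compose", "-f", path, "up", "-d"], check=True)


def watch_once(
    path: str,
    last_mtime: Optional[float],
    on_change: Callable[[str], None],
) -> Optional[float]:
    """Check the file once; call on_change if its mtime differs from last_mtime.

    Returns the mtime to remember for the next check. If the file does not
    exist yet, nothing is triggered and last_mtime is returned unchanged.
    """
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return last_mtime
    if last_mtime is None or mtime != last_mtime:
        on_change(path)
    return mtime


def watch(
    path: str,
    interval: float = 1.0,
    on_change: Callable[[str], None] = compose_up,
) -> None:
    """Poll the compose file forever, reacting to every modification."""
    last_mtime: Optional[float] = None
    while True:
        last_mtime = watch_once(path, last_mtime, on_change)
        time.sleep(interval)


# Hypothetical usage: watch("docker-compose.yaml")
```

Splitting the single check (`watch_once`) from the loop keeps the change detection testable without starting Docker.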
2022-01-11 04:35:36 +00:00
Matti Picus
f3dcd1fac1
WINDOWS: re-enable runtime_env tests, skip cluster tests in serve (#21398)
After enabling tests of test_runtime_env_plugin and test_runtime_env_env_vars (PR #21252) and python/ray/serve:* tests (PR #21107), the analysis at flaky-tests.ray.io started showing failing tests in windows://python/ray/test/serv:test_standalone. PR #21352 reverted #21252 (runtime_env tests), but the problem was more likely in the serve tests. Specifically, `test_standalone` has a test that uses Cluster, which should be skipped on Windows because it is flaky. So this PR:
- re-enables the runtime_env tests for windows
- skips the Cluster test in serve/tests/test_standalone.py
2022-01-06 21:43:58 -08:00
Archit Kulkarni
fd02065ce5
[CI] [docker] Fix docker image name regex matching (#21409) 2022-01-05 18:59:10 -08:00
Ian Rodney
1b42a49e71
[CI] [Docker Build] Allow Branches with Double digits in regex matching (#21401) 2022-01-05 14:19:19 -08:00
mwtian
24da654d90
[Test] Shard "Small & Large" tests (#21351) 2022-01-05 10:49:14 -08:00
Kai Fricke
94242e3e6e
[ci/repro] Add SYS_PTRACE to docker container, use unique name (#21377)
This will start repro docker containers with SYS_PTRACE capabilities to enable debugging e.g. via py-spy.
Additionally, default instance name tags for instance re-use will be generated using the buildkite build id and job id.
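
As a rough sketch of the naming idea: deriving the tag from the build id and job id means re-running the script for the same job resolves to the same instance. The `repro_ci` prefix, the separator, and the function name below are hypothetical and not taken from the script.

```python
import re


def instance_name(build_id: str, job_id: str, prefix: str = "repro_ci") -> str:
    """Build a deterministic name tag from a Buildkite build id and job id.

    The prefix and format are assumptions for illustration only.
    """

    def clean(value: str) -> str:
        # Replace anything other than letters, digits, and dashes.
        return re.sub(r"[^A-Za-z0-9-]", "-", value)

    return f"{prefix}_{clean(build_id)}_{clean(job_id)}"
```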
2022-01-04 16:59:12 +00:00
Archit Kulkarni
4581baa7dc
Revert "WINDOWS: unskip passing runtime_env tests (#21252)" (#21352)
This reverts commit fcb952e1bc.
2022-01-03 11:07:17 -08:00
Balaji Veeramani
43a9e95dc0
[CI] Add support for Black formatting (#21281) 2022-01-03 10:06:41 -08:00
Kai Fricke
10290eeb2f
[ci] Pin manylinux docker image (#21341) 2022-01-03 14:36:21 +00:00
Kai Fricke
14ed7cfaaa
[ci] Add repro-ci.py script to automatically set up Buildkite-runner-like instances to debug CI runs (#21292)
Create an AWS instance to reproduce Buildkite CI builds.

This script will take a Buildkite build URL as an argument and create
an AWS instance with the same properties running the same Docker container
as the original Buildkite runner. The user is then attached to this instance
and can reproduce any build commands as if they were executed within the
runner.

This utility can be used to reproduce and debug build failures that come up
on the Buildkite runner instances but not on a local machine.
2021-12-31 10:31:50 +00:00
Matti Picus
3de18d2ada
WINDOWS: enable passing/skipping tests (#21136) 2021-12-27 11:59:00 -08:00
Matti Picus
fcb952e1bc
WINDOWS: unskip passing runtime_env tests (#21252) 2021-12-26 20:49:02 -08:00
Akash Patel
cbcd03b779
Upgrade cython to 0.29.26 for py310 (#21244) 2021-12-26 20:26:08 -08:00
Gagandeep Singh
c5c5fec22b
Unskip test_standalone from ci.sh (#21235) 2021-12-25 00:21:58 -08:00
Simon Mo
cfe0897d05
[CI] Migrate Windows tests to Buildkite (#21227) 2021-12-21 20:16:34 -08:00
Amog Kamsetty
57db4640ca
[Train] [Tune] Refactor MLflow (#20802)
Pulls out Tune's MLflow logging logic to a shared MLflow util.
Adds an MLflow logger callback to Ray Train.

Closes #20642
2021-12-21 17:17:52 -08:00
Simon Mo
956774e757
[CI] Disable serve test_standalone on windows again (#21154) 2021-12-17 10:32:27 -08:00
Matti Picus
29965ad325
enable passing serve tests on windows (#21107)
* enable passing serve tests on windows

* move test_handle to 'medium' and enable

* move test_cli to 'medium'
2021-12-16 14:03:11 -08:00
Matti Picus
d2cd0730a0
[Windows] Enable test_advanced_2 on windows (#20994) 2021-12-15 14:30:40 -08:00
Ian Rodney
c7fb5a94d1
[CI] Upgrade Pip to 21.3 (#21111) 2021-12-15 13:29:45 -08:00
Edward Oakes
10947c83b3
[runtime_env] Make pip installs incremental (#20341)
Uses a direct `pip install` instead of creating a conda env to make pip installs incremental to the cluster environment.

Separates the handling of `pip` and `conda` dependencies.

The new `pip` approach still works if only the base Ray is installed on the cluster and the user specifies libraries like "ray[serve]" in the `pip` field. The mechanism is as follows:
- We don't actually want to reinstall Ray via pip, since this could lead to version mismatch issues. Instead, we want to use the Ray that's already installed in the cluster.
- So if "ray" was included by the user in the pip list, remove it.
- If a library "ray[serve]" or "ray[tune, rllib]" was included in the pip list, remove it and replace it with its dependencies (e.g. "uvicorn", "requests", ...).
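
The list rewriting described above could look roughly like this. It is a sketch under stated assumptions, not the code in this PR: `rewrite_pip_list`, `ASSUMED_EXTRAS`, and the regex are made up for the example, and only the "uvicorn"/"requests" entries for serve come from the description above; the tune and rllib entries are placeholders.

```python
import re
from typing import Dict, List

# Hypothetical mapping from Ray extras to their dependencies. Only the serve
# entries are mentioned in the description; the others are placeholders.
ASSUMED_EXTRAS: Dict[str, List[str]] = {
    "serve": ["uvicorn", "requests"],
    "tune": ["pandas", "tabulate"],
    "rllib": ["gym", "scipy"],
}

# Matches "ray", "ray[extra1, extra2]", and either form with a version
# specifier such as "ray==1.9.0". It does not match names like "raylib".
_RAY_REQUIREMENT = re.compile(
    r"^ray\s*(?:\[(?P<extras>[^\]]*)\])?\s*(?:[=<>!~].*)?$",
    re.IGNORECASE,
)


def rewrite_pip_list(pip_list: List[str]) -> List[str]:
    """Drop Ray itself and expand Ray extras into their dependencies."""
    result: List[str] = []
    for requirement in pip_list:
        match = _RAY_REQUIREMENT.match(requirement.strip())
        if match is None:
            # Not a Ray requirement: keep it, skipping exact duplicates.
            if requirement not in result:
                result.append(requirement)
            continue
        # Ray requirement: never reinstall Ray, only add the extras' deps.
        extras = match.group("extras") or ""
        for extra in (name.strip() for name in extras.split(",")):
            for dependency in ASSUMED_EXTRAS.get(extra, []):
                if dependency not in result:
                    result.append(dependency)
    return result
```

For example, `rewrite_pip_list(["ray", "numpy", "ray[serve]"])` keeps "numpy", drops the bare "ray" entry, and adds the assumed serve dependencies in its place.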

Co-authored-by: architkulkarni <arkulkar@gmail.com>
Co-authored-by: architkulkarni <architkulkarni@users.noreply.github.com>
2021-12-14 15:55:18 -08:00
Matti Picus
aec04989fc
WINDOWS: enable test_advanced_3.py (#21056) 2021-12-14 09:25:23 -08:00
Eric Liang
6f93ea437e
Remove the flaky test tag (#21006) 2021-12-11 01:03:17 -08:00
Kai Fricke
97ec2a03b6
[ci/buildkite] Add ml pipeline to speed up ML/RLLib tests (#20895)
ML tests will be built in a separate bootstrap step installing all required dependencies.
2021-12-09 21:14:10 +00:00
Amog Kamsetty
611bfc1352
[ML] Move find_free_port to ml_utils (#20828)
Small refactoring of a common utility used by Train, Tune, and RLlib.
2021-12-03 13:38:42 -08:00
matthewdeng
0de105d42f
[train] update Trainer._is_tune_enabled to work when Tune is not installed (#20767) 2021-11-29 20:08:51 -08:00
Guyang Song
191be85057
[script][format] check copyright for .proto files (#20632)
## Why are these changes needed?
- I found that we also have a copyright header in .proto files. Add it to the copyright formatter.
2021-11-23 12:26:30 +08:00
Simon Mo
add2450b92
[CI] [Hotfix] Skip test_standalone (#20556) 2021-11-18 16:47:18 -08:00
shrekris-anyscale
a91ddbdeb9
Add smart_open dependency to ray[default] (#20420) 2021-11-18 10:00:30 -06:00
Amog Kamsetty
4cbcb11458
[Docker] Add commit as label (#20504)
Adds the Ray commit sha as a label for the docker image.
2021-11-17 15:20:41 -08:00
Richard Liaw
1cadd61917
Fix horovod failing tests by pinning down (#20484) 2021-11-17 13:54:25 -08:00
Simon Mo
18d605fa7c
[Serve] Add experimental CLI for serve deploy (#20371) 2021-11-16 20:22:09 -08:00
Simon Mo
2dc7a6c9f8
[CI] Pin manylinux image (#20451) 2021-11-16 17:52:51 -08:00
Philipp Moritz
440da92263
Fix manylinux2014 build scripts (#20347) 2021-11-14 19:42:23 -08:00
Edward Oakes
73e570c426
Fix windows build (don't skip test_job_manager.py) (#20294) 2021-11-12 11:13:15 -08:00
Matti Picus
1e80a2a83a
[WINDOWS] unskip tests (#20212) 2021-11-12 10:11:11 -08:00
chenk008
74fa267c72
Enable worker in container CI test (#20174) 2021-11-11 16:11:06 -08:00