Commit graph

786 commits

Author SHA1 Message Date
Akash Patel
96d579a4fe
Add support for Python 3.10 (#21221)
Signed-off-by: acxz <17132214+acxz@users.noreply.github.com>
2022-08-26 11:01:12 -07:00
Yi Cheng
87ce8480ff
[core] Add stats for the gcs backend for telemetry. (#27876)
## Why are these changes needed?

To better understand how GCS FT is used, this PR adds these metrics.

Test:
```
cat /tmp/ray/session_latest/usage_stats.json
{"usage_stats": {"ray_version": "3.0.0.dev0", "python_version": "3.9.12", "schema_version": "0.1", "source": "OSS", "session_id": "70d3ecd3-5b16-40c3-9301-fd05404ea92a", "git_commit": "{{RAY_COMMIT_SHA}}", "os": "linux", "collect_timestamp_ms": 1660587366806, "session_start_timestamp_ms": 1660587351586, "cloud_provider": null, "min_workers": null, "max_workers": null, "head_node_instance_type": null, "worker_node_instance_types": null, "total_num_cpus": 16, "total_num_gpus": null, "total_memory_gb": 16.10752945020795, "total_object_store_memory_gb": 8.053764724172652, "library_usages": ["serve"], "total_success": 0, "total_failed": 13, "seq_number": 13, "extra_usage_tags": {"serve_api_version": "v1", "gcs_storage": "redis", "serve_num_deployments": "1"}, "total_num_nodes": 2, "total_num_running_jobs": 2}}
```
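
To check which GCS storage backend a session reported, the file above can be read with a few lines of Python. This is a minimal sketch: the helper name `gcs_storage_from_usage_stats` is hypothetical, and the key layout is assumed only from the sample output shown in this commit message.

```python
import json


def gcs_storage_from_usage_stats(raw: str):
    """Return the reported `gcs_storage` tag, or None if it is absent.

    Assumes the layout seen in the sample output:
    {"usage_stats": {"extra_usage_tags": {"gcs_storage": ...}}}
    """
    stats = json.loads(raw).get("usage_stats", {})
    tags = stats.get("extra_usage_tags") or {}
    return tags.get("gcs_storage")


# Trimmed-down copy of the sample output from the commit message.
sample = (
    '{"usage_stats": {"extra_usage_tags": '
    '{"serve_api_version": "v1", "gcs_storage": "redis"}}}'
)
print(gcs_storage_from_usage_stats(sample))
```

For a real session, the same function could be given the contents of `/tmp/ray/session_latest/usage_stats.json`.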
2022-08-16 17:02:04 -07:00
zcin
8cb09a9fc5
Revert "Revert "[serve] Integrate and Document Bring-Your-Own Gradio Applications"" (#27662) 2022-08-12 15:12:20 -07:00
Yi Cheng
dac7bf17d9
[serve] Make serve agent not blocking when GCS is down. (#27526)
This PR fixes several issues that block the Serve agent when GCS is down. The Serve agent must always stay alive so that external requests can reach it and check the status.

- The internal KV used in the dashboard/agent blocks the agent. We use the async one instead.
- The Serve controller uses ray.nodes, a blocking call that can block forever. It now uses the GCS client with a timeout.
- The agent uses the Serve controller client, a blocking call with max retries = -1. This blocks until the controller is back.

To enable Serve HA, we also need to set:

- RAY_gcs_server_request_timeout_seconds=5
- RAY_SERVE_KV_TIMEOUT_S=5

which we should set in KubeRay.
2022-08-08 16:29:42 -07:00
zcin
64c550a2b1
Revert "[serve] Integrate and Document Bring-Your-Own Gradio Applications (#26403)" (#27587)
This reverts commit 8a9d994dd0.
2022-08-06 21:38:55 -07:00
se4ml
0a489c0c7c
[CI] Update ci/pipeline/py_dep_analysis_test.py to properly use with statement (#27600) 2022-08-06 12:17:35 -07:00
zcin
8a9d994dd0
[serve] Integrate and Document Bring-Your-Own Gradio Applications (#26403)
Integration between Ray Serve and Gradio. Users of Gradio can wrap their Gradio app in a Serve deployment by using `GradioIngress`, and scale it up through more replicas or more CPU/GPU resources.
2022-08-05 11:31:00 -05:00
Yi Cheng
95da64b53e
[ci] Fix the lint #27291
Signed-off-by: Yi Cheng <chengyidna@gmail.com>
2022-07-29 16:18:13 -07:00
Yi Cheng
ad262c1968
[ci] Fix test_gcs_ha_e2e.py (#27263)
This PR fixes the broken test. The test failed because it was not installing the latest wheel.

Signed-off-by: Yi Cheng <chengyidna@gmail.com>
2022-07-29 13:53:40 -07:00
Chen Shen
559216780c
[CI][hotfix] remove no-index
With --no-index, pip does not try to install packages from PyPI. This breaks CI because it fails to find grpcio==1.43.0, which is missing from the cache.
2022-07-28 23:19:21 -07:00
Eric Liang
a4434fac7f
[docs] Fix the remaining style violations in docstrings and add lint rule (#27033) 2022-07-27 22:24:20 -07:00
xwjiang2010
eb69c1ca28
[air] Add annotation for Tune module. (#27060)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2022-07-27 13:53:46 -07:00
Amog Kamsetty
41aaf78274
[Hotfix/ML] Don't run ML tests in Windows (#27100)
Signed-off-by: Amog Kamsetty amogkamsetty@yahoo.com

Tests for upstream ML libraries (xgboost-ray, ray-lightning, etc.) were recently refactored from ray.util to ray.tests. This caused them to be run in Windows CI, but Windows CI does not have any ML dependencies. This PR disables these tests from being run on Windows, which matches the previous behavior.
2022-07-27 13:45:00 -07:00
Simon Mo
e5a8b1dd55
[Serve] Add API Annotations And Move to _private (#27058) 2022-07-27 09:08:26 -07:00
Dmitri Gekhtman
c4a259828b
[kuberay] Update KubeRay operator commit, turn autoscaler RPC drain back on (#27077)
This PR:

- Updates the KubeRay operator commit used in the Ray CI autoscaling test
- Uses the RayCluster autoscaling sample config from the KubeRay repo in place of a config from the Ray repo
- Turns the autoscaler RPC worker drain back on, as I saw some dead node messages from the GCS, and the RPC drain is supposed to avoid those.

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2022-07-27 00:00:51 -07:00
Amog Kamsetty
aa8a7dcb48
[Docker] Add Cuda 11.6 support (#26695)
Signed-off-by: Amog Kamsetty amogkamsetty@yahoo.com

Latest Pytorch version has wheels for CUDA 11.6. Per user request, adding a 11.6 image as part of our build pipeline.
2022-07-26 10:15:53 -07:00
matthewdeng
794a81028b
[ci] add repro-ci-requirements.txt (#26951)
Adding a requirements file to make it easier to set up your environment to run `repro-ci.py`.

**Usage:**
```bash
pip install -r ci/repro-ci-requirements.txt
python ci/repro-ci.py [args]
```

Signed-off-by: Matthew Deng <matt@anyscale.com>
2022-07-25 14:09:48 +01:00
Yi Cheng
0c16619475
[core] Make ray able to connect to redis without pip redis. (#25875)
Signed-off-by: Yi Cheng <chengyidna@gmail.com>

## Why are these changes needed?
Right now, only the C++ layer in Ray connects to Redis, which means we don't need pip redis to connect to a Redis db.

The blocking part is that we are doing some sharding in redis right now. But this feature is not actually used and the shard is always 1. So to make things simple, this feature is just disabled.

A test is added to make sure we can start Ray with a Redis db without pip redis.
2022-07-24 14:11:30 -07:00
Kai Fricke
7f908c4086
Revert "[ci] fix determine_tests_to_run.py by finding merge base (#26790)" (#26799)
This reverts commit 7cf4f8e069.
2022-07-20 22:12:13 +01:00
Kai Fricke
7cf4f8e069
[ci] fix determine_tests_to_run.py by finding merge base (#26790)
We previously found all differences between origin/master and the PR branch, which includes forward changes if origin/master is ahead of the branch. By finding file differences with respect to the merge base we will only include actually changed files.

See https://matthew-brett.github.io/pydagogue/git_diff_dots.html

Signed-off-by: Kai Fricke <kai@anyscale.com>
2022-07-20 19:39:22 +01:00
Riatre
591cd22be7
Revert "Revert "Bump pytest from 5.4.3 to 7.0.1"" (#26525)
* Revert "Revert "Bump pytest from 5.4.3 to 7.0.1""

This reverts commit ab10890e90.

Signed-off-by: Riatre Foo <foo@riat.re>

* Fix missing test data files dependency in rllib/BUILD

See #26334 and #26517 for context.

Once this is in, it should be good to roll forward again.

Signed-off-by: Riatre Foo <foo@riat.re>

* debug: run all tests

Signed-off-by: Riatre Foo <foo@riat.re>

* Revert "debug: run all tests"

This reverts commit 0c5e796b0eb437d64922f66749c61b0412486970.

Signed-off-by: Riatre Foo <foo@riat.re>

* fix new tests since last rebase

Signed-off-by: Riatre Foo <foo@riat.re>
2022-07-18 21:21:19 -07:00
Archit Kulkarni
c747dd1b70
[Serve] [CI] Skip serve:test_standalone2 on Windows (#26668) 2022-07-18 14:39:36 -07:00
Dmitri Gekhtman
c4160ec34b
[autoscaler][weekend nits] autoscaler.py type checking and other lint issues (#26646)
I run several linters, including mypy, in my local environment.
This is a PR of style nits for autoscaler.py meant to silence my linters.

This PR also adds a mypy check for autoscaler.py
2022-07-18 15:27:19 -05:00
Jiajun Yao
1b2b526a2b
Fix windows buildkite (#26615)
- Stop using dot command to run ci.sh script: it doesn't fail the build if the command fails for windows and is generally dangerous since it will make unexpected changes to the current shell.
- Fix uncovered windows build issues.
2022-07-18 09:15:49 -07:00
Philipp Moritz
081bbfbff1
[Examples] Test OCR example in documentation tests (#26482)
Make sure the OCR example is tested in documentation after we discovered that example notebooks are not tested in CI.

Signed-off-by: Philipp Moritz <pcmoritz@gmail.com>
2022-07-16 10:51:28 -07:00
Sihan Wang
09a6e5336a
[Serve][Part2] Migrate the tests to use deployment graph api (#26507) 2022-07-15 15:48:43 -07:00
Sven Mika
ab10890e90
Revert "Bump pytest from 5.4.3 to 7.0.1" (breaks lots of RLlib tests for unknown reasons) (#26517) 2022-07-13 11:19:30 -07:00
Riatre
2cdb76789e
Bump pytest from 5.4.3 to 7.0.1 (#26334)
See #23676 for context. This is another attempt at that as I figured out what's going wrong in `bazel test`. Supersedes #24828.

Now that there are Python 3.10 wheels for Ray 1.13 and this is no longer a blocker for supporting Python 3.10, I still want to make `bazel test //python/ray/tests/...` work for developing in a 3.10 env, and make it easier to add Python 3.10 tests to CI in future.

The change contains three commits with rather descriptive commit messages, which I repeat here:

Pass deps to py_test in py_test_module_list

    Bazel macro py_test_module_list takes a `deps` argument, but completely
    ignores it instead of passing it to `native.py_test`. Fixing that as we
    are going to use deps of py_test_module_list in BUILD in later changes.

    cpp/BUILD.bazel depends on the broken behaviour: it deps-on a cc_library
    from a py_test, which isn't working, see upstream issue:
    https://github.com/bazelbuild/bazel/issues/701.
    This is fixed by simply removing the (non-working) deps.

Depend on conftest and data files in Python tests BUILD files

    Bazel requires that all the files used in a test run should be
    represented in the transitive dependencies specified for the test
    target. For py_test, it means srcs, deps and data.

    Bazel enforces this constraint by creating a "runfiles" directory,
    symlinking the files in the dependency closure into it, and running the
    test in the "runfiles" directory, so that the test shouldn't see files
    not in the dependency graph.

    Unfortunately, the constraint does not apply for a large number of
    Python tests, due to pytest (>=3.9.0, <6.0) resolving these symbolic
    links during test collection and effectively "breaking out" of the
    runfiles tree.

    pytest >= 6.0 introduced a breaking change that removed the symbolic
    link resolving behaviour, see pytest pull request
    https://github.com/pytest-dev/pytest/pull/6523 for more context.

    Currently, we are underspecifying dependencies in a lot of BUILD files
    and thus blocking us from updating to newer pytest (for Python 3.10
    support). This change hopefully fixes all of them, and at least those in
    CI, by adding data or source dependencies (mostly for conftest.py-s)
    where needed.

Bump pytest version from 5.4.3 to 7.0.1

    We want at least pytest 6.2.5 for Python 3.10 support, but not past
    7.1.0 since it drops Python 3.6 support (which Ray still supports), thus
    the version constraint is set to <7.1.

    Updating pytest, combined with earlier BUILD fixes, changed the ground
    truth of a few error-message-based unit tests; these tests are updated
    to reflect the change.

    There are also two small drive-by changes for making test_traceback and
    test_cli pass under Python 3.10. These were discovered while debugging CI
    failures (on earlier Python) with a Python 3.10 install locally. Expect
    more such issues when adding Python 3.10 to CI.
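
The version window from the last commit above (at least 6.2.5, below 7.1) can be checked with plain tuple comparison. This is a minimal sketch: the helper names `parse_version` and `in_pytest_window` are hypothetical, and it only handles plain dotted release numbers, not pre-release or development suffixes.

```python
def parse_version(text: str) -> tuple:
    """Turn a plain dotted release string such as "7.0.1" into (7, 0, 1)."""
    return tuple(int(part) for part in text.split("."))


def in_pytest_window(version: str) -> bool:
    """True if `version` satisfies >=6.2.5 and <7.1."""
    return (6, 2, 5) <= parse_version(version) < (7, 1)


for candidate in ("5.4.3", "6.2.5", "7.0.1", "7.1.0"):
    print(candidate, in_pytest_window(candidate))
```

Real requirement files would express the same window as a version specifier; the tuple comparison is only meant to make the bounds explicit.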
2022-07-12 21:14:35 -07:00
Dmitri Gekhtman
8f8f036957
[autoscaler][kuberay] Deflake KubeRay autoscaling test (#26411)
Improves stability of KubeRay autoscaling test.
2022-07-12 00:56:36 -07:00
Amog Kamsetty
b01e11d721
[Docker] Add support for Cuda 11.3 (#26233)
Start building Ray docker images with cuda 11.3
2022-07-10 21:50:42 -07:00
Tao Wang
49cafc6323
[Cpp worker][Java worker]Support Java call Cpp Actor (#25933) 2022-06-29 14:33:32 +08:00
Simon Mo
5342163b9d
[CI] Use BUILDKITE_JOB_ID for better navigation for flaky tracker (#26021) 2022-06-23 18:07:29 -07:00
Kai Fricke
b0e1cfbcaa
[ci] repro-ci.py: Use Name tag instead of repo_name (#26035) 2022-06-23 14:12:45 -07:00
Eric Liang
43aa2299e6
[api] Annotate as public / move ray-core APIs to _private and add enforcement rule (#25695)
Enable checking of the ray core module, excluding serve, workflows, and tune, in ./ci/lint/check_api_annotations.py. This required moving many files to ray._private and associated fixes.
2022-06-21 15:13:29 -07:00
clarng
2b270fd9cb
apply isort uniformly for a subset of directories (#25824)
Simplify isort filters and move it into isort cfg file.

With this change, isort will no longer apply to diffs other than to files in whitelisted directories (isort only supports a blacklist, so we implement that instead). This is much simpler than building our own whitelist logic, since our formatter runs multiple codepaths depending on whether it is formatting a single file, a PR, or the entire repo in CI.
2022-06-17 13:40:32 -07:00
Simon Mo
1c27469b6d
[macOS] Only cleanup directory after upload (#25835)
This was missed when uploading of the bazel log was enabled: we should no longer clean the directory.
2022-06-17 12:46:37 +01:00
matthewdeng
383954fb15
[CI] upgrade to go 1.18 (#25829)
Upgrade to Go 1.18 to fix a CI linter issue.
2022-06-15 17:31:07 -07:00
Antoni Baum
91dd360f9d
[AIR/train] Move predictors to ray.train (#25769) 2022-06-15 17:02:15 -07:00
clarng
1a5f42742d
import sort rest of autoscaler (#25796)
Continue to import sort the rest of autoscaler.
2022-06-15 15:00:21 -07:00
clarng
ef866d1e49
exclude doc_code from import sorting (#25772)
Skip sorting the imports in doc_code.
2022-06-15 11:34:45 -07:00
Simon Mo
503e197f8c
[CI] Upload macOS bazel test files (#25744) 2022-06-15 10:09:04 -07:00
Matti Picus
e5c5275bed
[Runtime Env] enable conda runtime creation in workers on windows (#23613) 2022-06-14 10:24:02 -07:00
clarng
73e113152b
Add import sorting to format.sh (#25678)
It will be easier to develop if we could use a tool to organize / sort imports and not have to move them around by hand.

This PR shows how we could do this with isort (black doesn't quite do this per https://github.com/psf/black/issues/333)

After this PR lands everyone will need to update their formatter to include isort if they don't have it already, i.e.

   pip install -r ./python/requirements_linters.txt 

All future file changes will go through isort and may introduce a slightly larger PR the first time as it will clean up the imports. 

The plan is to land this PR and also clean up the rest of the code in parallel by using this PR to format the codebase (so people won't get surprised by the formatter if the file hasn't been touched yet)

Co-authored-by: Clarence Ng <clarence@anyscale.com>
2022-06-13 14:08:51 -07:00
Amog Kamsetty
1316a2d05e
[AIR/Train] Move ray.air.train to ray.train (#25570) 2022-06-08 21:34:18 -07:00
Kai Fricke
aa142eb377
[RLlib; CI] Add team:rllib tag for Bazel. (#25589)
Currently, team:ml spans all ML (Tune, Train, AIR) tests and RLlib tests. RLlib tests are much flakier, and it would be good to split them out in the flaky test tracker. This PR changes RLlib tests from team:ml to team:rllib to enable this separation.

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2022-06-08 22:25:59 +01:00
Dmitri Gekhtman
5cc2e15a1f
[CI][minor] Disallow filters if command isn't specified (#25593)
Trivial "developer experience" tweak to the ci repro script:
disallow filtering commands if we're not running the commands.
2022-06-08 20:52:51 +01:00
Amog Kamsetty
80ae651f25
[Train] Clean up ray.train package (#25566) 2022-06-08 10:22:36 -07:00
Antoni Baum
3876fcdbe8
[CI] Add bazel py_test checking for Serve (#25509) 2022-06-07 10:54:10 -07:00
Eric Liang
c1afbcb6f4
[air] Enforce API stability annotations for AIR module (#25485) 2022-06-06 22:52:21 -07:00
Jiao
aa965ba0a9
[Deployment Graph] Add visualization cookbook (#25112) 2022-06-06 11:05:58 -07:00