Commit graph

11101 commits

Author SHA1 Message Date
Kai Fricke
214178bec7 Pin protobuf to < 4 (#25257)
Otherwise `pip install ray` installs the latest protobuf, which is incompatible.

Co-authored-by: Yi Cheng <74173148+iycheng@users.noreply.github.com>
2022-06-01 10:21:44 +02:00
Archit Kulkarni
5c9b1003d5 Update Dask-on-Ray version 2022-04-11 14:20:09 -07:00
Archit Kulkarni
d2961e60da Bump version from 1.11.0 to 1.11.1 2022-04-11 14:19:08 -07:00
Archit Kulkarni
8a0a580eb3
Fix coreworker test (#23836)
* [HOTFIX]fix some compilation failures in core worker test (#22855)

There are some compilation failures in the core worker test when we build the project using `bazel build //:all`. The test appears to be broken and not integrated into CI.

* Lint

Co-authored-by: Tao Wang <dooku.wt@antfin.com>
2022-04-11 14:07:26 -07:00
Amog Kamsetty
dd28d45261 [Train] MLflow start run under correct experiment (#23662)
Start Mlflow run under correct mlflow experiment
2022-04-11 09:56:44 -07:00
Archit Kulkarni
56a5007b84
[ci] Pin prometheus_client to fix current test outages (#23749) (#23776)
What: Pins prometheus_client to < 0.14.0, hopefully fixing today's CI outages
Why: New version of the python client (https://github.com/prometheus/client_python/releases) breaks our CI

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2022-04-08 10:36:03 -07:00
Archit Kulkarni
a1b9936e66
[Dashboard] Specify @types/react resolution (#23794) (#23795)
A new @types/react release has broken the dashboard build. Make sure to specify the older version under package resolutions.

Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2022-04-08 10:35:52 -07:00
Kai Fricke
8a3e92b441
Cherry pick pin gym dependency (#23705)
* [release tests] Pin gym everywhere (#23349)

* [rllib] Pin gym everywhere (#23384)

This PR pins gym in the app config.yaml files for rllib and tune so that release tests are no longer broken by the new gym version.

* [RLlib] Pin Gym Everywhere and turn off gpu for recsim tests (#23452)

* [ci] Clean up ray-ml requirements (#23325)

In https://github.com/ray-project/ray/blob/ray-1.11.0/docker/ray-ml/Dockerfile, the order of pip install commands currently matters (potentially a lot). It would be good to run one big pip install command to avoid ending up with a broken env.

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>

* Fix merge conflict

* Also copy requirements_train.txt

Co-authored-by: Avnish Narayan <38871737+avnishn@users.noreply.github.com>
Co-authored-by: ddelange <14880945+ddelange@users.noreply.github.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2022-04-08 10:25:39 -07:00
simon-mo
3f663be57f lint for yapf 2022-03-24 17:08:21 -07:00
shrekris-anyscale
6f240c9c34 [Serve] [runtime env] Replace os.rename with shutil.move in remove_dir_from_filepaths() (#22018)
Currently, the `remove_dir_from_filepaths()` function uses `os.rename()` when shifting directories and files. This change replaces [`os.rename()`](https://docs.python.org/3/library/os.html#os.rename) with [`shutil.move()`](https://docs.python.org/3/library/shutil.html#shutil.move) to support these operations even when the directory's parent and the temporary directory are located on separate file systems.
2022-03-24 17:07:33 -07:00
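
The difference described in #22018 can be sketched as follows. This is a minimal illustration, not the code from `remove_dir_from_filepaths()`; the helper name `relocate` and the file names are hypothetical. `shutil.move()` uses `os.rename()` when source and destination share a file system and otherwise copies the data and then deletes the source, whereas a bare `os.rename()` raises `OSError` across file systems.

```python
import os
import shutil
import tempfile


def relocate(src: str, dst_dir: str) -> str:
    """Move src into dst_dir and return the new path.

    shutil.move renames in place when both paths share a file system and
    falls back to copy-then-delete when they do not.
    """
    return shutil.move(src, os.path.join(dst_dir, os.path.basename(src)))


# Demo with two temporary directories (they may or may not share a file system).
with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as dst_dir:
    src = os.path.join(src_dir, "working_dir")
    os.mkdir(src)
    with open(os.path.join(src, "data.txt"), "w") as f:
        f.write("payload")

    new_path = relocate(src, dst_dir)
    # The directory and its contents now live under dst_dir only.
    moved_ok = os.path.isfile(os.path.join(new_path, "data.txt")) and not os.path.exists(src)
```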
Alex Wu
d3b62f3958
Promote python 3.9 support to stable (#22923)
Remove the experimental note from python 3.9 since it and its core dependencies have been stable for quite some time now.

Co-authored-by: Alex Wu <alex@anyscale.com>
2022-03-09 11:00:27 -08:00
mwtian
fec30a25db
Operator does not retry monitor on failure. (#22792) (#22804)
Co-authored-by: Dmitri Gekhtman <62982571+DmitriGekhtman@users.noreply.github.com>
2022-03-03 10:39:51 -08:00
mwtian
94b170256f
Fix K8s API (#22756) (#22776)
From original PR:
This PR fixes K8s support by updating the api client used for ingresses.

Co-authored-by: Dmitri Gekhtman <62982571+DmitriGekhtman@users.noreply.github.com>
2022-03-02 12:31:06 -08:00
mwtian
c2ba5f41c3
[Release 1.11.0] Bump version from 1.11.0rc1 to 1.11.0 (#22699)
IIUC rc1 should be the final candidate. We would not need another release candidate.
2022-02-28 15:50:30 -08:00
Yi Cheng
99fa8424f0
[ci][cherry-pick] Fix grpcio 1.44 break test_output (#22494) (#22608)
This PR limits grpcio to <= 1.42, which fixes test_output.
2022-02-23 16:43:07 -08:00
Yi Cheng
d2f8094684
[GCS-Ray][1.11.0 Cherry-pick] update doc and error message for GCS-Ray (#22596)
Update documentation to reflect that Ray no longer starts Redis by default.

Co-authored-by: mwtian <81660174+mwtian@users.noreply.github.com>
2022-02-23 16:42:33 -08:00
Amog Kamsetty
c8e76f9b92
[rllib] Upper bound gym version (#22510)
gym released 0.22 today, which breaks many of the rllib tests and examples. This temporarily pins the gym version.
2022-02-18 18:04:54 -08:00
Max Pumperla
c0afd1cd03
[docs] rllib conference material (#22503)
* merge

Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>

* [Docs] Executable notebook tutorial (#22030)

We're introducing [MyST Notebooks](https://myst-nb.readthedocs.io/en/latest/index.html) here and demonstrating how they work by rewriting (and extending) the RLlib Serve tutorial. Benefits:

- [x] Write notebooks in markdown. Can be converted into other formats e.g. with `jupytext`
- [x] Tutorials like this have a binderhub link added to the top nav (launch button).
- [x] Notebooks get executed when docs are built, so it's impossible to have stale docs.
- [x] But locally those builds are cached so that you don't have to wait too long.
- [x] The notebook cell outputs can be shown, hidden or removed.  In particular, we can now avoid adding expected code output as comments in our scripts (which might get outdated).

We're also clarifying #22022.

Old tutorial: [here](https://docs.ray.io/en/latest/serve/tutorials/rllib.html)
New tutorial (preview): [here](https://ray--22030.org.readthedocs.build/en/22030/serve/tutorials/rllib.html)

Co-authored-by: simon-mo <simon.mo@hey.com>

* lint

Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>

Co-authored-by: simon-mo <simon.mo@hey.com>
2022-02-18 18:00:00 -08:00
mwtian
9ffa4c5652
[Dataset][nightly-test] pin pyarrow==4.0.1 for dataset related tests (#22277) (#22500)
* pin pyarrow==4.0.1

* address comments

Co-authored-by: Chen Shen <scv119@gmail.com>
2022-02-18 11:49:05 -08:00
mwtian
3c803dc85d
[Release 1.11.0] Release logs for 1.11.0rc1 (#22443)
This is the release log for 1.11.0rc1, with GCS-Ray enabled. The diff is against 1.11.0rc0, without GCS-Ray.
2022-02-16 20:36:57 -08:00
Max Pumperla
e5c35b3355
[docs] 1.11. release cherry pick (#22430)
* [docs] landing page (fixes #21750) (#21859)

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* [Doc] Fix bad doc and recover doc of c++ api (#22213)

* [Docs] Ray Data docs target state (#21931)

Preview: [docs](https://ray--21931.org.readthedocs.build/en/21931/data/dataset.html)

The Ray Data project's docs now have a clearer structure and have partly been rewritten/modified. In particular we have

- [x] A Getting Started Guide
- [x] An explicit User / How-To Guide
- [x] A dedicated Key Concepts page
- [x] A consistent naming convention: `Ray Data` is used whenever the project is referred to.

This surfaces quite clearly that, apart from the "Getting Started" sections, we really only have one real example. Once we have more, we can create an "Example" section like many other sub-projects have. This will be addressed in https://github.com/ray-project/ray/issues/21838.

* [Datasets] [Docs] Datasets library branding + positioning tweaks (#22067)

* [train] Minor fixes on Ray Train user guide doc (#22379)

Fixes some typos and format issues.

* [Doc] Fix bad links of dask and mars in ray-libraries.rst (#22210)

* merge

Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>

* merge

Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>

* merge

Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>

* [docs] Clean up long titles in TOC (#22016)

* LINT

Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Guyang Song <guyang.sgy@antfin.com>
Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>
Co-authored-by: Hao Chen <chenh1024@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2022-02-16 17:03:19 -08:00
mwtian
22b8f7d6da
[jobs] Use subprocess.list2cmdline to properly handle quotes in CLI entrypoints (#22011) (#22447)
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2022-02-16 16:18:03 -08:00
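
As a rough illustration of what #22011 relies on: `subprocess.list2cmdline` joins an argument list into one string and quotes arguments that contain whitespace, which a plain `" ".join(...)` does not. The `argv` values below are made up for the example; how the jobs CLI actually builds its entrypoint is not shown here.

```python
import subprocess

# Hypothetical entrypoint arguments; the last one contains a space.
argv = ["python", "script.py", "--msg", "hello world"]

# A naive join loses the argument boundary around "hello world".
naive = " ".join(argv)

# list2cmdline wraps arguments containing spaces or tabs in double quotes.
quoted = subprocess.list2cmdline(argv)

print(naive)   # python script.py --msg hello world
print(quoted)  # python script.py --msg "hello world"
```

Note that `list2cmdline` follows the quoting rules of the Microsoft C runtime; `shlex.join` is the standard-library counterpart for POSIX shells.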
Mingwei Tian
4ddd71a9bf
[Release] update version to 1.11.0rc1 2022-02-15 15:28:48 -08:00
mwtian
7d28336974
[Release branch] support GCS-Ray in e2e.py (#22407)
* [e2e] Update e2e test to use redisless ray by default. (#22189)

As the title says: now that the infra has been updated, we need to merge this PR so that tests can run Ray without Redis.

* [e2e] Fix an error when "env_vars" is not set. (#22234)

To fix error in session https://buildkite.com/ray-project/periodic-ci/builds/2699#c532ed2b-ee89-48ad-a7db-fd4211ef8bd9

Co-authored-by: Yi Cheng <74173148+iycheng@users.noreply.github.com>
2022-02-15 15:03:15 -08:00
Yi Cheng
a47240fa0a
[gcs/ha] Enable HA flags by default (#21608) (#22366) 2022-02-15 11:38:32 -08:00
Yi Cheng
4284ec7f42
Prep K8s operator for the Ray 1.11.0 release. (#22264) (#22365)
For consistency and safety, we fix an explicit 6379 port for all default and example configs for Ray on K8s.
Documentation is updated to recommend matching Ray versions in operator and Ray cluster.

Co-authored-by: Dmitri Gekhtman <62982571+DmitriGekhtman@users.noreply.github.com>
2022-02-15 10:42:27 -08:00
Yi Cheng
7b47aa8d2a
[gcs] Fix in_memory_store not handling nullptr callback issue (#22321) (#22364)
The in-memory store does not handle a nullptr callback correctly, which crashes GCS in node failure tests. This PR fixes it.
2022-02-15 10:42:07 -08:00
mwtian
c857f34cc7
Fix building Windows wheels (#22388) 2022-02-15 08:17:34 -08:00
mwtian
6882ca0717
[Release branch] fix rllib test_catalog.py (#22387) 2022-02-15 16:54:56 +01:00
Clark Zinzow
d51df512eb
[1.11.0] [Cherry-pick] [Datasets] Fix boolean tensor column representation and slicing. (#22358)
Reformatted cherry-pick of 443416907e.

This PR fixes our {NumPy, Pandas} <--> Arrow interop for boolean tensor columns. NumPy and Pandas represent boolean arrays with a byte per boolean, while Arrow bit-packs booleans with 8 booleans per byte. Previously, when casting NumPy arrays to tensor columns, we were interpreting NumPy's boolean array buffers as being bit-packed when they were not. This PR completes support by packing and unpacking bits for boolean arrays when creating a boolean tensor column from an ndarray and when creating an ndarray from a boolean tensor column, respectively.
2022-02-14 11:45:50 -08:00
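
The representation mismatch described in #22358 can be sketched in plain Python. This is not the Datasets implementation; `pack_bools` and `unpack_bools` are hypothetical helpers that convert between a byte-per-boolean layout (as NumPy and Pandas use) and a bit-packed layout with least-significant-bit-first ordering (as Arrow uses).

```python
from typing import List


def pack_bools(values: List[bool]) -> bytes:
    """Pack booleans 8 per byte, least-significant bit first."""
    out = bytearray((len(values) + 7) // 8)
    for i, value in enumerate(values):
        if value:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack_bools(packed: bytes, length: int) -> List[bool]:
    """Expand a bit-packed buffer back to one boolean per element."""
    return [bool((packed[i // 8] >> (i % 8)) & 1) for i in range(length)]


values = [True, False, True, True, False, False, False, False, True, True]
packed = pack_bools(values)            # 10 booleans fit in 2 bytes
restored = unpack_bools(packed, len(values))
```

Reading the unpacked buffer as if it were already bit-packed (or the reverse) yields wrong values and wrong lengths, which is the class of bug the commit message describes. With NumPy available, `np.packbits(..., bitorder="little")` and `np.unpackbits(..., bitorder="little")` perform the same conversion.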
Edward Oakes
c48ad5cf13
[serve] Fix HTTP proxy controller namespace bug (#22287) (#22355)
Closes https://github.com/ray-project/ray/issues/22265

This was caused by implicitly inferring the namespace from within the HTTP proxy when calling `get_handle`. This makes me think we really need to simplify the namespace handling logic.
2022-02-14 11:33:06 -08:00
Mingwei Tian
fee8947c23
[Release branch] Update Python version to 1.11.0rc0 2022-02-14 10:05:53 -08:00
mwtian
49b0d4d88f
[Release] Add release logs for 1.11.0rc0 (GCS KV & pubsub not enabled) (#22041) 2022-02-14 10:05:53 -08:00
Chen Shen
a847fa3643
[Dataset] avoid pyarrow 7.0.0 for dataset (#22253) (#22330) 2022-02-14 08:06:11 -08:00
Archit Kulkarni
789274c179
[runtime env] [1.11.0 release cherry-pick] fix bug where pip options don't work in requirements.txt (#22127)
* [runtime env] Fix bug where options (e.g. `--extra-index-url`) could not be specified in `requirements.txt` (#22065)

In https://github.com/ray-project/ray/pull/20341 the behavior of `pip` was changed to install the specified packages in the existing environment rather than in a new environment. This posed a problem when specifying Ray libraries like "ray[serve]" in the `pip` field, because the installer would install Ray at runtime and this new Ray would take precedence over the Ray existing on the cluster. This could cause version mismatch issues. Skipping some details, the approach taken in that PR was essentially to parse the `pip` list and remove Ray.

However, not every line in a `pip` `requirements.txt` file is a requirements specifier; a line can also just specify options, like `--extra-index-url my-index-url.com`. This caused the parsing library to raise an exception when trying to parse the line. This PR fixes this by catching the exception and skipping the line in this case, since it's not a line that specifies `ray` and that's all we're looking for when parsing.

* lint using old linter from pre-1.11.0-branch-cut
2022-02-14 07:13:37 -08:00
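
The fix in #22065 can be approximated with the sketch below. It does not use the parsing library from the actual change; `parse_requirement_name` is a hypothetical stand-in that raises `ValueError` on lines that are not requirement specifiers, and `drop_ray` is likewise an illustrative name. The sketch assumes that "skipping" a line means leaving it in the list unchanged.

```python
import re
from typing import List

# Hypothetical, simplified matcher for the leading package name of a requirement.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def parse_requirement_name(line: str) -> str:
    """Return the package name, or raise ValueError for non-requirement lines."""
    match = _NAME_RE.match(line)
    if match is None:
        raise ValueError(f"not a requirement specifier: {line!r}")
    return match.group(1).lower()


def drop_ray(lines: List[str]) -> List[str]:
    """Remove lines that install ray; keep everything else, including options."""
    kept = []
    for line in lines:
        try:
            name = parse_requirement_name(line)
        except ValueError:
            # Option lines such as --extra-index-url cannot name ray, so keep them.
            kept.append(line)
            continue
        if name != "ray":
            kept.append(line)
    return kept


requirements = ["--extra-index-url my-index-url.com", "ray[serve]", "requests==2.27.1"]
filtered = drop_ray(requirements)
```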
Kai Fricke
c500c5b1ed
[ci/release] Fix job submission command (#22093)
Ray job submission does not accept quoted commands anymore (#22011). This PR updates the command to fix job submission within e2e tests.
2022-02-13 20:08:03 -08:00
mwtian
2b257189a1
[Release 1.11 Cherrypick] [e2e] do not terminate in serve_failure smoke test (#21955)
Original PR #21925

This makes `serve_failure` pass its smoke test step. Without it, the test fails early and does not get to exercise the logic for 24 hr.
2022-01-28 20:19:18 -08:00
mwtian
e23b27c173
[ci/release] Increase long running timeout, fix artifacts copy (#21905) (#21943)
With the new job-based file copy, fetching results takes longer. We thus have to increase the long running update test check times in order not to run into bogus release test failures.
Also fixes artifact uploading issues.

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2022-01-27 22:30:43 -08:00
mwtian
32cf407fc4
[release] Fix broken pip_download_test.sh script for non-M1 Macs (#21542) (#21944)
Fixes a typo that caused the script to exit early without running any sanity checks when not using an M1 Mac.

Co-authored-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
2022-01-27 22:30:28 -08:00
mwtian
413447cbdf
[Doc] update dask version for Ray 1.11.0 (#21933) (#21942)
This is needed for release 1.11.0.
2022-01-27 22:29:52 -08:00
Mingwei Tian
c80aceaf0a
Update version to 1.11.0rc0 2022-01-25 20:51:13 -08:00
Lingxuan Zuo
0c33ff718d
Remove generated streaming pb and pom files. (#21851)
Some auto-generated streaming files were never removed. This PR removes them completely.

Co-authored-by: 林濯 <lingxuzn.zlx@antgroup.com>
2022-01-26 10:05:23 +08:00
Alex Wu
7a45f60dbc
[autoscaler] Fix ray.autoscaler.sdk import issue (#21795)
This PR moves the sdk to its own folder, then includes everything in `import ray.autoscaler.sdk` in ray's import path. 

Note that naively doing this created circular dependencies, because the ray core now uses constants that were defined in the autoscaler for internal kv operations (and the autoscaler similarly calls into the ray core). The solution was to move those internal kv keys into ray core constants so the imports flow (more) one way.

Co-authored-by: Alex Wu <alex@anyscale.com>
2022-01-25 14:43:24 -08:00
Wilson Wang
30a4761592
Two issues fix for GCS connecting logic in monitor.py and log_monitor.py (#21790)
This patch fixed two issues.

1. log_monitor.py can crash when GCS is temporarily unavailable. Added retry logic in gcs_pubsub.py.
2. The signal handler can raise another exception during exception handling.
2022-01-25 14:07:26 -08:00
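
The first fix in #21790 can be pictured as a generic retry wrapper. The sketch below is not the code added to gcs_pubsub.py; `call_with_retry`, its parameters, and the use of `ConnectionError` as the retryable error are all assumptions made for illustration.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 5,
    delay_s: float = 0.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> T:
    """Call fn until it succeeds or the attempt budget is exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt + 1 == attempts:
                raise  # Out of attempts: surface the last error to the caller.
            time.sleep(delay_s)
    raise AssertionError("unreachable")


# Hypothetical flaky call: fails twice, then succeeds.
calls = {"count": 0}


def flaky_poll() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("service temporarily unavailable")
    return "ok"


result = call_with_retry(flaky_poll, attempts=5)
```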
Ian Rodney
257bd2d1e7
[Cleanup] Use mkstemp (#21676)
`tempfile.mktemp` is technically deprecated in favor of `tempfile.mkstemp`. 
Ref: https://docs.python.org/3/library/tempfile.html#deprecated-functions-and-variables.
2022-01-25 13:42:12 -08:00
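
For context on #21676: `tempfile.mktemp` only returns a name, so another process can create a file at that path before the caller opens it, while `tempfile.mkstemp` creates the file and returns an open descriptor together with the path. A minimal sketch of the `mkstemp` pattern follows; the helper name `write_temp` is hypothetical and not taken from the change.

```python
import os
import tempfile


def write_temp(data: bytes, suffix: str = "") -> str:
    """Write data to a newly created temporary file and return its path."""
    # mkstemp creates the file and returns (file descriptor, absolute path).
    fd, path = tempfile.mkstemp(suffix=suffix)
    # Wrap the descriptor so it is closed when the block exits.
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


path = write_temp(b"payload", suffix=".bin")
with open(path, "rb") as f:
    contents = f.read()
os.remove(path)  # mkstemp does not delete the file; the caller must.
```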
shrekris-anyscale
e4370720cc
[Serve] Add "Serve" team tag to untagged release tests (#21861) 2022-01-25 11:46:03 -08:00
Dhruv Nair
3d79815cd0
Comet Integration (#20766)
This PR adds a `CometLoggerCallback` to the Tune Integrations, allowing users to log runs from Ray to [Comet](https://www.comet.ml/site/).

Co-authored-by: Michael Cullan <mjcullan@gmail.com>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2022-01-25 11:42:00 -08:00
Clark Zinzow
1971a08b7d
[RFC] [Core] Support disabling log redirection via RAY_LOG_TO_STDERR environment variable. (#21767) 2022-01-25 10:52:53 -08:00
Gagandeep Singh
395297a9bd
Unskip tests for Windows in test_output (#21775) 2022-01-25 09:25:01 -08:00
Matti Picus
d3d1e8559c
enable passing metric tests on windows (#21755)
Resubmitting #21705, which was merged and then reverted. The Sphinx build apparently broke in the meantime; it is not clear how that is connected to this PR.

Here is the original description:
>Part of the effort to enable tests on windows, this enables test_metrics and test_metric_agents, which pass locally.
2022-01-25 09:20:16 -08:00