* [HOTFIX]fix some compilation failures in core worker test (#22855)
There are some compilation failures in the core worker test when building the project with `bazel build //:all`. The test appears to be broken and is not integrated into CI.
* Lint
Co-authored-by: Tao Wang <dooku.wt@antfin.com>
What: Pins prometheus_client to < 0.14.0, hopefully fixing today's CI outages
Why: New version of the python client (https://github.com/prometheus/client_python/releases) breaks our CI
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
A new @types/react release has broken the dashboard build. Make sure to specify the older version under package resolutions.
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
* [release tests] Pin gym everywhere (#23349)
* [rllib] Pin gym everywhere (#23384)
This PR pins gym in the app config.yaml files for rllib and tune so that release tests are no longer broken by the new gym version.
* [RLlib] Pin Gym Everywhere and turn off gpu for recsim tests (#23452)
* [ci] Clean up ray-ml requirements (#23325)
In https://github.com/ray-project/ray/blob/ray-1.11.0/docker/ray-ml/Dockerfile, the order of pip install commands currently matters (potentially a lot). It would be good to run one big pip install command to avoid ending up with a broken env.
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
* Fix merge conflict
* Also copy requirements_train.txt
Co-authored-by: Avnish Narayan <38871737+avnishn@users.noreply.github.com>
Co-authored-by: ddelange <14880945+ddelange@users.noreply.github.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
Remove the experimental note from python 3.9 since it and its core dependencies have been stable for quite some time now.
Co-authored-by: Alex Wu <alex@anyscale.com>
From original PR:
This PR fixes K8s support by updating the api client used for ingresses.
Co-authored-by: Dmitri Gekhtman <62982571+DmitriGekhtman@users.noreply.github.com>
* merge
Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>
* [Docs] Executable notebook tutorial (#22030)
We're introducing [MyST Notebooks](https://myst-nb.readthedocs.io/en/latest/index.html) here and demonstrating how they work by rewriting (and extending) the RLlib Serve tutorial. Benefits:
- [x] Write notebooks in markdown. Can be converted into other formats e.g. with `jupytext`
- [x] Tutorials like this have a binderhub link added to the top nav (launch button).
- [x] Notebooks get executed when docs are built, so it's impossible to have stale docs.
- [x] But locally those builds are cached so that you don't have to wait too long.
- [x] The notebook cell outputs can be shown, hidden or removed. In particular, we can now avoid adding expected code output as comments in our scripts (which might get outdated).
We're also clarifying #22022.
Old tutorial: [here](https://docs.ray.io/en/latest/serve/tutorials/rllib.html)
New tutorial (preview): [here](https://ray--22030.org.readthedocs.build/en/22030/serve/tutorials/rllib.html)
Co-authored-by: simon-mo <simon.mo@hey.com>
* lint
Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>
Co-authored-by: simon-mo <simon.mo@hey.com>
* [docs] landing page (fixes #21750) (#21859)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* [Doc] Fix bad doc and recover doc of c++ api (#22213)
* [Docs] Ray Data docs target state (#21931)
Preview: [docs](https://ray--21931.org.readthedocs.build/en/21931/data/dataset.html)
The Ray Data project's docs now have a clearer structure and have partly been rewritten/modified. In particular we have
- [x] A Getting Started Guide
- [x] An explicit User / How-To Guide
- [x] A dedicated Key Concepts page
- [x] A consistent naming convention: `Ray Data` is used whenever the project is referred to.
This surfaces quite clearly that, apart from the "Getting Started" sections, we really only have one real example. Once we have more, we can create an "Example" section like many other sub-projects have. This will be addressed in https://github.com/ray-project/ray/issues/21838.
* [Datasets] [Docs] Datasets library branding + positioning tweaks (#22067)
* [train] Minor fixes on Ray Train user guide doc (#22379)
Fixes some typos and format issues.
* [Doc] Fix bad links of dask and mars in ray-libraries.rst (#22210)
* merge
Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>
* merge
Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>
* merge
Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>
* [docs] Clean up long titles in TOC (#22016)
* LINT
Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Guyang Song <guyang.sgy@antfin.com>
Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>
Co-authored-by: Hao Chen <chenh1024@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
For consistency and safety, we fix an explicit 6379 port for all default and example configs for Ray on K8s.
Documentation is updated to recommend matching Ray versions in operator and Ray cluster.
Co-authored-by: Dmitri Gekhtman <62982571+DmitriGekhtman@users.noreply.github.com>
Reformatted cherry-pick of 443416907e.
This PR fixes our {NumPy, Pandas} <--> Arrow interop for boolean tensor columns. NumPy and Pandas represent boolean arrays with a byte per boolean, while Arrow bit-packs booleans with 8 booleans per byte. Previously, when casting NumPy arrays to tensor columns, we were interpreting NumPy's boolean array buffers as being bit-packed when they were not. This PR completes support by packing and unpacking bits for boolean arrays when creating a boolean tensor column from an ndarray and when creating an ndarray from a boolean tensor column, respectively.
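The layout mismatch described above can be illustrated with plain NumPy. This is a minimal sketch, not the code from the PR: the helper names `pack_bools` and `unpack_bools` are hypothetical, and it assumes Arrow's least-significant-bit-first ordering within each byte.

```python
from typing import Tuple

import numpy as np


def pack_bools(arr: np.ndarray) -> Tuple[np.ndarray, Tuple[int, ...]]:
    """Bit-pack a boolean ndarray: 8 booleans per byte, LSB first.

    Hypothetical helper for illustration; returns the packed bytes
    together with the original shape, which is needed to unpack.
    """
    flat = np.ascontiguousarray(arr, dtype=np.bool_).ravel()
    return np.packbits(flat, bitorder="little"), arr.shape


def unpack_bools(packed: np.ndarray, shape: Tuple[int, ...]) -> np.ndarray:
    """Inverse of pack_bools: expand packed bits back to one byte per boolean."""
    count = int(np.prod(shape))
    # `count` trims the padding bits that packbits added to fill the last byte.
    bits = np.unpackbits(packed, count=count, bitorder="little")
    return bits.astype(np.bool_).reshape(shape)


mask = np.array([[True, False, True], [False, False, True]])
packed, shape = pack_bools(mask)
restored = unpack_bools(packed, shape)

print(mask.nbytes, packed.nbytes)  # byte-per-boolean size vs. bit-packed size
print(np.array_equal(mask, restored))
```

Reinterpreting `mask`'s raw buffer as if it were already bit-packed would read each 0/1 byte as eight separate booleans, which is the kind of misinterpretation the fix addresses.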
Closes https://github.com/ray-project/ray/issues/22265
This was caused by implicitly inferring the namespace from within the HTTP proxy when calling `get_handle`. This makes me think we really need to simplify the namespace handling logic.
* [runtime env] Fix bug where options (e.g. `--extra-index-url`) could not be specified in `requirements.txt` (#22065)
In https://github.com/ray-project/ray/pull/20341 the behavior of `pip` was changed to install the specified packages in the existing environment rather than in a new environment. This posed a problem when specifying Ray libraries like "ray[serve]" in the `pip` field, because the installer would install Ray at runtime and this new Ray would take precedence over the Ray existing on the cluster. This could cause version mismatch issues. Skipping some details, the approach taken in that PR was essentially to parse the `pip` list and remove Ray.
However, not every line in a `pip` `requirements.txt` file is a requirements specifier; a line can also just specify options, like `--extra-index-url my-index-url.com`.
This caused the parsing library to raise an exception when trying to parse the line. This PR fixes this by catching the exception and skipping the line in this case, since it's not a line that specifies `ray` and that's all we're looking for when parsing.
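A minimal sketch of the catch-and-skip approach described above. The parser here is a hypothetical regex-based stand-in, not the parsing library used in the PR, and the function names are assumptions made for illustration.

```python
import re
from typing import List

# Hypothetical stand-in for a real requirement parser: a requirement
# specifier must start with a package name, so option lines such as
# "--extra-index-url ..." do not match and raise ValueError.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def parse_requirement_name(line: str) -> str:
    match = _NAME.match(line)
    if match is None:
        raise ValueError(f"not a requirement specifier: {line!r}")
    return match.group(1).lower()


def drop_ray(lines: List[str]) -> List[str]:
    """Remove lines that install `ray`; keep every other line untouched."""
    kept = []
    for line in lines:
        try:
            name = parse_requirement_name(line)
        except ValueError:
            # Not a requirement specifier (e.g. an option line), so it
            # cannot be specifying `ray`; keep it as-is.
            kept.append(line)
            continue
        if name != "ray":
            kept.append(line)
    return kept


requirements = [
    "--extra-index-url my-index-url.com",
    "ray[serve]==1.11.0",
    "requests>=2.0",
]
filtered = drop_ray(requirements)
print(filtered)
```

The option line survives filtering, so `pip` still sees it when the remaining requirements are installed.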
* lint using old linter from pre-1.11.0-branch-cut
Original PR #21925
This makes `serve_failure` pass its smoke test step. Without it, the test fails early and does not get to exercise the logic for 24 hr.
With the new job-based file copy, fetching results takes longer. We thus have to increase the long running update test check times in order not to run into bogus release test failures.
Also fixes artifact uploading issues.
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Fixes a typo that caused the script to exit early without running any sanity checks when not using an M1 Mac.
Co-authored-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
This PR moves the sdk to its own folder, then includes everything in `import ray.autoscaler.sdk` in ray's import path.
Note that doing this naively introduced circular dependencies, because the ray core now uses constants that were defined in the autoscaler for internal kv operations (and the autoscaler similarly calls into the ray core). The solution was to move those internal kv keys into ray core constants so the imports flow (more) one way.
Co-authored-by: Alex Wu <alex@anyscale.com>
This patch fixed two issues.
1. log_monitor.py can crash when GCS is temporarily unavailable. Added retry logic in gcs_pubsub.py.
2. The signal handler can raise another exception during exception handling.
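The retry logic mentioned in item 1 is not shown in this message. As a rough, generic sketch of the idea (not the actual code in gcs_pubsub.py), a caller can retry a transiently failing call with exponential backoff. The helper name, the `ConnectionError` exception type, and the delay values are all assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 5,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on ConnectionError with exponential backoff.

    Hypothetical helper: the final failure is re-raised so the caller
    can still decide how to handle a persistent outage.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay_s * 2 ** attempt)
    raise ValueError("attempts must be >= 1")


# Simulate a service that is unavailable for the first two calls.
calls = {"n": 0}


def flaky_poll() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("GCS temporarily unavailable")
    return "ok"


delays = []  # record the backoff delays instead of actually sleeping
result = call_with_retry(flaky_poll, sleep=delays.append)
print(result, calls["n"], delays)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting.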
This PR adds a `CometLoggerCallback` to the Tune Integrations, allowing users to log runs from Ray to [Comet](https://www.comet.ml/site/).
Co-authored-by: Michael Cullan <mjcullan@gmail.com>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Resubmitting #21705, which was merged and then reverted. Sphinx building seems to have broken in the meantime; it is not clear how that is connected to this PR.
Here is the original description:
>As part of the effort to enable tests on Windows, this enables test_metrics and test_metric_agents, which pass locally.