This PR adds a `CometLoggerCallback` to the Tune Integrations, allowing users to log runs from Ray to [Comet](https://www.comet.ml/site/).
Co-authored-by: Michael Cullan <mjcullan@gmail.com>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Currently, the docs have an [end-to-end tutorial](https://web.archive.org/web/20211122152843/https://docs.ray.io/en/latest/serve/tutorial.html) walking users through deploying a `Counter` function on Serve. This PR adds an end-to-end tutorial walking users through deploying an entire Hugging Face model using Serve, providing a better understanding of how to deploy an actual model via Serve.
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
This PR consolidates both #21667 and #21759 (look there for features), but improves on them in the following way:
- [x] we reverted renaming of existing projects `tune`, `rllib`, `train`, `cluster`, `serve`, `raysgd` and `data` so that links won't break. I think my consolidation efforts with the `ray-` prefix were a little overeager in that regard. It's better like this. Only the creation of `ray-core` was a necessity, and some files moved into the `rllib` folder, so that should be relatively benign.
- [x] Additionally, we added Algolia `docsearch`, screenshot below. This is _much_ better than our current search. Caveat: there's a sphinx dependency that needs to be replaced (`sphinx-tabs`) by another, newer one (`sphinx-panels`), as the former prevents loading of the `algolia.js` library. Will follow-up in the next PR (hoping this one doesn't get re-re-re-re-reverted).
This is a minimum viable product for Ray Autoscaler integration with Kuberay. It is not ready for prime time/general use, but should be enough for interested parties to get started (see the documentation in kuberay.md).
I updated this version compatibility table on the release branch but didn't update it on master. This is my mistake, the process is to make a PR to master and then cherry pick that commit to the release branch.
Currently we install OpenSSH on the fly in fake multinode docker testing. Instead we can speed testing up a fair bit by building a Docker image which includes OpenSSH first and then run tests with this image.
Following #18987 this PR adds a docker-compose based local multi node cluster.
The fake multinode docker comprises two parts. The docker_monitor.py script is a watch script calling docker compose up whenever the docker-compose.yaml changes. The node provider creates and updates the docker compose according to the autoscaling requirements.
This mode fully supports autoscaling and comes with test utilities to start and connect to docker-compose autoscaling environments. There's also a sample test case showing how this can be used.
This PR finishes most of the stats todos for dataset. The main thing punted for future work is instrumentation of split(), which is particularly tricky since only certain blocks are transformed.
Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>
Update ray lightning api docs to reflect new changes in ray lightning master.
Making this quick change to fix CI and unblock the release, but will follow up on a proper fix for this.
Closes#21426
This PR adds documentation for Workflow Metadata, which we recently added support in https://github.com/ray-project/ray/pull/19372.
Co-authored-by: Yi Cheng <74173148+iycheng@users.noreply.github.com>
Dask default's to a disk-based shuffle even thought we're using a distributed scheduler, which appears to be resulting in dropped data since the filesystem isn't shared across nodes. Dask Distributed manually sets the shuffle algorithm in the global config to the task-based shuffle, which the Dask-on-Ray scheduler should probably do as well.
This PR adds a Dask config helper, `enable_dask_on_ray`, that sets Dask-on-Ray as the default scheduler along with changing the default shuffle to a task-based shuffle. The shuffle method can still be overridden by the user by manually specifying `df.set_index(shuffle="disk")`.
* updating azure autoscaler versions and backwards compatibility, and moving to azure-identity based authentication
* adding azure sdk rqmts for tests
* updating azure test requirements and adding wrapper function for azure sdk function resolution
* adding docstring to get_azure_sdk_function
Co-authored-by: Scott Graham <scgraham@microsoft.com>
Added hyperameters to the concetp section since it's important to explain what they are and added diagrams help readeer visualize the difference between model and hyperparameters
Signed-off-by: Jules S.Damji <jules@anyscale.com>
Co-authored-by: Jules S.Damji <jules@anyscale.com>
Signed-off-by: Jules S.Damji jules@anyscale.com
Why are these changes needed?
The code snippet referenced a python function that was not defined, therefore the code snippet as is won't work. All complete or self-contained code in our docs should run.
The changes made were adding the undefined function, iterating over a list of different random large arrays to show the difference between local or distributed sort's execution time, and print them.
Closes#20960
This adds an initial Dataset.stats() framework for debugging dataset performance. At a high level, execution stats for tasks (e.g., CPU time) are attached to block metadata objects. Datasets have stats objects that hold references to these stats and parent dataset stats (this avoids stats holding references to parent datasets, allowing them to be gc'ed). Similarly, DatasetPipelines hold stats from recently computed datasets.
Currently only basic ops like map / map_batches are instrumented. TODO placeholders are left for future PRs.
Right now in ray, a lot of edge cases related to grpc are not tested. This PR is just a simple try to give the developer some way to delay grpc request. It could be used with manual testing and also e2e test since it's supporting delay for specific grpc method.
To use this feature, just simple set os env `RAY_TESTING_ASIO_DELAY_US="method1=10:20,method2=20:30,*=200:200"`
This means, for `method1` it'll delay 10-20us, for method2 it'll delay 20-30us. For all the rest, it'll delay 200us.