Instead of using a detached lifetime, tie the lifetime of `_DesignatedBlockOwner` to the lifetime of the context creator. Also, only create a `_DesignatedBlockOwner` if dynamic block splitting is enabled.
This PR adds pandas block format support by implementing `PandasRow`, `PandasBlockBuilder`, `PandasBlockAccessor`.
Note that `sort_and_partition`, `combine`, `merge_sorted_blocks`, `aggregate_combined_blocks` in `PandasBlockAccessor` redirects to arrow block format implementation for now. They'll be implemented in a later PR.
* basic reuse functionality without valid node filtering
* Filtering, logging, and formatting for cache_stopped_nodes on Azure
* Updated formatter version
We regularly run tasks where we know our expected resource requirements at launch, so call request_resources with the required number of cpus. The number of machines doesn't scale back down as our tasks are finishing, and just sit idle. This is costing more in aws hosting costs than necessary. Fix suggested is to not call request_resources and have a high upscaling_speed to instantly scale up to the required resources.
This PR is a minor adjustment to the K8s release tests.
Replace tasks with actors in scale test for reduced flakiness
Use an up-to-date Ray client API.
In some cases, we need to add custom fields in different code path. `SetCustomFields` will cover all the existing items, which leads to custom fields losing. This PR redefine `SetCustomFields` to `UpdateCustomFields `. `UpdateCustomFields ` could keep existing items and merge new items. If the key already exists, replace the value.
Passing tests: https://buildkite.com/ray-project/periodic-ci/builds/2560#_
Add an echo timestamp to the post build commands of the ray lightning release tests to trigger a cluster env rebuild and get the latest versions of ray lightning. Without this, the cluster env gets cached so an outdated version is installed on the cluster that is different than the one on the driver, resulting in the below failures.
Closes#21871Closes#21863
Also reinstalls the dependencies in the post build commands so old versions are not cached in the Docker images
When the script terminates, it will also terminate its cluster including dashboard, which will prevent subsequent job submissions. Other long running e2e tests do not terminate in smoke test mode, so make `serve_failure` behave the same.
Support hosting a serve instance under a path prefix.
Some clean-up should still be done for the different overlapping HttpOptions that now exist (host, port, root_path, root_url).
Preview: [docs](https://ray--21931.org.readthedocs.build/en/21931/data/dataset.html)
The Ray Data project's docs now have a clearer structure and have partly been rewritten/modified. In particular we have
- [x] A Getting Started Guide
- [x] An explicit User / How-To Guide
- [x] A dedicated Key Concepts page
- [x] A consistent naming convention in `Ray Data` whenever is is referred to the project.
This surfaces quite clearly that, apart from the "Getting Started" sections, we really only have one real example. Once we have more, we can create an "Example" section like many other sub-projects have. This will be addressed in https://github.com/ray-project/ray/issues/21838.
This is a simple refactoring change and my first PR in ray-project. This change moves an if statement outside of a loop. This way the check is not repeated for each iteration.
The WandbLoggingCallback is run on the driver side, with the experiment directory was the cwd. Using resume=True will pick up state from other trials (as the file name is global), and thus lead to warning messages. Thus, we should default to resume=False when using the callback.
This PR also incorporates changes from #20966.
Co-authored by: Queimo <queimo@gmx.net>
Co-authored by: Karim <karim.ben.hicham@rwth-aachen.de>
Try to clear the result dir before running the e2e.py script, to avoid failures where the directory already exists, or a file cannot be overwritten due to permission issue.
This PR fix the issue that sometimes FunctionsToRun is not executed. We isolated the Functions/Actors in function table, but not the RunctionsToRun. So when doing importing, sometimes, some functions will be missed.
This PR fixed this.
Currently, `ray stop` logic is vulnerable, and it kills Redis server that's not started by Ray. This PR fixes the issue by better checking the executable name of redis-server (If it is redis-server created by Ray, it should contain Ray specific path copied while wheels are built).
I originally tried to obtain ppid and kill a redis-server only when it is created from the same parent, but it turns out all processes started by ray start has no ppid.
While the best solution is to have some "process manager" that we can detect redis server started by us, I think there's no need to put lots of efforts here right now since Redis will be removed soon. We will eventually move to a better direction (process manager) to handle this sort of issues.