7220 commits

Author SHA1 Message Date
Stephanie Wang
293c122302
[dataset] Use polars for sorting (#25454) 2022-06-17 12:26:46 -07:00
Clark Zinzow
c2ab73fc40
[Datasets] Add ray_remote_args to read_text. (#23764) 2022-06-17 12:24:11 -07:00
Archit Kulkarni
85be093a84
[runtime env] Make all plugins return a List of URIs (#25825)
Followup from #24622.  This is another step towards pluggability for runtime_env.  Previously some plugin classes had `get_uri` which returned a single URI, while others had `get_uris` which returned a list.  This PR makes all plugins use `get_uris`, which simplifies the code overall.

Most of the lines in the diff just come from the new `format.sh` which sorts the imports.
2022-06-17 14:13:44 -05:00
Stephanie Wang
09857907b7
[data] Fix bug in computing merge partitions in push-based shuffle (#25865)
Fixes a bug in push-based shuffle in computing the merge task <> reduce tasks mapping when the number of reduce tasks does not divide evenly by the number of merge tasks. Previously, if there were N reduce tasks for one merge task, we would do:
[N + 1, N + 1, ..., N + 1, all left over tasks]
which could lead to a negative number of reduce tasks in the last merge partition.

This PR changes it to:
[N + 1, N + 1, ..., N + 1, N, N, N, ...]
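
The difference between the two layouts can be sketched as follows. This is an illustration reconstructed from the description above, not the code from the PR; the function names `old_partition` and `new_partition` are hypothetical, and `N` is assumed to be the floor of reduce tasks divided by merge tasks.

```python
from typing import List


def old_partition(num_reduce: int, num_merge: int) -> List[int]:
    """Assumed reconstruction of the layout described as buggy.

    Only meaningful when num_reduce does not divide evenly by num_merge.
    """
    n = num_reduce // num_merge
    sizes = [n + 1] * (num_merge - 1)
    # "All left over tasks" go to the last merge partition; this can go negative.
    sizes.append(num_reduce - sum(sizes))
    return sizes


def new_partition(num_reduce: int, num_merge: int) -> List[int]:
    """Layout described as the fix: the first `extra` partitions get N + 1, the rest N."""
    n, extra = divmod(num_reduce, num_merge)
    return [n + 1] * extra + [n] * (num_merge - extra)


print(old_partition(5, 4))  # [2, 2, 2, -1]
print(new_partition(5, 4))  # [2, 1, 1, 1]
```

With 5 reduce tasks and 4 merge tasks, the first layout leaves -1 tasks in the last partition, while the second always sums to the number of reduce tasks with no negative entries.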
Related issue number

Closes #25863.
2022-06-17 10:19:00 -07:00
Alex Wu
187c21ce20
[gcs] Preserve job driver info for dashboard (#25880)
This PR ensures that GCS keeps the IP and PID information about a job so that it can be used to find the job's logs in the dashboard after the job terminates.

@alanwguo will handle any dashboard work in a separate PR.

Co-authored-by: Alex Wu <alex@anyscale.com>
2022-06-17 09:03:20 -07:00
Kai Fricke
40a9fdcb0f
[tune/air] Fix checkpoint conversion for objects (#25885)
Converting Tracked memory checkpoints was faulty and untested.
2022-06-17 10:41:52 +01:00
Siyuan (Ryans) Zhuang
fea8dd08fc
[workflow] Enhance dataset tests (#25876) 2022-06-16 22:50:31 -07:00
yuduber
26b2faf869
[data] add retry logic to ray.data parquet file reading (#25673) 2022-06-16 21:49:41 -07:00
Jiao
f6735f90c7
[Ray DAG] Move dag project folder out of experimental (#25532) 2022-06-16 19:15:39 -07:00
Clark Zinzow
e111b173e9
[Datasets] Workaround for unserializable Arrow JSON ReadOptions. (#25821)
pyarrow.json.ReadOptions are not picklable until Arrow 8.0.0, which we do not yet support. This PR adds a custom serializer for this type and ensures that said serializer is registered before each Ray task submission.
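
As a rough illustration of the idea of registering a custom serializer for a type that cannot be pickled by default, the sketch below uses the standard library's `copyreg` module and a hypothetical stand-in class. It is not the code from the PR and does not use Ray's own serialization hooks; `ReadOptionsStandIn` and its fields are assumptions made for the example.

```python
import copyreg
import pickle
import threading


class ReadOptionsStandIn:
    """Hypothetical stand-in for an options type that is not picklable."""

    def __init__(self, use_threads: bool = True, block_size: int = 1 << 20):
        self.use_threads = use_threads
        self.block_size = block_size
        self._lock = threading.Lock()  # lock objects cannot be pickled


def _reduce_read_options(opts: ReadOptionsStandIn):
    # Serialize only the plain constructor arguments and rebuild on load.
    return ReadOptionsStandIn, (opts.use_threads, opts.block_size)


def register_read_options_serializer() -> None:
    # Registration is idempotent, so it is safe to call before every submission.
    copyreg.pickle(ReadOptionsStandIn, _reduce_read_options)


register_read_options_serializer()
original = ReadOptionsStandIn(use_threads=False, block_size=4096)
restored = pickle.loads(pickle.dumps(original))
```

Because the registration is cheap and idempotent, calling it ahead of each submission mirrors the "registered before each Ray task submission" wording without tracking whether it already ran.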
2022-06-16 18:33:59 -07:00
Stephanie Wang
977fff16a6
Set object spill config from env var (#25794)
Previously it was not possible to set the object spill config from the RAY_xxx environment variable, unlike other system configs. This is because the config is initialized in Python first, before the C++ config is parsed.
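
One way to picture the ordering problem is a lookup in which the Python-side default is used only when no `RAY_`-prefixed environment variable is set. The sketch below is an assumption-laden illustration, not Ray's implementation; `resolve_setting` and the setting name `example_setting` are hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    name: str,
    python_default: str,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the RAY_<name> environment value if present, else the Python default."""
    env = os.environ if environ is None else environ
    # Hypothetical rule: the environment variable wins over the Python-side default.
    return env.get(f"RAY_{name}", python_default)


# "example_setting" is a made-up name used only for illustration.
print(resolve_setting("example_setting", "python-default", {}))
print(resolve_setting("example_setting", "python-default", {"RAY_example_setting": "from-env"}))
```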
2022-06-16 16:08:19 -07:00
Antoni Baum
2120d3ea09
[AIR] Change the name of GBDTTrainable dynamically (#25804) 2022-06-16 13:54:09 -07:00
Simon Mo
d83773b2f1
[Serve][AIR] Support mixed input and output type, with batching (#25688) 2022-06-16 13:00:29 -07:00
Clark Zinzow
04280d6e4e
[Datasets] Preserve cached block metadata on LazyBlockList splits. (#25745)
Preserves cached block metadata on LazyBlockList splits. Before this PR, after these splits, all block metadata would have to be re-fetched.
2022-06-16 12:36:25 -07:00
Clark Zinzow
d98adbc448
[Datasets] Fix tensor extension string formatting (repr). (#25768)
Fixes tensor extension string formatting, e.g. when invoking the DataFrame repr.
2022-06-16 12:35:11 -07:00
Kai Fricke
9b052d220e
[tune] Fix checkpoint deletion for custom syncers (#25859)
Deleting checkpoints with custom syncers was faulty and untested before this PR.
2022-06-16 19:53:43 +01:00
Kai Fricke
c4590f3ab5
[air] Convert _TrackedCheckpoint to ray.air.Checkpoint (#25849)
We often need to convert our internal `_TrackedCheckpoint` objects to `ray.air.Checkpoint`s - we should do this in a utility method in the `_TrackedCheckpoint` class.
This PR also fixes cases where we haven't yet resolved the saved checkpoint futures.
2022-06-16 19:31:18 +01:00
Jimmy Yao
b2e9aea908
[Ray dataset] detect dataframe dtype as object (#25811)
* fix ci

* not break master
2022-06-16 11:23:03 -07:00
Matti Picus
e275e8b0e7
WINDOWS: replace ':' with '$' for filename (#25767)
On Windows, creating a file with a ':' in the name will fail. However, '$' is fine.
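
A minimal sketch of the substitution, assuming it is applied to the file name only and not to a full path (a drive prefix such as `C:` would otherwise be affected). The helper name `windows_safe_filename` is hypothetical and not taken from the PR.

```python
def windows_safe_filename(name: str) -> str:
    """Replace ':' with '$' so the name can be created on Windows."""
    return name.replace(":", "$")


print(windows_safe_filename("session_2022-06-16_09:38:45.log"))
# session_2022-06-16_09$38$45.log
```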
2022-06-16 09:38:45 -07:00
Edward Oakes
e4352305dd
Revert "[serve] Use soft constraint for pinning controller on head node (#25091)" (#25857)
This reverts commit 0f600362dd.
2022-06-16 11:16:20 -05:00
Yi Cheng
4c5c5763ef
[ci][core] Add option for parallel ci for ray core tests (#25801)
This is the first step to enable parallel tests for Ray core CI. To reduce noise, this PR only adds the option and does not enable it. Parallel CI can be 40%-60% faster than running the tests one by one.

We'll enable them by bk jobs one-by-one.

Prototype here #25612
2022-06-15 22:46:50 -07:00
SangBin Cho
5fb61abba3
[Usage Stats][Hotfix] Import usage reported from workers. (#25785)
## Why are these changes needed?

We currently only record usage stats from drivers. This can lose some information when libraries are imported from workers (e.g., doing some rllib import from a trainable).

@jjyao just for the future reference.
2022-06-15 18:20:12 -07:00
Antoni Baum
91dd360f9d
[AIR/train] Move predictors to ray.train (#25769) 2022-06-15 17:02:15 -07:00
Yi Cheng
5d77d2b160
[core][gcs] Fix the issue when gcs restarts, actor is destroyed due to bundle index equals -1 (#25789)
When GCS restarts, it'll recover the placement group and make sure no resource is leaking. The protocol now is like:

- Sending the committed PGs to raylets
- Raylets will check whether any worker is using resources from the PG not in this group
- If there is any, it'll kill that worker.

Right now there is a bug that kills any worker whose bundle index equals -1.
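
The check in the protocol above can be sketched as follows. This is an illustration under stated assumptions, not the raylet's actual code: the `Worker` type and `workers_to_kill` function are hypothetical, and a bundle index of -1 is assumed to mean "any bundle in the placement group", so such a worker should be matched by group rather than by an exact bundle.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Worker(NamedTuple):
    worker_id: str
    pg_id: str
    bundle_index: int  # assumed: -1 means "any bundle in the placement group"


def workers_to_kill(
    workers: Iterable[Worker],
    committed_bundles: Set[Tuple[str, int]],
) -> List[str]:
    """Return IDs of workers using resources outside the committed bundles."""
    committed_pgs = {pg_id for pg_id, _ in committed_bundles}
    doomed = []
    for w in workers:
        if w.bundle_index == -1:
            # Match on the placement group only; an exact (pg, -1) lookup
            # would never succeed and the worker would be killed by mistake.
            in_use = w.pg_id in committed_pgs
        else:
            in_use = (w.pg_id, w.bundle_index) in committed_bundles
        if not in_use:
            doomed.append(w.worker_id)
    return doomed


workers = [Worker("a", "pg1", 0), Worker("b", "pg1", -1), Worker("c", "pg2", 0)]
print(workers_to_kill(workers, {("pg1", 0), ("pg1", 1)}))  # ['c']
```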
2022-06-15 16:57:22 -07:00
Yi Cheng
bcb8ae9fbd
[core] Enable test_get_locations.py (#25814)
This test was skipped accidentally. This PR enables it.
2022-06-15 16:47:09 -07:00
Edward Oakes
0f600362dd
[serve] Use soft constraint for pinning controller on head node (#25091)
Un-reverting https://github.com/ray-project/ray/pull/24934 which caused `test_cluster` to become flaky. This was due to an oversight: we need to update the `HTTPState` logic to account for the controller not necessarily running on the head node.

This will require using the new `SchedulingPolicy` API, but I'm not quite sure of the best way to do it. Context here: https://github.com/ray-project/ray/issues/25090.
2022-06-15 17:52:20 -05:00
Clark Zinzow
b51b777aae
[CI] [Datasets] [RayDP] Skip failing RayDP integration tests. (#25818)
Currently causing the master Datasets CI job to fail due to a hard dependency on MLDataset, which has been deleted in Ray master. See #25816.
2022-06-15 15:20:52 -07:00
clarng
1a5f42742d
import sort rest of autoscaler (#25796)
Continue to import sort the rest of autoscaler.
2022-06-15 15:00:21 -07:00
Archit Kulkarni
23030dbcaa
[runtime env] Hide URI cache behind class (#24622)
Followup PR to https://github.com/ray-project/ray/pull/20273.

- Hides cache logic behind a class.
- Adds "name" field to runtime env plugin class and makes existing conda, pip, working_dir, and py_modules inherit from the plugin class. 

Future work will unify the codepath for these "base plugins" with the codepath for third-party plugins; currently these are different, and URI support is missing for third-party plugins.
2022-06-15 16:14:06 -05:00
Antoni Baum
090024c297
[AIR] Fix FailureConfig not being a dataclass (#25807) 2022-06-15 13:46:51 -07:00
Robert
b4d85a2c8a
[RuntimeEnv] Fixes spaces in paths causing failures on Windows (#25659)
This is a follow-up to the previous PR (GitHub did some funky things when I did a rebase, so I had to create a new one)

On Windows systems, the `exec_worker` method may fail due to spaces being present in arguments that are file paths. This addresses said issue.
2022-06-15 15:22:17 -05:00
Clark Zinzow
526e12074a
[Datasets] Make it clear that read_parquet() does not support multiple directories. (#25747)
Unfortunately, ray.data.read_parquet() doesn't work with multiple directories since it uses Arrow's Dataset abstraction under-the-hood, which doesn't accept multiple directories as a source: https://arrow.apache.org/docs/python/generated/pyarrow.dataset.dataset.html

This PR makes this clear in the docs, and as a driveby, adds ray.data.read_parquet_bulk() to the API docs.
2022-06-15 13:19:39 -07:00
Ian Rodney
7800172041
[AWS] Cleanup Naming/Typing of Boto3 resources/clients (#25731)
It's a bit hard to follow if these are clients or resources so add typing + rename a mis-named function.
2022-06-15 11:57:20 -07:00
Chen Shen
8982e4d78c
Revert "[Ray Dataset] fix the type infer of pd.dataframe (when dtype is object)" (#25809)
This reverts commit f61f60f708.
2022-06-15 11:20:14 -07:00
Stephanie Wang
68be44ade1
[datasets] Avoid unnecessary metadata serialization in Datasets shuffle (#25734)
Push-based shuffle has some extra metadata involving merge and reduce tasks. Previously we were serializing an O(n) (n = reduce tasks) metadata and sending this to tasks, which caused a lot of unnecessary plasma usage on the head node. This PR splits up the metadata into parts that can be kept on the driver and a relatively cheap part that is sent to all tasks.
Related issue number

One of the issues needed for #24480.
2022-06-15 10:33:52 -07:00
Jimmy Yao
f61f60f708
[Ray Dataset] fix the type infer of pd.dataframe (when dtype is object) 2022-06-15 08:11:49 -07:00
xwjiang2010
88d824d067
[air] remove fully_executed from Tune. (#25750) 2022-06-14 22:32:48 -07:00
Chen Shen
4ecfa9374d
Revert "[Ray Dataset] fix the type infer of pd.dataframe (when dtype is object.) (#25563)" (#25790)
This reverts commit 57d02eec2e.
2022-06-14 20:46:40 -07:00
Antoni Baum
11c556f887
[Train] Remove bad arg from SklearnTrainer doc (#25773)
Removes docstring for an argument that is not present. Looks like it was introduced by mistake.
2022-06-14 19:29:49 -07:00
shrekris-anyscale
a371756b3c
[Serve] Update Serve CLI and REST API behavior to use new config (#25691) 2022-06-14 19:01:51 -07:00
clarng
badf444eda
Respect import order for psutil and setproctitle (#25780)
Sort imports in a way that preserves the ordering requirements. This PR is needed for any file changes that imports psutil or setproctitle.
2022-06-14 17:44:41 -07:00
Antoni Baum
067a244c84
[AIR] Arrow support for preprocessors (#25623)
Adds a _transform_arrow method to Preprocessors that allows them to implement logic for arrow-based Datasets.

- If only _transform_arrow is implemented, will convert the data to arrow.
- If only _transform_pandas is implemented, will convert the data to pandas.
- If both are implemented, will pick the method corresponding to the format for best performance.
Implementation is defined as overriding the method in a sub-class.
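
The selection rule in the list above can be sketched like this. It is a simplified illustration, not the actual Preprocessor class: `SketchPreprocessor`, `choose_format`, and the string format labels are hypothetical, and "implemented" is detected by checking whether the subclass overrides the base method.

```python
class SketchPreprocessor:
    """Hypothetical base class; subclasses override one or both methods."""

    def _transform_pandas(self, data):
        raise NotImplementedError

    def _transform_arrow(self, data):
        raise NotImplementedError

    def _implements(self, name: str) -> bool:
        # A method counts as implemented if the subclass overrides it.
        return getattr(type(self), name) is not getattr(SketchPreprocessor, name)

    def choose_format(self, data_format: str) -> str:
        """Pick "pandas" or "arrow" given the dataset's current format."""
        has_pandas = self._implements("_transform_pandas")
        has_arrow = self._implements("_transform_arrow")
        if has_pandas and has_arrow:
            return data_format  # no conversion needed
        if has_arrow:
            return "arrow"
        if has_pandas:
            return "pandas"
        raise NotImplementedError("No transform method is implemented")


class PandasOnly(SketchPreprocessor):
    def _transform_pandas(self, data):
        return data


class Both(PandasOnly):
    def _transform_arrow(self, data):
        return data


print(PandasOnly().choose_format("arrow"))  # pandas
print(Both().choose_format("arrow"))        # arrow
```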

This is only a change to the base Preprocessor class. Implementations for sub-classes will come in the future.
2022-06-14 16:48:31 -07:00
Jimmy Yao
5f6f2d9f29
[AIR] Tf end2end CV example (#25070) 2022-06-14 16:24:38 -07:00
Sihan Wang
d4aa7691e9
[Serve] Add compact for InMemoryMetricsStore max function (#25770) 2022-06-14 13:09:30 -07:00
Jimmy Yao
57d02eec2e
[Ray Dataset] fix the type infer of pd.dataframe (when dtype is object.) (#25563)
This is a temporary fix for #25556. When the dtype from the pandas dataframe is object, we set the dtype to None and rely on automatic type inference during the conversion.
2022-06-14 12:49:04 -07:00
Archit Kulkarni
0d8cbb1cae
[runtime env] Skip content hash for unopenable files (#25413) 2022-06-14 12:07:51 -07:00
Matti Picus
e5c5275bed
[Runtime Env] enable conda runtime creation in workers on windows (#23613) 2022-06-14 10:24:02 -07:00
Sihan Wang
c92628138b
[Serve] Disable background thread of handle without autoscaling (#25733) 2022-06-14 10:04:27 -07:00
Mark
1feb702327
Create RAY_TMPDIR if it doesn't exist (#25577)
This will prevent FileNotFoundErrors on fresh `ray up` local node provider installs.

Co-authored-by: Mark Flanagan <>
2022-06-14 09:11:31 -07:00
Kai Fricke
d5541cccb1
[air] Use predict_pandas in xgboost, lightgbm, rl, huggingface, sklearn (#25759)
Switching to the _predict_pandas API implementation for xgboost, lightgbm, rl, huggingface, and sklearn predictors.
2022-06-14 14:47:37 +02:00