hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-09 04:46:38 -04:00

Author	SHA1	Message	Date
Dmitri Gekhtman	836b08597f	[kuberay][autoscaler] Use new autoscaling fields from the KubeRay operator (#25386 ) This PR incorporates recent autoscaler config changes from KubeRay.	2022-06-08 20:09:43 -07:00
matthewdeng	ba0a2a022a	[datasets] add `Dataset.randomize_block_order` (#25568) This exposes a low-cost way to perform a pseudo global shuffle. For extremely large datasets that span multiple nodes, contiguous blocks will often be colocated on the same node. This leads to hot spots during iteration of the dataset, in which single nodes (1) must send a lot of data over the network, and (2) perform lots of disk reads if the dataset is spilled to disk. Randomizing the block order allows the workload to be spread across the nodes on which the dataset blocks reside.	2022-06-08 18:39:15 -07:00
Clark Zinzow	6987ab5966	[Datasets] [Hotfix] Fix stats construction for from_* APIs. (#25601 ) Stats construction on the from_arrow and from_numpy (and from_pandas with Pandas block support disabled) is currently broken since we weren't resolving the block metadata before passing it to the stats, causing future ds.stats() calls to fail. This PR fixes this and adds some test coverage. Drivebys: - Adds stats for from_pandas() zero-copy path (metadata fetch only). - Changes "from_numpy" stats stage name to "from_numpy_refs", to be consistent with stats for other from_*() APIs.	2022-06-08 18:04:40 -07:00
shrekris-anyscale	f3c2bd6718	[Serve] Make REST API deployments inherit top-level runtime_env (#25502 )	2022-06-08 15:58:00 -07:00
Antoni Baum	16733c2271	[AIR] Delayed type checking for Preprocessors (#25587 ) Breaks the hard dependency on Preprocessor imports for type hints in AIR. Preparation for move of Preprocessors to `ray.data`. Trainer still has a hard dependency due to an `isinstance` check.	2022-06-08 13:15:54 -07:00
Hanming Lu	d3e5bf97b5	more informative GCPNodeProvider create_node return (#25416 ) More informative return value for GCPNodeProvider create_node	2022-06-08 12:34:09 -07:00
Amog Kamsetty	3a728c4e35	[Train] Mark Trainer interfaces as Deprecated (#25573 ) Marks Trainer interfaces as Deprecated. This PR does not make any changes to the docs. Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2022-06-08 12:30:32 -07:00
Stephanie Wang	6274bb354c	[tests] Deflake test_reconstruction.py::test_basic_reconstruction_actor_task[False] (#25456 ) This test was flaky because actor tasks can fail if submitted when the actor process is failed or restarting. This PR changes the test to be more stressful so that the error is easier to reproduce and changes the max_retries parameter to -1 so that the actor task will succeed. Related issue number Closes #24942.	2022-06-08 11:21:57 -07:00
Sihan Wang	a9e7836e8c	[Serve] Skip flaky test_autoscaling_policy on windows (#25526 )	2022-06-08 10:33:40 -07:00
Clark Zinzow	9dc0bb3d5e	[Datasets] Unrevert "[Datasets] [Tensor Story - 1/2] Automatically provide tensor views to UDFs and infer tensor blocks for pure-tensor datasets. (#25031 )" (#25531 ) Unreverts #24812, skipping the memory releasing tests that are already flaky. We have a separate issue tracking the unskipping of these memory releasing tests, once we find a more reliable way to test them. * Revert "Revert "Revert "Revert "[Datasets] [Tensor Story - 1/2] Automatically provide tensor views to UDFs and infer tensor blocks for pure-tensor datasets."" (#25031)" (#25057)" This reverts commit `fb2933a78f`. * Skip shuffle memory release test.	2022-06-08 10:33:25 -07:00
Amog Kamsetty	1be32e5977	[AIR] Add `_predict_arrow` interface for Predictor (#25579 ) * add interface * update docstring	2022-06-08 10:27:29 -07:00
Pamphile Roy	0bbc3379bd	Fix SciPy pinning (#25148 ) Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>	2022-06-08 10:26:59 -07:00
Amog Kamsetty	80ae651f25	[Train] Clean up `ray.train` package (#25566 )	2022-06-08 10:22:36 -07:00
Kai Fricke	8affbc7be6	[tune/train] Consolidate checkpoint manager 3: Ray Tune (#24430) Update: This PR is now part 3 of a three-PR group to consolidate the checkpoints. 1. Part 1 adds the common checkpoint management class #24771 2. Part 2 adds the integration for Ray Train #24772 3. This PR builds on #24772 and includes all changes. It moves the Ray Tune integration to use the new common checkpoint manager class. Old PR description: This PR consolidates the Ray Train and Tune checkpoint managers. These concepts previously did something very similar but in different modules. To simplify maintenance in the future, we've consolidated the common core. - This PR keeps full compatibility with the previous interfaces and implementations. This means that for now, Train and Tune will have separate CheckpointManagers that both extend the common core - This PR prepares Tune to move to a CheckpointStrategy object - In follow-up PRs, we can further unify interfacing with the common core, possibly removing any train- or tune-specific adjustments (e.g. moving to setup on init rather than at runtime for Ray Train) Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2022-06-08 12:05:34 +01:00
Amog Kamsetty	e0a63f770f	[Data/AIR] Move `TensorExtension` to `ray.air` for use in other packages (#25517 ) Moves Tensor extensions to ray.air to facilitate their use in other Ray libraries (AIR, Serve).	2022-06-07 14:53:22 -07:00
xwjiang2010	76b34d4a03	[air] add to_air_checkpoint method for inference only workload. (#25444) Follow-up to our last discussion on supporting users who adopt AIR in a piecemeal fashion. Only done for TensorFlow for now; I want to collect some feedback on API naming, package structure, etc., and will then add the others.	2022-06-07 14:50:39 -07:00
Sebastián Ramírez	3257994e80	♻️ Refactor types to detect invalid extra arguments (#25541) Currently, each function decorated with `@ray.remote` is marked with type annotations as a `RemoteFunction` class (only used for type annotations, autocompletion, inline errors, etc). The current class takes several type parameters and then uses those parameters in the extended `func.remote()` method. But the current type annotations mark any unused type parameters as `None`. This means that calling the `.remote()` method checks the first (actual) arguments while the rest are marked as `None`, so the type checker considers it "correct" to pass extra `None` arguments even though that would not be valid. So, this doesn't show an error, but it should: [screenshot] ...those 2 extra `None` values should be marked as invalid. After this PR, those invalid extra arguments would be marked as invalid: [screenshot] And: [screenshot] ## More context I also tried the new `TypeVarTuple`; it might simplify these type annotations in the future, but it's not supported by mypy yet. It's a very recent addition to the language (and `typing_extensions`), so it's probably too early to adopt it.	2022-06-07 14:34:34 -07:00
Antoni Baum	3876fcdbe8	[CI] Add bazel py_test checking for Serve (#25509 )	2022-06-07 10:54:10 -07:00
Jun Gong	9b65d5535d	[RLlib] Introduce basic connectors library. (#25311 )	2022-06-07 19:18:14 +02:00
Amog Kamsetty	4e887fe776	[Tune] Remove docstring for private _StatusReporter (#25520 ) Remove outdated docstrings for _StatusReporter. In response to https://discuss.ray.io/t/how-to-use-ray-tune-function-runner-statusreporter-with-tune-with-parameters/6400/2	2022-06-07 10:11:29 -07:00
Simon Mo	7471b1fa41	[Serve] [AIR] ModelWrapper improvements and docs (#25003 ) * batching collation code and tests * wip notebook for np and dataframe * finish content * reset ray-more-libs changes * add comments * run through * Apply suggestions from code review Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com> * rename package * lint * richard's comment Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>	2022-06-07 08:53:10 -07:00
Kai Fricke	984b9a5e6c	[tune/train] Consolidate checkpoint manager 2: Ray Train (#24772 ) This is a follow-up from #24771 which moves the Ray Train implementation to use the new common checkpoint manager class.	2022-06-07 13:51:42 +01:00
Rohan Potdar	a9d8da0100	[RLlib]: Doubly Robust Off-Policy Evaluation. (#25056 )	2022-06-07 12:52:19 +02:00
Eric Liang	c1afbcb6f4	[air] Enforce API stability annotations for AIR module (#25485 )	2022-06-06 22:52:21 -07:00
Eric Liang	78688a0903	Enable streaming ingest in AIR (#25428 ) This adds the following options to DatasetConfig, which can be used to enable streaming ingest. ``` # Whether the dataset should be streamed into memory using pipelined reads. # When enabled, get_dataset_shard() returns DatasetPipeline instead of Dataset. # The amount of memory to use is controlled by `stream_window_size`. # False by default for all datasets. use_stream_api: Optional[bool] = None # Configure the streaming window size in bytes. A typical value is something like # 20% of object store memory. If set to -1, then an infinite window size will be # used (similar to bulk ingest). This only has an effect if use_stream_api is set. # Set to 1.0 GiB by default. stream_window_size: Optional[float] = None # Whether to enable global shuffle (per pipeline window in streaming mode). Note # that this is an expensive all-to-all operation, and most likely you want to use # local shuffle instead. # False by default for all datasets. global_shuffle: Optional[bool] = None ```	2022-06-06 17:42:15 -07:00
Yi Cheng	aabe9e73ef	Revert "[Serve] Depend on uvicorn[standard] instead of uvicorn so that it pulls in uvloop (#25027 )" (#25530 ) This reverts commit `9a510f92cf`.	2022-06-06 16:41:42 -07:00
Amog Kamsetty	365fc44754	[AIR] Update to new Predictor interface (#25425 ) Updates the Predictor interface to have Pandas as a narrow waist.	2022-06-06 15:41:38 -07:00
Philipp Moritz	406c2c5778	[docs] Fix mock objects in Ray Core docs (#25498 ) Our API references are currently showing mock objects for some of our APIs -- this PR fixes them for the Ray Core API reference.	2022-06-06 15:09:01 -07:00
simonsays1980	2a5d322e70	[tune] Relative logdir paths in trials for ExperimentAnalysis in remote buckets (#25063) When running an experiment, for example in the cloud, and syncing to a bucket, the logdir path in the trials will have changed when working with the checkpoints in the bucket. There are some workarounds, but the easier solution is to also add a rel_logdir containing the relative path to the trials/checkpoints, which can handle any changes in the location of experiment results. As discussed with @Yard1 and @krfricke Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> Co-authored-by: Kai Fricke <kai@anyscale.com>	2022-06-06 22:41:41 +01:00
Florian Boucault	9a510f92cf	[Serve] Depend on uvicorn[standard] instead of uvicorn so that it pulls in uvloop (#25027 )	2022-06-06 14:23:00 -07:00
Philipp Moritz	8aff562c2f	[docs] Cleanup ray init docs (#25492 )	2022-06-06 13:16:32 -07:00
Sihan Wang	0441834021	[Serve] Fix test_standalone flacky (#25513 )	2022-06-06 13:13:32 -07:00
shrekris-anyscale	e433424796	[Serve] Checkpoint the `DeploymentState`'s `_deleting` attribute (#25478 )	2022-06-06 12:06:51 -07:00
Eric Liang	94dec83a60	[data] Rename data.impl to data._internal (#25486 )	2022-06-06 11:39:53 -07:00
shrekris-anyscale	ce3faed897	[Serve] Avoid deserializing `ReplicaConfig` properties in the Serve controller (#25213 )	2022-06-06 11:08:06 -07:00
mwtian	1ce0ab7b7c	[Core] Export additional metrics for workers and Raylet memory (#25418 ) Add visibility into the following to help Ray users and developers debug performance and OOM issues: Raylet memory usage broken down by USS vs remaining RSS. Total workers' count, CPU percentage usage, and memory usage.	2022-06-06 10:58:14 -07:00
Balaji Veeramani	c4898ed7df	[AIR] [Datasets] Add `convert_pandas_to_tf_tensor` (#25133 ) Dataset.to_tf and TensorflowPredictor attempt to convert Pandas dataframes to NumPy arrays by calling DataFrame.values. However, DataFrame.values fails if the dataframe contains multidimensional arrays. This PR solves this problem by introducing a function convert_pandas_to_tf_tensor. The implementation of the function is based on the implementation of convert_pandas_to_torch_tensor.	2022-06-06 08:29:51 -07:00
Sebastián Ramírez	298742d724	♻️ Refactor type annotations for `.remote()` to avoid incorrect autocompletion and checks (#25480) With the current type annotations for the `.remote()` method generated in decorated functions, editors understand that there are some keyword arguments `arg0`, `arg1`, etc., which are incorrect, as the actual function will probably have different names for its arguments. For example, this shouldn't autocomplete `arg0`, `arg1`, etc: [screenshot] If anything, it should autocomplete `x` and `y` (although that's currently [not perfectly doable](https://github.com/python/typing/discussions/1163)). Updating the type annotations to use [arguments prefixed with double underscores](https://mypy.readthedocs.io/en/stable/protocols.html?highlight=double%20underscore#callback-protocols) at least tells tooling not to provide autocompletion for those args (which would be incorrect), while still providing inline errors for invalid types. [screenshot]	2022-06-05 16:21:53 -07:00
Eric Liang	48acbf0d69	[hotfix] Revert "[runtime env] runtime env inheritance refactor (#24538 )" (#25487 ) This reverts commit `eb2692c`. This is a temporary mitigation for #25484	2022-06-05 14:55:38 -07:00
Sebastián Ramírez	6e1248fb37	🚚 Move worker types to the module to improve static analysis (#25439) Currently, there are separate type annotations in `worker.pyi` that include the types for `func.remote()`, but they don't include types for the other things declared in `worker.py`. Because of that, editors can end up showing support only for the things in the `worker.pyi` file. For example: [screenshot] After this change, the editor and other tools will be able to provide support for other things defined in the same file: [screenshot] And the typing support for `func.remote()` keeps working as before: [screenshot] This is the approach recommended by PyRight/Pylance/VS Code. I also recommend it, as it's a lot easier to maintain types in the same file while editing than to remember to go to an external independent file to add those types. Also, to have proper support when using an external `.pyi` file, all the things declared in `worker.py` would have to be declared in the `worker.pyi` file. Ref: https://github.com/microsoft/pyright/blob/main/docs/typed-libraries.md#inlined-type-annotations-and-type-stubs	2022-06-05 14:01:24 -07:00
matthewdeng	7dafb2e278	[air] remove invalid wandb symlink (#25488 )	2022-06-04 22:17:08 -07:00
SangBin Cho	00e3fd75f3	[State Observability] Ray log alpha API (#24964) This PR implements `ray log` on the server side. It continues from #24068. The PR supports two endpoints: (1) `/api/v0/logs`, which lists logs of the node id, filtered by the given glob; (2) `/api/v0/logs/{[file | stream]}?filename&pid&actor_id&task_id&interval&lines`, which streams the requested file log. The filename can be inferred from pid/actor_id/task_id. Some tests need to be rewritten; I will do it soon. There will be 2 follow-up PRs after this one: a PR to add the actual CLI, and a PR to remove in-memory cached logs and do on-demand queries for actor/worker logs.	2022-06-04 05:10:23 -07:00
Yi Cheng	47c4f6f094	[flakey] Fix test_modin.py (#25469) test_modin.py is flaky right now. It complains that some modules can't be imported. This seems like an init issue where client mode and non-client mode are mixed. With this PR, the test closes the cluster for each run. This slows the test a little, but it's more stable.	2022-06-04 08:34:37 +00:00
Sven Mika	b5bc2b93c3	[RLlib] Move all remaining algos into `algorithms` directory. (#25366 )	2022-06-04 07:35:24 +02:00
SangBin Cho	54496d7705	[State Observability API] Support Filtering (#25281) This PR adds filtering support. The filtering is done on the API server side (not on the source side). An elegant source-side filtering solution is a bit complicated to write, and we will handle it in the future (no optimization for alpha APIs). We will also support limited types of columns for each API. The API is as follows: ray list [resources] -- filter [key] [value] => filter data where key==value. In the future, we can also support more complicated filtering such as !=, And, Or, etc.	2022-06-03 17:17:30 -07:00
Eric Liang	1f509ab331	[air] Add DatasetParallelTrainer.dataset_config for configuring dataset ingest (#25337 ) This adds a per-dataset config object to DataParallelTrainer. These configs define how the Dataset should be read into the DataParallelTrainer. It configures the preprocessing, splitting, and ingest strategy per-dataset. DataParallelTrainers declare default DatasetConfigs for each dataset passed in the ``datasets`` argument. Users have the opportunity to selectively override these configs by passing the ``dataset_config`` argument. Trainers can also define user customizable values (e.g., XGBoostTrainer doesn't support streaming ingest). This PR adds the minimal support for dataset configs. Future PRs will: - Add support for streaming ingest - Move this config from DataParallelTrainer to ml.Trainer	2022-06-03 16:32:53 -07:00
Eric Liang	22aaf47fda	[tune] Better error message for Tune nested tasks / actors (#25241 ) This PR uses a task/actor launch hook to generate better error messages for nested Tune tasks/actors in the case there are no extra resources reserved for them. The idea is that the Tune trial runner actor can set a hook prior to executing the user code. If the user code launches a task, and the placement group for the trial cannot possibly fit the task, then we raise TuneError right off to warn the user.	2022-06-03 14:53:40 -07:00
Sihan Wang	03ed27b9c1	[Serve] Fix the test_serve_start_different_http_checkpoint_options_warning flaky (#25452 )	2022-06-03 14:45:00 -07:00
Kai Fricke	4b9a89ad90	[air] Move python/ray/ml to python/ray/air (#25449 ) The package "ml" should be renamed to "air". Main question: Keep a `ml.py` with `from ray.air import *` for some level of backwards compatibility? I'd go for no to force people to use the new structure.	2022-06-03 21:53:44 +01:00
Yi Cheng	6b38b071e9	Revert "Revert "[core] Remove gcs addr updater in core worker. (#24747 )" (#25375 )" (#25391 ) This reverts commit `49efcab4fe`.	2022-06-03 12:26:27 -07:00
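
The `Dataset.randomize_block_order` entry (#25568) in the log above describes shuffling at block granularity instead of row granularity. A minimal pure-Python sketch of that idea follows. It does not use Ray: the function `randomize_block_order` below is a hypothetical stand-in that operates on a plain list of lists, and it is an assumption about the general technique, not the library's implementation.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def randomize_block_order(
    blocks: Sequence[List[T]], seed: Optional[int] = None
) -> List[List[T]]:
    """Return the blocks in a random order, leaving each block untouched.

    Hypothetical stand-in for illustration only: `blocks` is a plain list of
    lists, not a Ray Dataset. Only the order of the blocks changes, so no
    rows are moved between blocks.
    """
    rng = random.Random(seed)
    shuffled = list(blocks)  # shallow copy; the input order is not mutated
    rng.shuffle(shuffled)
    return shuffled


# Three contiguous blocks, as if each were stored on a different node.
blocks = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
shuffled = randomize_block_order(blocks, seed=0)
for block in shuffled:
    print(block)
```

Because only the list of blocks is permuted, the cost is proportional to the number of blocks and not to the number of rows, which is why the commit message calls it a pseudo global shuffle.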

7428 commits
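
The "Enable streaming ingest in AIR" entry (#25428) quotes three optional fields whose defaults apply when a field is left as `None`. The sketch below restates those fields as a standalone dataclass to make the default resolution explicit. The class name `StreamingOptions` and the method `resolved` are hypothetical and are not part of Ray; only the field names, types, and default values come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional

GIB = 1024 ** 3  # bytes in one GiB


@dataclass
class StreamingOptions:
    """Hypothetical stand-in holding the three options quoted in #25428."""

    use_stream_api: Optional[bool] = None
    stream_window_size: Optional[float] = None  # bytes; -1 means unbounded
    global_shuffle: Optional[bool] = None

    def resolved(self) -> "StreamingOptions":
        """Fill unset (None) fields with the defaults stated in the commit."""
        return StreamingOptions(
            use_stream_api=(
                False if self.use_stream_api is None else self.use_stream_api
            ),
            stream_window_size=(
                1.0 * GIB
                if self.stream_window_size is None
                else self.stream_window_size
            ),
            global_shuffle=(
                False if self.global_shuffle is None else self.global_shuffle
            ),
        )


# Enable streaming and keep the other two fields at their defaults.
opts = StreamingOptions(use_stream_api=True).resolved()
print(opts)
```

Keeping `None` as the "unset" marker lets a caller distinguish an explicit `False` from a field that was never configured, which matches the `Optional[...] = None` declarations quoted in the commit.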