hiro/ray - Forgejo: Beyond coding. We Forge.

hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-08 19:41:38 -05:00

Author	SHA1	Message	Date
Jiajun Yao	95714cc281	Node affinity scheduling strategy (#23381 ) Instead of relying on the node-ip custom resource for static task-to-node placement, this PR introduces an explicit NodeAffinitySchedulingStrategy with the following benefits: 1. Specify node using id instead of ip since ip may not be unique for each node. 2. Support soft constraint so the task can be tolerant to node failures. After this PR, the node-ip custom resource can be deprecated.	2022-04-12 21:31:26 -07:00
Clark Zinzow	12f0dc1faf	Revert "[serialization] Enable debugging into pickle backend (#23854 )" (#23877 ) This reverts commit `ef7180365d`.	2022-04-12 16:53:20 -07:00
Eric Liang	191c83305b	[minor] Fix minor spelling issue on actor task execution	2022-04-12 16:18:25 -07:00
Siyuan (Ryans) Zhuang	ef7180365d	[serialization] Enable debugging into pickle backend (#23854 ) * enable debugging cloudpickle	2022-04-12 13:48:35 -07:00
Amog Kamsetty	5a41fb18bd	[Docs] Automatically render latest `ray_lightning` docs (#23729 ) Automatically pull the latest ray_lightning README to render on Ray docs. (#23505) Depends on ray-project/ray_lightning#135	2022-04-08 16:57:23 -07:00
Matti Picus	77c4c1e48e	WINDOWS: enable and fix failures in test_runtime_env_complicated (#22449 )	2022-03-29 00:56:42 -07:00
Eric Liang	38925f60d2	Add a `get_if_exists` option for simpler creation of named actors (#23344 ) Getting or creating a named actor is a common pattern, however it is somewhat esoteric in how to achieve this. Add a utility function and test that it doesn't cause any scary error messages. Actor.options(name="my_singleton", get_if_exists=True).remote(args)	2022-03-23 22:02:58 -07:00
Chen Shen	48d456d373	[RFC][Doc] add a page describe actor execution order. (#23406 ) * add * task-orders * fix * address comments * add * address comments	2022-03-23 11:07:18 -07:00
Jiajun Yao	d3159f201b	[Doc] Add scheduling doc (#23343 )	2022-03-20 16:05:06 -07:00
Philipp Moritz	886cc4d674	Fix broken links in documentation and put linkcheck linter in place on CI (#23340 )	2022-03-18 21:02:52 -07:00
Jialing He	4a83bc3dc2	[runtime env] Support set timeout for runtime env setup (#23082 ) Interface example: ```python @ray.remote(runtime_env=RuntimeEnv(..., config=RuntimeEnvConfig(setup_timeout_s=10)) def f(): pass @ray.remote(runtime_env={..., "config": {"setup_timeout_s": 10}}) def f(): pass ``` Support set timeout second for timeout of runtime environment creation. Co-authored-by: 捕牛 <hejialing.hjl@antgroup.com>	2022-03-18 12:52:59 -05:00
Archit Kulkarni	76bb5396c7	[Doc] [jobs] Add links to Job Submission and improve doc (#23209 ) - Adds links to Job Submission from existing library tutorials where `ray submit` is used. When Jobs becomes GA, we should fully replace the uses of `ray submit` with Ray job submission and ensure this is tested. - Adds docstrings for the Jobs SDK, which automatically show up in the API reference - Improve the Job Submission main page - Add a "Deployment Guide" landing page explaining when to use Ray Client vs Ray Jobs Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>	2022-03-18 12:52:13 -05:00
Archit Kulkarni	16fd099b8b	[runtime env] Change `pip_check` default from `True` to `False` (#23306 ) @SongGuyang @Catch-Bull @edoakes I know we discussed this earlier, but after thinking about it some more I think a more reasonable default is for `pip check` to be `False` by default. My guess is that a lot of users (including myself) work inside an environment where `python -m pip check` fails, but the environment doesn't cause them any problems otherwise. So a lot of users will hit an error when trying a simple `runtime_env` `pip` example, and possibly give up. Another less important piece of evidence is that we had to set `pip_check = False` to make some CI tests pass in the original PR. This also matches the default behavior of pip which allows this situation to occur in the first place: `pip install` doesn't error when there's a dependency conflict; rather the command succeeds, the package is installed and usable, and it prints a warning (which is confusingly titled "ERROR")	2022-03-18 12:51:41 -05:00
shrekris-anyscale	86169d2452	[docs] Fix malformatted list in "Advanced Pattern: Fault Tolerance with Actor Checkpointing" (#23319 )	2022-03-18 10:50:13 -07:00
Guyang Song	1ad019aac3	[C++ API][Doc] Add doc and error log to notice C++ API is not supported on Windows (#23272 ) We don't support Windows entirely now. ## Checks - [X] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :(	2022-03-18 10:52:57 +08:00
Archit Kulkarni	684a1821d3	[Doc] [runtime_env] Add limitation about single-file `py_modules` to doc (#23248 ) Until #23151 is fixed, this PR adds it as a known limitation in the documentation.	2022-03-17 16:23:46 -05:00
Eric Liang	c8f207f746	[docs] Core docs refactor (#23216 ) This PR makes a number of major overhauls to the Ray core docs: Add a key-concepts section for {Tasks, Actors, Objects, Placement Groups, Env Deps}. Re-org the user guide to align with key concepts. Rewrite the walkthrough to link to mini-walkthroughs in the key concept sections. Minor tweaks and additional transition material.	2022-03-17 11:26:17 -07:00
Archit Kulkarni	8707eb6288	[runtime env] Support `.whl` files in `py_modules` (#22368 ) The `py_modules` field of runtime_env supports uploading local Python modules for use on the Ray cluster. One gap in this is if the local Python module is in the form of a wheel (`.whl` file.) This PR adds the missing support for uploading and installing the `.whl` file.	2022-03-16 16:37:10 -05:00
Jialing He	39a6c054d3	[runtime env][feature] introduce pip_check_enable and pip_version (#22826 )	2022-03-14 23:41:19 +08:00
Kenneth	07372927cc	Enable buffering and spilling to multiple remote storages (#22798 ) Buffering writes to AWS S3 is highly recommended to maximize throughput. Reducing the number of remote I/O requests can make spilling to remote storages as effective as spilling locally. In a test where 512GB of objects were created and spilled, varying just the buffer size while spilling to a S3 bucket resulted in the following runtimes. Buffer Size \| Runtime (s) -- \| -- Default \| 3221.865916 256KB \| 1758.885839 1MB \| 748.226089 10MB \| 526.406466 100MB \| 494.830513 Based on these results, a default buffer size of 1MB has been added. This is the minimum buffer size used by AWS Kinesis Firehose, a streaming service for S3. On systems with larger availability, it is good to configure a larger buffer size. For processes that reach the throughput limits provided by S3, we can remove that bottleneck by supporting more prefixes/buckets. These impacts are less noticeable as the performance gains from using a large buffer prevent us from reaching a bottleneck. The following runtimes were achieved by spilling 512GB with a 1MB buffer and varying prefixes. Prefixes \| Runtime (s) -- \| -- 1 \| 748.226089 3 \| 527.658646 10 \| 516.010742 Together these changes enable faster large-scale object spilling. Co-authored-by: Ubuntu <ubuntu@ip-172-31-54-240.us-west-2.compute.internal>	2022-03-11 11:27:02 -05:00
matthewdeng	3a3a7b4be4	[test] add back deleted datasets train test file (#23051 )	2022-03-10 21:46:07 -08:00
Max Pumperla	2b8faae40c	[docs] re/move old core examples (#22802 )	2022-03-10 12:17:00 -08:00
Max Pumperla	11c40e363d	[docs] external promo content (#22823 )	2022-03-10 11:39:44 -08:00
qicosmos	e4a9517739	[C++ Worker]Python call cpp worker (#22820 )	2022-03-10 11:06:14 -08:00
Eric Liang	52491c87e2	Make a pass fixing Dataset API issues (#22886 )	2022-03-08 13:07:55 -08:00
Stephanie Wang	cb218d03b9	[core] Enable lineage reconstruction by default (#22816 ) Enables lineage reconstruction, which allows automatic recovery of task outputs, by default. Also adds an info message to the driver whenever objects need to be reconstructed (not including recursive reconstruction).	2022-03-07 17:40:30 -05:00
Archit Kulkarni	e937f1a3c4	[runtime env] [Doc] add more details about runtime env logs (#22480 ) Clarifies the logging behavior for runtime envs, and adds the runtime env logs fileto the list of log files in the main logging page.	2022-03-02 14:27:28 -08:00
Kenneth	9b67cb5a6f	Add buffering to object spilling (#22618 ) This change is needed for object fusing to see performance increases on HDD. Currently, smaller object writes are slow even with fusing since the writes are not buffered (negating the point of fusing). Benchmarks show that while the default is sufficient for fast SSDs, on a slow HDD, increasing the buffer size reduces write times by several magnitudes. ### Performance Changes A microbenchmark where 500KB objects were produced (then spilled) and consumed to observe changes in object fusing/spilling. \| Run \| Produce (s) \| Consume (s) \| Total (s) \| \| -- \| -- \| -- \| -- \| \| Baseline (original) \| 347.332281 \| 355.611272 \| 705.560750 \| \| Baseline (w/ fix) \| 181.815852 \| 347.692850 \| 532.847759 \| \| No fusing (original) \| 453.574554 \| 525.047998 \| 981.620108 \| \| No fusing (w/ fix) \| 452.614848\| 519.787698 \| 975.412639 \| The baseline runs should be notably faster due to object fusing reducing I/O requests. With the fix, Ray's defaults allow this microbenchmark to have a 48% time reduction with negligible impact on runtime when fusing is disabled. See [this followup](https://github.com/ray-project/ray/pull/22618#issuecomment-1054838715) for information on the differences between SSD and HDD performance with different buffer sizes. Co-authored-by: Ubuntu <ubuntu@ip-172-31-54-240.us-west-2.compute.internal>	2022-03-01 10:13:10 -08:00
Stephanie Wang	73f078236f	[doc] Update docs about actor garbage collection (#20763 ) Update outdated actor docs about when actors are GCed.	2022-02-28 18:45:29 -08:00
Jialing He	98a69cbd90	[runtime env][strong-typed API] Combine `ParsedRuntimeEnv` and `RuntimeEnv` into `ray.runtime.RuntimeEnv` (#22522 ) Combine `ParsedRuntimeEnv` and `RuntimeEnv` into `ray.runtime.RuntimeEnv`, details: #21495 - The `new RuntimeEnv` includes all external interfaces of `ParsedRuntimeEnv` and `old RuntimeEnv`. - The `new RuntimeEnv` will be exposed directly to the user. - example: ```python runtime_env = ray.runtime_env.RuntimeEnv(working_dir="s3://workding_dir.zip", pip=["requests"], java_jars=["s3://jar1.zip"], java_jvm_options=["-Dxxx=xxx"]) ```	2022-02-28 16:18:10 +08:00
Max Pumperla	372c620f58	[docs] Tune overhaul part II (#22656 ) Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2022-02-26 23:07:34 -08:00
xwjiang2010	d4a1bc7bc7	Revert "[runtime env] runtime env inheritance refactor (#22244 )" (#22626 ) Breaks train_torch_linear_test.py.	2022-02-25 08:42:30 -06:00
jon-chuang	11500dc12c	[docs] include ray status and ray monitor into ray command line api docs (#22614 ) Fixes: https://github.com/ray-project/ray/issues/18527	2022-02-23 20:09:45 -08:00
Edward Oakes	5a21289a34	[runtime_env] Remove get_current_runtime_env from docs (#22594 ) We should just encourage people to use the existing `get_runtime_context` API instead of introducing a new one here. Just removing the docs for now while we discuss this.	2022-02-23 16:53:52 -06:00
mwtian	9a157dfe82	[GCS-Ray] update doc and error message for GCS-Ray (#22528 ) Update documentation to reflect that Ray no longer starts Redis by default.	2022-02-22 17:56:30 -08:00
Guyang Song	5783cdb254	[runtime env] runtime env inheritance refactor (#22244 ) Runtime Environments is already GA in Ray 1.6.0. The latest doc is [here](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#runtime-environments). And now, we already supported a [inheritance](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#inheritance) behavior as follows (copied from the doc): - The runtime_env["env_vars"] field will be merged with the runtime_env["env_vars"] field of the parent. This allows for environment variables set in the parent’s runtime environment to be automatically propagated to the child, even if new environment variables are set in the child’s runtime environment. - Every other field in the runtime_env will be overridden by the child, not merged. For example, if runtime_env["py_modules"] is specified, it will replace the runtime_env["py_modules"] field of the parent. We think this runtime env merging logic is so complex and confusing to users because users can't know the final runtime env before the jobs are run. Current PR tries to do a refactor and change the behavior of Runtime Environments inheritance. Here is the new behavior: - If there is no runtime env option when we create actor, inherit the parent runtime env. - Otherwise, use the optional runtime env directly and don't do the merging. Add a new API named `ray.runtime_env.get_current_runtime_env()` to get the parent runtime env and modify this dict by yourself. Like: ```Actor.options(runtime_env=ray.runtime_env.get_current_runtime_env().update({"X": "Y"}))``` This new API also can be used in ray client.	2022-02-21 18:13:22 +08:00
Max Pumperla	29d94a2211	[docs] sphinx gallery removal, migrate to ipynb (#22467 )	2022-02-19 01:19:07 -08:00
Archit Kulkarni	8c12e30f11	[Doc] Add actor max restarts default value to fault tolerance doc (#22481 )	2022-02-18 17:48:22 -06:00
Ian Rodney	c9a4b17f99	[YAMLs] Fix comments about autoscaler round-robining (#22002 )	2022-02-17 13:59:05 -08:00
Qing Wang	7c45d1a366	[doc][Java] Add doc page for java concurrency group. (#21600 ) Add document page for Java concurrency group. Co-authored-by: Kai Yang <kfstorm@outlook.com>	2022-02-16 17:57:03 +08:00
Archit Kulkarni	0e350c0074	[runtime env] [Doc] Add two ways of installing dependencies: cluster launcher, and runtime env (#20780 ) We shouldn't promote Runtime Environments as the only way to do things until all Core nightly and release tests are run using runtime environments. This PR adds the prior approach (using cluster launcher commands) to the doc on equal footing, describing the differences between the two. Co-authored-by: Richard Liaw <rliaw@berkeley.edu> Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2022-02-14 16:03:48 -06:00
Balaji Veeramani	abad268549	Comment `fmt: off` annotations (#21984 ) Code formatting is disabled in several modules with the explanation > [The module] ignores yapf because yapf doesn't allow comments right after code blocks, but we put comments right after code blocks to prevent large white spaces in the documentation. Since we no longer use YAPF, it may be possible to re-enable code formatting on these modules. I've added "FIXME" comments requesting developers to check whether code formatter appeasements are still necessary.	2022-02-09 22:12:11 -08:00
SangBin Cho	e5cab878b8	[Core] Disable runtime env logs (#22198 ) Disable runtime env logs streamed to the driver by default and improve the documentation.	2022-02-09 14:43:25 -08:00
Balaji Veeramani	31ed9e5d02	[CI] Replace YAPF disables with Black disables (#21982 )	2022-02-08 16:29:25 -08:00
Guyang Song	36ba514f9c	[Doc] Fix bad doc and recover doc of c++ api (#22213 )	2022-02-08 19:04:37 +08:00
Max Pumperla	5cc9355303	[Docs ] Tune docs overhaul (first part) (#22112 ) Continuing docs overhaul, tune now has: - [x] better landing page - [x] a getting started guide - [x] user guide was cut down, partially merged with FAQ, and partially integrated with tutorials - [x] the new user guide contains guides to tune features and practical integrations - [x] we rewrote some of the feature guides for clarity - [x] we got rid of sphinx-gallery for this sub-project (only data and core left), as it looks bad and is unnecessarily complicated anyway (plus, makes the build slower) - [x] sphinx-gallery examples are now moved to markdown notebook, as started in #22030. - [x] Examples are tested in the new framework, of course. There's still a lot one can do, but this is already getting too large. Will follow up with more fine-tuning next week. Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>	2022-02-07 15:47:03 +00:00
Archit Kulkarni	d7be4e1d3c	[doc] [runtime env] Add note that referencing local files in requirements.txt is not supported (#22095 )	2022-02-04 15:32:19 -06:00
Archit Kulkarni	78f882dbbc	[runtime env] Local uri caching for working_dir, py_modules and conda (#20273 ) Previously, local files corresponding to runtime env URIs were eagerly garbage collected as soon as there were no more references to them. In this PR, we store this data in a cache instead, so when the reference count for a URI drops to zero, instead of deleting it we simple mark it as unused in the cache. When the cache exceeds its size limit (default 10 GB) it will delete unused URIs until the cache is back under the size limit or there are no more unused URIs. Design doc: https://docs.google.com/document/d/1x1JAHg7c0ewcOYwhhclbuW0B0UC7l92WFkF4Su0T-dk/edit - Adds unit tests for caching and integration tests for working_dir caching	2022-02-02 14:53:03 -06:00
SangBin Cho	2db71f72cc	[Doc] Remove the legacy doc (#21996 )	2022-01-31 15:26:19 -08:00
Kai Yang	2038cc96c6	Revert "Revert "[Dataset] [DataFrame 2/n] Add pandas block format implementation (partial) (#20988 ) (#21661 )" (#21894 ) This PR adds pandas block format support by implementing `PandasRow`, `PandasBlockBuilder`, `PandasBlockAccessor`. Note that `sort_and_partition`, `combine`, `merge_sorted_blocks`, `aggregate_combined_blocks` in `PandasBlockAccessor` redirects to arrow block format implementation for now. They'll be implemented in a later PR.	2022-01-31 12:09:51 -08:00

1 2

57 commits