This example simply doesn't run as is. We can bring it back later if it makes sense, but it's not clear what the variables used there, such as `actor`, refer to. Fixes #21328
Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>
## Why are these changes needed?
The current documentation code in "Message passing using Ray Queue" can be enhanced to better demonstrate the message queue.
It creates 10 tasks but only 2 consumers, and each consumer consumes one task and then exits. As a result, the output is a bit vague:
(consumer pid=1022727) got work 0
(consumer pid=1022595) got work 1
So I make each consumer keep working until the queue is empty. The output shows consumers 0 and 1 working in parallel (a minimal sketch of the revised pattern follows the output below):
(consumer pid=1030876) consumer 0 got work 0
(consumer pid=1030876) consumer 0 got work 1
(consumer pid=1030876) consumer 0 got work 3
(consumer pid=1030876) consumer 0 got work 5
(consumer pid=1030876) consumer 0 got work 7
(consumer pid=1030876) consumer 0 got work 9
(consumer pid=1030949) consumer 1 got work 2
(consumer pid=1030949) consumer 1 got work 4
(consumer pid=1030949) consumer 1 got work 6
(consumer pid=1030949) consumer 1 got work 8
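A minimal sketch of the revised consumer loop, assuming `ray.util.queue.Queue`; the exact code in the docs may differ:

```python
import ray
from ray.util.queue import Queue, Empty

ray.init()

queue = Queue(maxsize=100)

@ray.remote
def consumer(i, queue):
    # Keep consuming until the queue is drained instead of exiting after one item.
    while not queue.empty():
        try:
            work = queue.get(block=True, timeout=1)
            print(f"consumer {i} got work {work}")
        except Empty:
            break

# Produce 10 work items, then start 2 consumers that drain the queue in parallel.
for j in range(10):
    queue.put(j)
ray.get([consumer.remote(i, queue) for i in range(2)])
```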
P.S. This also fixes a typo in the doc.
Instead of relying on the node-ip custom resource for static task-to-node placement, this PR introduces an explicit NodeAffinitySchedulingStrategy with the following benefits:
1. Specify the node using its ID instead of its IP, since an IP may not be unique for each node.
2. Support soft constraints so the task can tolerate node failures.
After this PR, the node-ip custom resource can be deprecated.
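A minimal sketch of how the strategy can be used, assuming `NodeAffinitySchedulingStrategy` from `ray.util.scheduling_strategies` and the runtime-context node ID accessor (names may vary slightly across Ray versions):

```python
import ray
from ray.util.scheduling_strategies import NodeAffinitySchedulingStrategy

ray.init()

@ray.remote
def f():
    return "hello"

# Pin the task to a specific node by ID; soft=True lets the scheduler fall back
# to another node if that node dies or cannot run the task.
node_id = ray.get_runtime_context().get_node_id()
ref = f.options(
    scheduling_strategy=NodeAffinitySchedulingStrategy(node_id=node_id, soft=True)
).remote()
print(ray.get(ref))
```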
Getting or creating a named actor is a common pattern; however, how to achieve it is somewhat esoteric. Add a utility function and a test that it doesn't cause any scary error messages.
`Actor.options(name="my_singleton", get_if_exists=True).remote(args)`
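A short sketch of the resulting get-or-create pattern, using a hypothetical `Counter` actor for illustration:

```python
import ray

ray.init()

@ray.remote
class Counter:
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value

# get_if_exists=True returns the existing actor named "my_singleton" if one was
# already created, and creates it otherwise.
counter = Counter.options(name="my_singleton", get_if_exists=True).remote()
print(ray.get(counter.increment.remote()))
```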
- Adds links to Job Submission from existing library tutorials where `ray submit` is used. When Jobs becomes GA, we should fully replace the uses of `ray submit` with Ray job submission and ensure this is tested.
- Adds docstrings for the Jobs SDK, which automatically show up in the API reference (a minimal SDK usage sketch follows this list)
- Improves the Job Submission main page
- Adds a "Deployment Guide" landing page explaining when to use Ray Client vs Ray Jobs
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@SongGuyang @Catch-Bull @edoakes I know we discussed this earlier, but after thinking about it some more I think a more reasonable default is for `pip check` to be `False` by default. My guess is that a lot of users (including myself) work inside an environment where `python -m pip check` fails, but the environment doesn't cause them any problems otherwise. So a lot of users will hit an error when trying a simple `runtime_env` `pip` example, and possibly give up. Another less important piece of evidence is that we had to set `pip_check = False` to make some CI tests pass in the original PR.
This also matches the default behavior of pip, which allows this situation to occur in the first place: `pip install` doesn't error when there's a dependency conflict; rather, the command succeeds, the package is installed and usable, and it prints a warning (which is confusingly titled "ERROR").
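A minimal sketch of controlling the check via the dict form of the `pip` runtime_env field; the field names follow the documented `pip` dict format, though the exact default may vary by Ray version:

```python
import ray

# Use the dict form of the "pip" runtime_env field to control dependency checking.
# With the proposed default, pip_check would be False unless explicitly enabled.
ray.init(
    runtime_env={
        "pip": {
            "packages": ["requests"],
            "pip_check": False,  # skip `python -m pip check` after installation
        }
    }
)
```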
We don't fully support Windows for now.
## Checks
- [X] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(
This PR makes a number of major overhauls to the Ray core docs:
- Add a key-concepts section for {Tasks, Actors, Objects, Placement Groups, Env Deps}.
- Re-org the user guide to align with key concepts.
- Rewrite the walkthrough to link to mini-walkthroughs in the key concept sections.
- Minor tweaks and additional transition material.
The `py_modules` field of `runtime_env` supports uploading local Python modules for use on the Ray cluster. One gap is the case where the local Python module is in the form of a wheel (a `.whl` file). This PR adds the missing support for uploading and installing `.whl` files.
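A minimal sketch of the intended usage, with a placeholder wheel filename:

```python
import ray

# Pass a local wheel file via py_modules; Ray uploads it to the cluster and
# installs it so tasks and actors can import the packaged module.
ray.init(
    runtime_env={
        "py_modules": ["./my_module-0.1-py3-none-any.whl"],  # placeholder path
    }
)
```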
Buffering writes to AWS S3 is highly recommended to maximize throughput. Reducing the number of remote I/O requests can make spilling to remote storage as effective as spilling locally.
In a test where 512GB of objects were created and spilled, varying just the buffer size while spilling to an S3 bucket resulted in the following runtimes.
Buffer Size | Runtime (s)
-- | --
Default | 3221.865916
256KB | 1758.885839
1MB | 748.226089
10MB | 526.406466
100MB | 494.830513
Based on these results, a default buffer size of 1MB has been added. This is the minimum buffer size used by AWS Kinesis Firehose, a streaming service for S3. On systems with more memory available, it is worth configuring a larger buffer size.
For processes that reach the throughput limits of S3, we can remove that bottleneck by supporting more prefixes/buckets. The impact here is less noticeable because the performance gains from the larger buffer already keep us from hitting that bottleneck. The following runtimes were achieved by spilling 512GB with a 1MB buffer and a varying number of prefixes (a sketch of configuring both settings follows the results).
Prefixes | Runtime (s)
-- | --
1 | 748.226089
3 | 527.658646
10 | 516.010742
Together these changes enable faster large-scale object spilling.
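A minimal sketch of spilling to S3 with a larger write buffer spread across multiple prefixes, assuming the `smart_open` spilling backend; the bucket and prefix names are placeholders and the exact config keys may differ across Ray versions:

```python
import json
import ray

ray.init(
    _system_config={
        "max_io_workers": 4,  # more I/O workers for remote storage
        "object_spilling_config": json.dumps(
            {
                "type": "smart_open",
                "params": {
                    # Multiple prefixes spread the load across S3 partitions.
                    "uri": [
                        "s3://my-bucket/spill-1",  # placeholder bucket/prefixes
                        "s3://my-bucket/spill-2",
                        "s3://my-bucket/spill-3",
                    ],
                },
                "buffer_size": 1024 * 1024,  # 1MB write buffer
            }
        ),
    }
)
```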
Co-authored-by: Ubuntu <ubuntu@ip-172-31-54-240.us-west-2.compute.internal>
Enables lineage reconstruction by default, which allows automatic recovery of task outputs.
Also adds an info message to the driver whenever objects need to be reconstructed (not including recursive reconstruction).
This change is needed for object fusing to see performance improvements on HDDs. Currently, smaller object writes are slow even with fusing because the writes are not buffered (negating the point of fusing). Benchmarks show that while the default is sufficient for fast SSDs, on a slow HDD increasing the buffer size reduces write times by several orders of magnitude. A sketch of the corresponding spilling config follows the benchmark results below.
### Performance Changes
In a microbenchmark, 500KB objects were produced (then spilled) and consumed to observe changes in object fusing/spilling.
| Run | Produce (s) | Consume (s) | Total (s) |
| -- | -- | -- | -- |
| Baseline (original) | 347.332281 | 355.611272 | 705.560750 |
| Baseline (w/ fix) | 181.815852 | 347.692850 | 532.847759 |
| No fusing (original) | 453.574554 | 525.047998 | 981.620108 |
| No fusing (w/ fix) | 452.614848 | 519.787698 | 975.412639 |
The baseline runs should be notably faster due to object fusing reducing I/O requests. With the fix, Ray's defaults give this microbenchmark a 48% reduction in produce (spill) time, with negligible impact on runtime when fusing is disabled.
See [this followup](https://github.com/ray-project/ray/pull/22618#issuecomment-1054838715) for information on the differences between SSD and HDD performance with different buffer sizes.
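A minimal sketch of raising the write buffer size for local filesystem spilling (useful on HDDs); the directory path is a placeholder and the exact config keys may differ across Ray versions:

```python
import json
import ray

ray.init(
    _system_config={
        "object_spilling_config": json.dumps(
            {
                "type": "filesystem",
                "params": {
                    "directory_path": "/tmp/ray_spill",  # placeholder spill directory
                    "buffer_size": 1024 * 1024,  # 1MB write buffer for slow disks
                },
            }
        ),
    }
)
```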
Co-authored-by: Ubuntu <ubuntu@ip-172-31-54-240.us-west-2.compute.internal>
Combine `ParsedRuntimeEnv` and `RuntimeEnv` into `ray.runtime_env.RuntimeEnv`, details: #21495
- The new `RuntimeEnv` includes all external interfaces of `ParsedRuntimeEnv` and the old `RuntimeEnv`.
- The new `RuntimeEnv` will be exposed directly to the user.
- example:
```python
import ray

runtime_env = ray.runtime_env.RuntimeEnv(
    working_dir="s3://working_dir.zip",
    pip=["requests"],
    java_jars=["s3://jar1.zip"],
    java_jvm_options=["-Dxxx=xxx"],
)
```
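Presumably the constructed object can then be passed wherever a `runtime_env` dict is accepted, e.g. `ray.init(runtime_env=runtime_env)` or `f.options(runtime_env=runtime_env)`.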
We should just encourage people to use the existing `get_runtime_context` API instead of introducing a new one here. Just removing the docs for now while we discuss this.
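For reference, a minimal sketch of the existing API being recommended here; accessor names may vary slightly across Ray versions:

```python
import ray

ray.init()

# The runtime context already exposes IDs for the current job, node, and worker.
ctx = ray.get_runtime_context()
print(ctx.get_job_id())
print(ctx.get_node_id())
```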