hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Eric Liang	f7ae8923f6	[docs] Reorganize the tensor data support docs; general editing (#26952) Editing pass over the tensor support docs for clarity: make heavy use of tabbed guides to condense the content; rewrite examples to be organized around creating vs. reading tensors; use doc_code for testing.	2022-08-01 17:31:41 -07:00
Eric Liang	1ac2a872e7	[docs] Editing pass over Dataset docs (#26935)	2022-07-24 19:48:29 -07:00
Eric Liang	63a6c1dfac	[docs] Cleanup the Datasets key concept docs (#26908) Clean up the Datasets key concept doc so that it is suitable for a beginner-level user, and improve the diagrams.	2022-07-22 23:30:54 -07:00
Eric Liang	12825fc5aa	[air] Add a warning if no CPUs are reserved for dataset execution (#26643)	2022-07-17 16:33:51 -07:00
Eric Liang	400330e9c0	[air] Add _max_cpu_fraction_per_node to ScalingConfig and documentation (#26634)	2022-07-16 21:55:51 -07:00
Eric Liang	9de1add073	[Datasets] Autodetect dataset parallelism based on available resources and data size (#25883) This PR defaults the parallelism of Dataset reads to `-1`. In this case the parallelism is determined by the following rules: - The number of available CPUs is estimated. If in a placement group, the number of CPUs in the cluster is scaled by the size of the placement group relative to the cluster size. If not in a placement group, this is the number of CPUs in the cluster. If the estimate is less than 8, it is set to 8. - The parallelism is set to the estimated number of CPUs multiplied by 2. - The in-memory data size is estimated. If the parallelism would create in-memory blocks larger than the target block size (512MiB), the parallelism is increased until the blocks are < 512MiB in size. These rules fix two common user problems: 1. Insufficient parallelism in a large cluster, or too much parallelism on a small cluster. 2. Overly large block sizes leading to OOMs when processing a single block. TODO: - [x] Unit tests - [x] Docs update Supersedes part of: https://github.com/ray-project/ray/pull/25708 Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-136.us-west-2.compute.internal>	2022-07-12 21:08:49 -07:00
Myeongju Kim	a1a78077ca	Fix a broken link in Ray Dataset doc (#25927) Co-authored-by: Myeong Kim <myeongki@amazon.com>	2022-06-20 13:17:46 -07:00
Clark Zinzow	1701b923bc	[Datasets] [Tensor Story - 2/2] Add `"numpy"` batch format for batch mapping and batch consumption. (#24870) This PR adds a NumPy "numpy" batch format for batch transformations and batch consumption that works with all block types. See #24811.	2022-06-17 16:01:02 -07:00
Stephanie Wang	473a962d89	[Datasets] [Docs] Add docs about fault tolerance in Datasets (#25371) Adds a description of the fault tolerance guarantees for Datasets. Closes #24856.	2022-06-02 15:53:50 -07:00
Kai Fricke	6fe91885b0	[docs/lint] Fix reference to `dataset_tune` (#25402)	2022-06-02 11:40:26 +01:00
Eric Liang	51b295ad74	[docs] Improve Tune + Datasets documentation (#25389)	2022-06-01 21:52:32 -07:00
Eric Liang	71717e59c4	[data] [docs] Doc audit-- rebalance basic vs advanced materials (#25262)	2022-06-01 13:50:46 -07:00
Clark Zinzow	2c8fac369a	Note that explicit resource allocation is experimental, fix typos (#25038)	2022-05-20 11:36:08 -07:00
Clark Zinzow	0b6505e8c6	[Datasets] Miscellaneous GA docs P0s. (#24891) This PR knocks off a few miscellaneous GA docs P0s given in our docs tracker. Namely: - Documents Datasets resource allocation model. - De-emphasizes global/windowed shuffling. - Documents lazy execution mode, and expands our execution model docs in general.	2022-05-18 16:17:48 -07:00
Jian Xiao	6d93e9f0f5	Cleanup the DatasetPipeline references in Getting Started; rename Exchanging to Accessing (#23786)	2022-04-12 17:10:14 -07:00
Eric Liang	5a0b7a7ee0	Document Dataset pipeline stage fusion (#22737)	2022-03-01 14:38:09 -08:00
Clark Zinzow	fb0d6e6b0b	[Datasets] [Docs] Datasets library branding + positioning tweaks (#22067)	2022-02-05 16:59:34 -08:00
Max Pumperla	4dd221f848	[Docs] Ray Data docs target state (#21931) Preview: [docs](https://ray--21931.org.readthedocs.build/en/21931/data/dataset.html) The Ray Data project's docs now have a clearer structure and have been partly rewritten/modified. In particular we have - [x] A Getting Started Guide - [x] An explicit User / How-To Guide - [x] A dedicated Key Concepts page - [x] A consistent naming convention of `Ray Data` wherever the project is referred to. This makes it quite clear that, apart from the "Getting Started" sections, we really only have one real example. Once we have more, we can create an "Example" section like many other sub-projects have. This will be addressed in https://github.com/ray-project/ray/issues/21838.	2022-01-27 13:14:36 -08:00

18 commits
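
The parallelism rules described in the message of commit 9de1add073 can be sketched roughly as below. This is a minimal illustration written only from the commit message, not Ray's actual implementation: the function name `autodetect_parallelism`, its parameters, and the constant names are hypothetical, and `available_cpus` is assumed to be the estimate after any placement-group scaling has already been applied.

```python
from typing import Optional

# Values taken from the commit message; the names are hypothetical.
MIN_CPUS = 8
TARGET_BLOCK_SIZE = 512 * 1024 * 1024  # 512 MiB


def autodetect_parallelism(
    available_cpus: int,
    data_size_bytes: Optional[int] = None,
    target_block_size: int = TARGET_BLOCK_SIZE,
) -> int:
    """Sketch of the rule used when read parallelism is left at -1."""
    # Rule 1: the CPU estimate is raised to at least 8.
    cpus = max(available_cpus, MIN_CPUS)

    # Rule 2: start from twice the estimated CPU count.
    parallelism = cpus * 2

    # Rule 3: if the estimated in-memory size is known and the blocks would
    # be larger than the target, increase the parallelism until each block
    # is strictly smaller than the target block size.
    if data_size_bytes is not None and data_size_bytes > parallelism * target_block_size:
        parallelism = data_size_bytes // target_block_size + 1

    return parallelism


if __name__ == "__main__":
    gib = 1024 ** 3
    print(autodetect_parallelism(4))             # small cluster, size unknown
    print(autodetect_parallelism(32))            # larger cluster, size unknown
    print(autodetect_parallelism(4, 100 * gib))  # large dataset on a small cluster
```

Under these assumptions, a small cluster never drops below 16 read tasks, and a large dataset raises the task count so that no single block reaches the 512MiB target, which corresponds to the two user problems the commit message names.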