hiro/ray - Forgejo

13904 commits 227 branches 67 tags 234 MiB

Author	SHA1	Message	Date
matthewdeng	3ea80f6aa1	[data] set iter_batches default batch_size (#26955) Why are these changes needed? Resubmitting #26869. This PR was reverted due to failing tests; however, those failures were actually due to a dependency: #26950	2022-07-25 08:34:25 -07:00
matthewdeng	bcec60d898	Revert "[data] set iter_batches default batch_size #26869" (#26938) This reverts commit `b048c6f659`.	2022-07-23 17:46:45 -07:00
matthewdeng	b048c6f659	[data] set iter_batches default batch_size #26869 Why are these changes needed? Consumers (e.g. Train) may expect generated batches to be of the same size. Prior to this change, the default behavior was for each batch to be one block, and blocks may be of different sizes. Changes: Set default batch_size to 256. This was chosen to be a sensible default for training workloads, and is intentionally different from the existing default batch_size value for Dataset.map_batches. Update docs for Dataset.iter_batches, Dataset.map_batches, and DatasetPipeline.iter_batches to be consistent. Update tests and examples to explicitly pass in batch_size=None, as these tests were intentionally testing block iteration, and there are other tests that test explicit batch sizes.	2022-07-23 13:44:53 -07:00
Chen Shen	b20f5f51df	[Air][Data] Don't promote locality_hints for split (#26647) Why are these changes needed? Since locality_hints is an experimental feature, we stop promoting it in the docs and don't enable it in AIR. See #26641 for more context.	2022-07-17 22:18:30 -07:00
Jian Xiao	9fe4dba4ad	Revamp the Getting Started page for Dataset (#24860) This is part of the Dataset GA doc fix effort to update/improve the documentation. This PR revamps the Getting Started page. What are the changes: - Focus on basic/core features that are bread-and-butter for users, leave the advanced features out - Focus on high-level introduction, leave the detailed spec out (e.g. what the possible batch_types are for the map_batches() API) - Use a more realistic (yet still simple) data example that's familiar to people (the IRIS dataset in this case) - Use the same data example throughout to make it context-switch free - Use runnable code rather than faked code - Reference the code from the doc, instead of inlining it in the doc Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-136.us-west-2.compute.internal> Co-authored-by: Eric Liang <ekhliang@gmail.com>	2022-05-18 13:46:23 -07:00
Max Pumperla	29d94a2211	[docs] sphinx gallery removal, migrate to ipynb (#22467)	2022-02-19 01:19:07 -08:00

Renamed from doc/source/data/_examples/doc_code/quick_start.py

6 commits
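
The batch_size change described in commit b048c6f659 can be illustrated with a small sketch. This is not Ray's implementation and does not import Ray: `iter_batches_sketch` is a hypothetical stand-in that models blocks as plain Python lists. It only contrasts the two behaviors the commit message names, block-sized batches when batch_size=None versus fixed-size batches with the new default of 256. Keeping the trailing partial batch is an assumption of this sketch.

```python
from typing import Iterable, Iterator, List, Optional


def iter_batches_sketch(
    blocks: Iterable[List[int]], batch_size: Optional[int] = 256
) -> Iterator[List[int]]:
    """Hypothetical model of batch iteration over blocks (not Ray's code)."""
    if batch_size is None:
        # Block iteration: each batch is exactly one block, so sizes may differ.
        for block in blocks:
            yield list(block)
        return
    if batch_size <= 0:
        raise ValueError("batch_size must be positive or None")

    buffer: List[int] = []
    for block in blocks:
        buffer.extend(block)
        # Emit as many full batches as the buffered rows allow.
        while len(buffer) >= batch_size:
            yield buffer[:batch_size]
            buffer = buffer[batch_size:]
    if buffer:
        # Assumption of this sketch: the final partial batch is kept.
        yield buffer


# Three unevenly sized blocks holding 1000 rows in total.
blocks = [list(range(0, 300)), list(range(300, 400)), list(range(400, 1000))]

block_sizes = [len(b) for b in iter_batches_sketch(blocks, batch_size=None)]
fixed_sizes = [len(b) for b in iter_batches_sketch(blocks)]

print(block_sizes)  # [300, 100, 600]
print(fixed_sizes)  # [256, 256, 256, 232]
```

With batch_size=None the batch sizes follow the block layout, which is why the tests that intentionally exercise block iteration pass it explicitly. With the default, every batch except possibly the last has the same size, which is what a training consumer typically expects.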