ray/doc/source/data
matthewdeng b048c6f659
[data] set iter_batches default batch_size #26869
Why are these changes needed?
Consumers (e.g. Train) may expect generated batches to all be the same size. Before this change, each batch defaulted to one block, and blocks may differ in size.

Changes
Set the default batch_size to 256. The value is meant as a sensible default for training workloads and intentionally differs from the existing default batch_size of Dataset.map_batches.
Update docs for Dataset.iter_batches, Dataset.map_batches, and DatasetPipeline.iter_batches to be consistent.
Update tests and examples to explicitly pass batch_size=None, since these tests intentionally exercise block iteration and other tests already cover explicit batch sizes.
2022-07-23 13:44:53 -07:00
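The batching behavior described in the commit message above can be illustrated with a small pure-Python sketch. This is not Ray's implementation: the helper name `iter_batches_sketch`, the use of plain lists as stand-ins for blocks, and the choice to yield a final partial batch are all assumptions made for illustration. It only shows the difference between a fixed default batch_size of 256 and batch_size=None (one batch per block).

```python
from typing import Iterable, Iterator, List, Optional


def iter_batches_sketch(
    blocks: Iterable[List[int]],
    batch_size: Optional[int] = 256,
) -> Iterator[List[int]]:
    """Hypothetical stand-in for batch iteration over blocks of rows.

    batch_size=None yields each block unchanged (block iteration);
    otherwise rows are re-chunked into batches of exactly batch_size,
    with any leftover rows yielded as a final, smaller batch (an
    assumption of this sketch).
    """
    if batch_size is None:
        # Block iteration: batch sizes follow block sizes and may differ.
        for block in blocks:
            yield block
        return

    buffer: List[int] = []
    for block in blocks:
        buffer.extend(block)
        # Emit as many full batches as the buffered rows allow.
        while len(buffer) >= batch_size:
            yield buffer[:batch_size]
            buffer = buffer[batch_size:]
    if buffer:
        yield buffer


# Three "blocks" of uneven size: 300, 100, and 600 rows.
blocks = [list(range(0, 300)), list(range(300, 400)), list(range(400, 1000))]

default_sizes = [len(b) for b in iter_batches_sketch(blocks)]
block_sizes = [len(b) for b in iter_batches_sketch(blocks, batch_size=None)]

print(default_sizes)  # [256, 256, 256, 232]
print(block_sizes)    # [300, 100, 600]
```

With the default, every batch except possibly the last has the same size regardless of how the rows are split into blocks. Passing batch_size=None preserves block boundaries, which is what the updated tests rely on.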
doc_code [data] set iter_batches default batch_size #26869 2022-07-23 13:44:53 -07:00
examples [core] ray.init defaults to an existing Ray instance if there is one (#26678) 2022-07-23 11:27:22 -07:00
images [docs] Cleanup the Datasets key concept docs (#26908) 2022-07-22 23:30:54 -07:00
modin Fix broken links in documentation and put linkcheck linter in place on CI (#23340) 2022-03-18 21:02:52 -07:00
accessing-datasets.rst [Datasets] Overhaul "Accessing Datasets" feature guide. (#24963) 2022-05-19 12:50:00 -07:00
advanced-pipelines.rst [data] [docs] Doc audit-- rebalance basic vs advanced materials (#25262) 2022-06-01 13:50:46 -07:00
big_data_ingestion.yaml Revert "[docs] Clean up doc structure (first part) (#21667)" (#21763) 2022-01-20 15:30:56 -08:00
creating-datasets.rst [Datasets] Autodetect dataset parallelism based on available resources and data size (#25883) 2022-07-12 21:08:49 -07:00
custom-data.rst [Datasets] Overhaul of "Creating Datasets" feature guide. (#24831) 2022-05-17 16:23:42 -07:00
dask-on-ray.rst Update dask version for Ray 1.12.0 (#23197) 2022-03-15 19:22:19 -07:00
dataset-ml-preprocessing.rst [Datasets] Update docs for drop_columns and fix typos (#26317) 2022-07-07 17:17:33 -07:00
dataset-tensor-support.rst [Datasets] Unrevert "[Datasets] [Tensor Story - 1/2] Automatically provide tensor views to UDFs and infer tensor blocks for pure-tensor datasets. (#25031)" (#25531) 2022-06-08 10:33:25 -07:00
dataset.rst [docs] Cleanup the Datasets key concept docs (#26908) 2022-07-22 23:30:54 -07:00
faq.rst docs: Fix a few typos (#26556) 2022-07-14 12:38:33 -07:00
getting-started.rst [Datasets] [Tensor Story - 2/2] Add "numpy" batch format for batch mapping and batch consumption. (#24870) 2022-06-17 16:01:02 -07:00
integrations.rst Revamp the Getting Started page for Dataset (#24860) 2022-05-18 13:46:23 -07:00
key-concepts.rst [docs] Cleanup the Datasets key concept docs (#26908) 2022-07-22 23:30:54 -07:00
mars-on-ray.rst [Datasets] Integrate Mars-on-Ray with Datasets; improve docs and add tests (#23402) 2022-04-29 09:43:52 -07:00
memory-management.rst [data] [docs] Doc audit-- rebalance basic vs advanced materials (#25262) 2022-06-01 13:50:46 -07:00
package-ref.rst [Datasets] Add ImageFolderDatasource (#24641) 2022-07-15 22:43:23 -07:00
performance-tips.rst [docs] Cleanup the Datasets key concept docs (#26908) 2022-07-22 23:30:54 -07:00
pipelining-compute.rst [data] [docs] Doc audit-- rebalance basic vs advanced materials (#25262) 2022-06-01 13:50:46 -07:00
random-access.rst [Datasets] Overhaul "Accessing Datasets" feature guide. (#24963) 2022-05-19 12:50:00 -07:00
raydp.rst [Docs] Ray Data docs target state (#21931) 2022-01-27 13:14:36 -08:00
saving-datasets.rst Revamp the Transforming Datasets user guide (#25033) 2022-05-20 19:25:06 -07:00
transforming-datasets.rst [Datasets] [Tensor Story - 2/2] Add "numpy" batch format for batch mapping and batch consumption. (#24870) 2022-06-17 16:01:02 -07:00
user-guide.rst [data] [docs] Doc audit-- rebalance basic vs advanced materials (#25262) 2022-06-01 13:50:46 -07:00