hiro/ray - Forgejo: Beyond coding. We Forge.

mirror of https://github.com/vale981/ray synced 2025-03-08 19:41:38 -05:00

Author	SHA1	Message	Date
Cheng Su	bc5d8d9176	[AIR] Replace references of `to_tf` with `iter_tf_batches` (#27672)	2022-08-09 16:00:02 -07:00
Clark Zinzow	3b151c581e	[Datasets] Delay expensive tensor extension type import until Parquet reading. (#27653) The tensor extension import is a bit expensive since it will go through Arrow's and Pandas' extension type registration logic. This PR delays the tensor extension type import until Parquet reading, which is the only case in which we need to explicitly register the type. I have confirmed that the Parquet reading in doc/source/data/doc_code/tensor.py passes with this change.	2022-08-08 17:06:25 -07:00
Cheng Su	aeb2346804	[AIR] Replace references of `to_torch` with `iter_torch_batches` (#27574)	2022-08-07 20:14:12 -07:00
Eric Liang	f7ae8923f6	[docs] Reorganize the tensor data support docs; general editing (#26952) Why are these changes needed? Editing pass over the tensor support docs for clarity: make heavy use of tabbed guides to condense the content; rewrite examples to be more organized around creating vs. reading tensors; use doc_code for testing.	2022-08-01 17:31:41 -07:00
matthewdeng	3ea80f6aa1	[data] set iter_batches default batch_size (#26955) Why are these changes needed? Resubmitting #26869. This PR was reverted due to failing tests; however, those failures were actually due to a dependency: #26950.	2022-07-25 08:34:25 -07:00
Kai Fricke	8fe439998e	[air/tuner/docs] Update docs for Tuner() API 1: RSTs, docs, move reuse_actors (#26930) Signed-off-by: Kai Fricke <coding@kaifricke.com> Why are these changes needed? Splitting up #26884: This PR includes changes to use Tuner() instead of tune.run() for most docs files (rst and py), and a change to move reuse_actors to the TuneConfig.	2022-07-24 07:45:24 -07:00
Eric Liang	d692a55018	[data] Make lazy mode non-experimental (#26934)	2022-07-23 21:28:31 -07:00
matthewdeng	bcec60d898	Revert "[data] set iter_batches default batch_size #26869" (#26938) This reverts commit `b048c6f659`.	2022-07-23 17:46:45 -07:00
matthewdeng	b048c6f659	[data] set iter_batches default batch_size #26869 Why are these changes needed? Consumers (e.g. Train) may expect generated batches to be of the same size. Prior to this change, the default behavior would be for each batch to be one block, which may be of different sizes. Changes: (1) Set default batch_size to 256. This was chosen to be a sensible default for training workloads, which is intentionally different from the existing default batch_size value for Dataset.map_batches. (2) Update docs for Dataset.iter_batches, Dataset.map_batches, and DatasetPipeline.iter_batches to be consistent. (3) Update tests and examples to explicitly pass in batch_size=None, as these tests were intentionally testing block iteration, and there are other tests that test explicit batch sizes.	2022-07-23 13:44:53 -07:00
Eric Liang	63a6c1dfac	[docs] Cleanup the Datasets key concept docs (#26908) Clean up the Datasets key concept doc to be suitable for consumption by a beginner-level user, and improve the diagrams.	2022-07-22 23:30:54 -07:00
Chen Shen	b20f5f51df	[Air][Data] Don't promote locality_hints for split (#26647) Why are these changes needed? Since locality_hints is an experimental feature, we stop promoting it in the docs and don't enable it in AIR. See #26641 for more context.	2022-07-17 22:18:30 -07:00
Eric Liang	400330e9c0	[air] Add _max_cpu_fraction_per_node to ScalingConfig and documentation (#26634)	2022-07-16 21:55:51 -07:00
Cheng Su	4e674b6ad3	[Datasets] Update docs for drop_columns and fix typos (#26317) We added the drop_columns() API to Datasets in #26200, so this updates the documentation to use the new API in doc/source/data/examples/nyc_taxi_basic_processing.ipynb. In addition, it fixes some minor typos found while proofreading the Datasets documentation.	2022-07-07 17:17:33 -07:00
Clark Zinzow	1701b923bc	[Datasets] [Tensor Story - 2/2] Add `"numpy"` batch format for batch mapping and batch consumption. (#24870) This PR adds a NumPy "numpy" batch format for batch transformations and batch consumption that works with all block types. See #24811.	2022-06-17 16:01:02 -07:00
Jian Xiao	50c854b1ad	Fix hyperlink in rst doc (#25427) Hyperlink not working. Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-136.us-west-2.compute.internal>	2022-06-08 13:46:23 -07:00
Jian Xiao	6589a4f8cb	[Datasets][UX Assessment] Add a section on how to write UDFs in Datasets (#25338) The Datasets UX assessment showed that users had difficulties in writing UDFs: what the input/output types are, how to write the function, etc. Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-136.us-west-2.compute.internal>	2022-06-02 20:00:50 -07:00
Eric Liang	51b295ad74	[docs] Improve Tune + Datasets documentation (#25389)	2022-06-01 21:52:32 -07:00
Jian Xiao	ad842ec9ab	Revamp the Transforming Datasets user guide (#25033)	2022-05-20 19:25:06 -07:00
Jian Xiao	e5838c4700	Fix range_arrow(), which is replaced by range_table() (#25036)	2022-05-20 19:24:49 -07:00
Jian Xiao	44fd7fd1d0	Revamp the Saving Datasets user guide (#24987)	2022-05-19 15:40:12 -07:00
Clark Zinzow	399334d53c	[Datasets] Overhaul "Accessing Datasets" feature guide. (#24963) This PR overhauls the "Accessing Datasets" feature guide, adding proper coverage of each data-consuming method, including the ML framework exchange APIs (to_torch() and to_tf()).	2022-05-19 12:50:00 -07:00
Clark Zinzow	0b6505e8c6	[Datasets] Miscellaneous GA docs P0s. (#24891) This PR knocks off a few miscellaneous GA docs P0s given in our docs tracker. Namely, it: (1) documents the Datasets resource allocation model; (2) de-emphasizes global/windowed shuffling; (3) documents lazy execution mode, and expands our execution model docs in general.	2022-05-18 16:17:48 -07:00
Jian Xiao	9fe4dba4ad	Revamp the Getting Started page for Dataset (#24860) This is part of the Dataset GA doc fix effort to update/improve the documentation. This PR revamps the Getting Started page. What are the changes: (1) focus on basic/core features that are bread-and-butter for users, and leave the advanced features out; (2) focus on a high-level introduction, and leave the detailed spec out (e.g. what the possible batch_types for the map_batches() API are); (3) use a more realistic (yet still simple) data example that's familiar to people (the IRIS dataset in this case); (4) use the same data example throughout to make it context-switch free; (5) use runnable code rather than faked code; (6) reference the code from the doc instead of inlining it in the doc. Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-136.us-west-2.compute.internal> Co-authored-by: Eric Liang <ekhliang@gmail.com>	2022-05-18 13:46:23 -07:00
Clark Zinzow	4444150c29	[Datasets] Overhaul of "Creating Datasets" feature guide. (#24831) This PR is a general overhaul of the "Creating Datasets" feature guide, providing complete coverage of all (public) dataset creation APIs and highlighting features and quirks of the individual APIs, data modalities, storage backends, etc. To keep the page from getting too long and to keep it easy to navigate, tabbed views are used heavily.	2022-05-17 16:23:42 -07:00
Max Pumperla	29d94a2211	[docs] sphinx gallery removal, migrate to ipynb (#22467)	2022-02-19 01:19:07 -08:00

25 commits
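
Commit `b048c6f659` above describes two batching behaviors: one batch per block (sizes may differ) versus a fixed default batch_size of 256. The following is a minimal plain-Python sketch of that difference only. It does not use Ray and is not Ray's implementation; the function name `iter_batches_sketch` and the use of lists as stand-ins for blocks are assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, Optional


def iter_batches_sketch(
    blocks: Iterable[List[int]], batch_size: Optional[int] = 256
) -> Iterator[List[int]]:
    """Hypothetical helper: yield batches from a sequence of blocks.

    Plain lists stand in for dataset blocks. This is an illustration of the
    semantics described in the commit message, not the library's code.
    """
    if batch_size is None:
        # Block iteration: each batch is exactly one block, so sizes can vary.
        for block in blocks:
            yield list(block)
        return
    if batch_size <= 0:
        raise ValueError("batch_size must be positive or None")

    buffer: List[int] = []
    for block in blocks:
        buffer.extend(block)
        # Emit as many full batches as the buffer currently holds.
        while len(buffer) >= batch_size:
            yield buffer[:batch_size]
            buffer = buffer[batch_size:]
    if buffer:
        # The final batch may be smaller than batch_size.
        yield buffer


# Three blocks of unequal size, 1000 rows in total.
blocks = [list(range(0, 300)), list(range(300, 400)), list(range(400, 1000))]
per_block_sizes = [len(b) for b in iter_batches_sketch(blocks, batch_size=None)]
fixed_sizes = [len(b) for b in iter_batches_sketch(blocks)]
print(per_block_sizes)
print(fixed_sizes)
```

With `batch_size=None` the batch sizes follow the block boundaries; with the default, every batch except possibly the last has 256 rows, which is the uniformity the commit message says consumers such as Train may expect.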