Eric Liang
13d4ad6100
[data] Preserve epoch by default when using rewindow() (#19359)
2021-10-14 09:17:36 -07:00
Eric Liang
430a5f4a21
[doc] Bump dataset to beta for 1.8 and add backlink to SGD (#19332)
2021-10-12 18:32:29 -07:00
Amog Kamsetty
f6f2435b91
[SGD] Sgd v2 Dataset Integration (#17626)
* wip
* wip
* wip
* draft
* disable tf autosharding
* wip
* wip
* wip
* wip
* add example
* wip
* wip
* wip
* use dataset.split
* add unit tests
* add linear example
* concatenate tensors and fix example
* WIP tune example
* add tensorflow example
* wip
* random_shuffle_each_window
* fault tolerance test
* GPU, examples, CI
* formatting
* fix
* Update python/ray/util/sgd/v2/tests/test_trainer.py
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* wip
* type hints
* wip
* update user guide
* fix
* fix immediate issues
* update example
* update
* fix tune gpu test
* fix resources for smoke test - 1 CPU for dataset tasks
* update tests, docs, examples
* Apply suggestions from code review
Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>
* address comments
* add warning
* fix tests
* minor doc updates
* update example in doc
* configure tests
* Update doc/source/raysgd/v2/user_guide.rst
Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>
* Update python/ray/data/dataset.py
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* fix docstring
Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com>
Co-authored-by: matthewdeng <matt@anyscale.com>
Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>
2021-10-12 14:03:10 -07:00
Eric Liang
0ab6749602
Support iter_epochs for Datasets (#19217)
2021-10-12 11:05:00 -07:00
Chen Shen
c740aae54c
[Core][Dataset] adding example for large scale data ingestion (#18998)
2021-10-11 15:37:09 -07:00
Eric Liang
86cbe3e833
[data] Add support for repeating and re-windowing a DatasetPipeline (#19091)
2021-10-06 20:13:43 -07:00
Jiajun Yao
7ccf737f97
Add compatible dask version for ray 1.6.0 and 1.7.0 (#19080)
2021-10-05 10:23:06 +09:00
Eric Liang
032a420ee6
Rename Dataset.pipeline to Dataset.window (#19050)
2021-10-01 19:55:29 -07:00
Clark Zinzow
d22f838795
[Datasets] Delineate between ref and raw APIs for the Pandas/Arrow integrations. (#18992)
2021-10-01 13:08:25 -07:00
Alex Wu
5709c6501b
[dataset][usability] Dataset dependencies (#18346)
2021-09-29 17:29:31 -07:00
Eric Liang
caf34a452c
Unify ArrowTensorType tables and Tensor blocks (#18867)
2021-09-27 16:24:09 -07:00
Eric Liang
4d2065352b
Increase dataset read parallelism by default (#18420)
2021-09-09 15:07:49 -07:00
Clark Zinzow
b30c41759d
[Datasets] Adds tensor column support (tensors-in-tables) via Pandas/Arrow extension types/arrays. (#18301)
2021-09-08 10:09:01 -07:00
Eric Liang
cbdafa0b63
[doc] Fix various workflow doc bugs (#18357)
2021-09-06 01:39:08 -07:00
Eric Liang
7dcae690b9
Mark datasets as still in alpha for now (#18321)
2021-09-02 17:07:33 -07:00
Wesley Gifford
6133a561e9
Dataset from modin (#18122)
2021-08-31 11:19:35 -07:00
Eric Liang
95b5ad12ba
Initial version of workflow documentation (#18138)
2021-08-27 16:20:48 -07:00
Clark Zinzow
c0598de82a
[Datasets] Port write APIs to use file-based datasources. (#18135)
2021-08-27 15:24:54 -07:00
Clark Zinzow
aee7ba2510
[Datasets] Add from_numpy() and to_numpy() APIs (#18146)
2021-08-27 13:33:11 -07:00
Eric Liang
e1f69ceb5e
Add documentation for DatasetPipeline.from_iterable (#18106)
2021-08-25 22:31:23 -07:00
Eric Liang
71b3183038
Add implicit init note to Ray docs & dataset version note (#17751)
2021-08-11 13:13:22 -07:00
Eric Liang
d4f9d3620e
Move ray.data out of experimental (#17560)
2021-08-04 13:31:10 -07:00
Chris K. W
a33cbec12a
[client][docs] update docs for new client support in init (#17333)
* start
* check formatting
* undo changes from base branch
* Client builder API docs
* indent
* 8
* minor fixes
* absolute path to runtime env docs
* fix runtime_env link
* Update worker.init docs
* drop clientbuilder docs, link to 1.4.1 docs instead. Specify local:// behavior when address passed
* add debug info for ray.init("local")
* local:// attaches a driver directly
* update ray.init return wording
* remote init.connect() from example
* drop local:// docs, add section on when to use ray client
* link to 1.4.1 docs in code example instead of mentioning clientbuilder
* fix backticks, doc mentions of ray.util.connect
* remove ray.util.connect mentions from examples and comments
* update tune example
* wording
* localhost:<port> also works if you're on the head node
* add quotes
* drop mentions of ray client from ray.init docstring
* local->remote
* fix section ref
* update ray start output
* fix section link
* try to fix doc again
* fix link wording
* drop local:// from docs and special handling from code
* update ray start message
* lint
* doc lint
* remove local:// codepath
* remove 'internal_config'
* Update doc/source/cluster/ray-client.rst
Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
* doc suggestion
* Update doc/source/cluster/ray-client.rst
Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
2021-08-04 05:31:44 +03:00
Eric Liang
748cbbb23d
[hotfix] Parquet S3 reads broken due to pyarrow.lib.ArrowInvalid: S3 subsystem not initialized (#17492)
2021-08-02 11:48:48 -07:00
Eric Liang
e812691909
Support top-level tensor values in dataset (#17439)
2021-08-01 22:45:21 -07:00
Eric Liang
cd13059691
[dataset] Implement random_shuffle() and split(equal=True) (#17448)
2021-07-30 09:51:21 -07:00
Eric Liang
7ed62ea0ad
Initial implementation of Dataset pipelining and docs (#17309)
2021-07-28 21:12:01 -07:00
Jiao
9b6be6f1c8
update dask compatibility for 1.5.0 (#17302)
* update dask compatibility for 1.5.0
* change to right file
* add pip install pytest
Co-authored-by: Jiao Dong <jiaodong@anyscale.com>
2021-07-23 17:31:42 -07:00
Eric Liang
df7fe8dd6d
[data] Cleanup Block type by dropping Generic[T] (#17276)
* wip
* update
* update
* quotes
2021-07-23 09:23:06 -07:00
Clark Zinzow
b5194ca9f9
Add imports to docs examples to make the code more runnable. (#17240)
2021-07-21 11:18:45 -07:00
Eric Liang
fabba96fad
Re-merge large function def, skipping test failing on Windows (#17191)
2021-07-19 18:03:26 -07:00
architkulkarni
4069686e0f
Revert "Improve error message for oversized function (#17133)" (#17184)
This reverts commit 3e53619d64.
2021-07-19 09:28:33 -07:00
Eric Liang
3e53619d64
Improve error message for oversized function (#17133)
2021-07-17 11:04:05 -07:00
Eric Liang
94f17ec099
[RFC] API stability annotations (#17100)
2021-07-16 17:09:20 -07:00
Eric Liang
26a286655b
Add link to datasets preview docs
2021-07-16 12:31:52 -07:00
SangBin Cho
246f80961e
Dask on Ray version documentation update (#16905)
* In progress
* done
* Fix the table format
* completed
* done
* Fix lint
2021-07-16 10:10:26 -07:00
Eric Liang
f03b43c532
[dataset] Support callable classes to simplify state initialization (#17136)
2021-07-15 23:06:14 -07:00
Eric Liang
3d764d7b4b
[data] Fix the ObjectRef type in the dataset docs (#17111)
* fix reft
* remove exp
* fix
2021-07-15 09:50:37 -07:00
Eric Liang
38bddc3f2b
First cut at dataset documentation (#16956)
2021-07-14 23:27:13 -07:00