139 commits

Author SHA1 Message Date
Eric Liang
13d4ad6100
[data] Preserve epoch by default when using rewindow() (#19359) 2021-10-14 09:17:36 -07:00
Eric Liang
430a5f4a21
[doc] Bump dataset to beta for 1.8 and add backlink to SGD (#19332) 2021-10-12 18:32:29 -07:00
Amog Kamsetty
f6f2435b91
[SGD] Sgd v2 Dataset Integration (#17626)
* wip

* wip

* wip

* draft

* disable tf autosharding

* wip

* wip

* wip

* wip

* add example

* wip

* wip

* wip

* use dataset.split

* add unit tests

* add linear example

* concatenate tensors and fix example

* WIP tune example

* add tensorflow example

* wip

* random_shuffle_each_window

* fault tolerance test

* GPU, examples, CI

* formatting

* fix

* Update python/ray/util/sgd/v2/tests/test_trainer.py

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* wip

* type hints

* wip

* update user guide

* fix

* fix immediate issues

* update example

* update

* fix tune gpu test

* fix resources for smoke test - 1 CPU for dataset tasks

* update tests, docs, examples

* Apply suggestions from code review

Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>

* address comments

* add warning

* fix tests

* minor doc updates

* update example in doc

* configure tests

* Update doc/source/raysgd/v2/user_guide.rst

Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>

* Update python/ray/data/dataset.py

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* fix docstring

Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com>
Co-authored-by: matthewdeng <matt@anyscale.com>
Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>
2021-10-12 14:03:10 -07:00
Eric Liang
0ab6749602
Support iter_epochs for Datasets (#19217) 2021-10-12 11:05:00 -07:00
Chen Shen
c740aae54c
[Core][Dataset] adding example for large scale data ingestion (#18998) 2021-10-11 15:37:09 -07:00
Eric Liang
86cbe3e833
[data] Add support for repeating and re-windowing a DatasetPipeline (#19091) 2021-10-06 20:13:43 -07:00
Jiajun Yao
7ccf737f97
Add compatible dask version for ray 1.6.0 and 1.7.0 (#19080) 2021-10-05 10:23:06 +09:00
Eric Liang
032a420ee6
Rename Dataset.pipeline to Dataset.window (#19050) 2021-10-01 19:55:29 -07:00
Clark Zinzow
d22f838795
[Datasets] Delineate between ref and raw APIs for the Pandas/Arrow integrations. (#18992) 2021-10-01 13:08:25 -07:00
Alex Wu
5709c6501b
[dataset][usability] Dataset dependencies (#18346) 2021-09-29 17:29:31 -07:00
Eric Liang
caf34a452c
Unify ArrowTensorType tables and Tensor blocks (#18867) 2021-09-27 16:24:09 -07:00
Eric Liang
4d2065352b
Increase dataset read parallelism by default (#18420) 2021-09-09 15:07:49 -07:00
Clark Zinzow
b30c41759d
[Datasets] Adds tensor column support (tensors-in-tables) via Pandas/Arrow extension types/arrays. (#18301) 2021-09-08 10:09:01 -07:00
Eric Liang
cbdafa0b63
[doc] Fix various workflow doc bugs (#18357) 2021-09-06 01:39:08 -07:00
Eric Liang
7dcae690b9
Mark datasets as still in alpha for now (#18321) 2021-09-02 17:07:33 -07:00
Wesley Gifford
6133a561e9
Dataset from modin (#18122) 2021-08-31 11:19:35 -07:00
Eric Liang
95b5ad12ba
Initial version of workflow documentation (#18138) 2021-08-27 16:20:48 -07:00
Clark Zinzow
c0598de82a
[Datasets] Port write APIs to use file-based datasources. (#18135) 2021-08-27 15:24:54 -07:00
Clark Zinzow
aee7ba2510
[Datasets] Add from_numpy() and to_numpy() APIs (#18146) 2021-08-27 13:33:11 -07:00
Eric Liang
e1f69ceb5e
Add documentation for DatasetPipeline.from_iterable (#18106) 2021-08-25 22:31:23 -07:00
Eric Liang
71b3183038
Add implicit init note to Ray docs & dataset version note (#17751) 2021-08-11 13:13:22 -07:00
Eric Liang
d4f9d3620e
Move ray.data out of experimental (#17560) 2021-08-04 13:31:10 -07:00
Chris K. W
a33cbec12a
[client][docs] update docs for new client support in init (#17333)
* start

* check formatting

* undo changes from base branch

* Client builder API docs

* indent

* 8

* minor fixes

* absolute path to runtime env docs

* fix runtime_env link

* Update worker.init docs

* drop clientbuilder docs, link to 1.4.1 docs instead. Specify local:// behavior when address passed

* add debug info for ray.init("local")

* local:// attaches a driver directly

* update ray.init return wording

* remote init.connect() from example

* drop local:// docs, add section on when to use ray client

* link to 1.4.1 docs in code example instead of mentioning clientbuilder

* fix backticks, doc mentions of ray.util.connect

* remove ray.util.connect mentions from examples and comments

* update tune example

* wording

* localhost:<port> also works if you're on the head node

* add quotes

* drop mentions of ray client from ray.init docstring

* local->remote

* fix section ref

* update ray start output

* fix section link

* try to fix doc again

* fix link wording

* drop local:// from docs and special handling from code

* update ray start message

* lint

* doc lint

* remove local:// codepath

* remove 'internal_config'

* Update doc/source/cluster/ray-client.rst

Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>

* doc suggestion

* Update doc/source/cluster/ray-client.rst

Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
2021-08-04 05:31:44 +03:00
Eric Liang
748cbbb23d
[hotfix] Parquet S3 reads broken due to pyarrow.lib.ArrowInvalid: S3 subsystem not initialized (#17492) 2021-08-02 11:48:48 -07:00
Eric Liang
e812691909
Support top-level tensor values in dataset (#17439) 2021-08-01 22:45:21 -07:00
Eric Liang
cd13059691
[dataset] Implement random_shuffle() and split(equal=True) (#17448) 2021-07-30 09:51:21 -07:00
Eric Liang
7ed62ea0ad
Initial implementation of Dataset pipelining and docs (#17309) 2021-07-28 21:12:01 -07:00
Jiao
9b6be6f1c8
update dask compatibility for 1.5.0 (#17302)
* update dask compatibility for 1.5.0

* change to right file

* add pip install pytest

Co-authored-by: Jiao Dong <jiaodong@anyscale.com>
2021-07-23 17:31:42 -07:00
Eric Liang
df7fe8dd6d
[data] Cleanup Block type by dropping Generic[T] (#17276)
* wip

* update

* update

* quotes
2021-07-23 09:23:06 -07:00
Clark Zinzow
b5194ca9f9
Add imports to docs examples to make the code more runnable. (#17240) 2021-07-21 11:18:45 -07:00
Eric Liang
fabba96fad
Re-merge large function def, skipping test failing on Windows (#17191) 2021-07-19 18:03:26 -07:00
architkulkarni
4069686e0f
Revert "Improve error message for oversized function (#17133)" (#17184)
This reverts commit 3e53619d64.
2021-07-19 09:28:33 -07:00
Eric Liang
3e53619d64
Improve error message for oversized function (#17133) 2021-07-17 11:04:05 -07:00
Eric Liang
94f17ec099
[RFC] API stability annotations (#17100) 2021-07-16 17:09:20 -07:00
Eric Liang
26a286655b
Add link to datasets preview docs 2021-07-16 12:31:52 -07:00
SangBin Cho
246f80961e
Dask on Ray version documentation update (#16905)
* In progress

* done

* Fix the table format

* completed

* done

* Fix lint
2021-07-16 10:10:26 -07:00
Eric Liang
f03b43c532
[dataset] Support callable classes to simplify state initialization (#17136) 2021-07-15 23:06:14 -07:00
Eric Liang
3d764d7b4b
[data] Fix the ObjectRef type in the dataset docs (#17111)
* fix reft

* remove exp

* fix
2021-07-15 09:50:37 -07:00
Eric Liang
38bddc3f2b
First cut at dataset documentation (#16956) 2021-07-14 23:27:13 -07:00