hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-08 19:41:38 -05:00

Author	SHA1	Message	Date
Eric Liang	015181ab9a	Add random access support for Datasets (experimental feature) (#22749) This PR adds experimental support for random access to datasets. A Dataset can be random access enabled by calling `ds.to_random_access_dataset(key, num_workers=N)`. This creates a RandomAccessDataset. RandomAccessDataset partitions the dataset across the cluster by the given sort key, providing efficient random access to records via binary search. A number of worker actors are created, each of which has zero-copy access to the underlying sorted data blocks of the Dataset. Performance-wise, you can expect each worker to provide ~3000 records / second via ``get_async()``, and ~10000 records / second via ``multiget()``. Since Ray actor calls go direct from worker->worker, throughput scales linearly with the number of workers.	2022-03-17 15:01:12 -07:00
Clark Zinzow	53c4c7b1be	[Datasets] Expose `TableRow` as public API; minimize copies/type conversions on row-based ops. (#22305) This PR properly exposes `TableRow` as a public API (API docs + the "Public" tag), since it's already exposed to the user in our row-based ops. In addition, the following changes are made: 1. During row-based ops, we also choose a batch format that lines up with the current dataset format in order to eliminate unnecessary copies and type conversions. 2. `TableRow` now derives from `collections.abc.Mapping`, which lets `TableRow` better interop with code expecting a mapping, and includes a few helpful mixins so we only have to implement `__getitem__`, `__iter__`, and `__len__`.	2022-02-14 12:56:17 -08:00
Clark Zinzow	fb0d6e6b0b	[Datasets] [Docs] Datasets library branding + positioning tweaks (#22067)	2022-02-05 16:59:34 -08:00
Max Pumperla	4dd221f848	[Docs] Ray Data docs target state (#21931) Preview: [docs](https://ray--21931.org.readthedocs.build/en/21931/data/dataset.html) The Ray Data project's docs now have a clearer structure and have partly been rewritten/modified. In particular we have - [x] A Getting Started Guide - [x] An explicit User / How-To Guide - [x] A dedicated Key Concepts page - [x] A consistent naming convention in `Ray Data` whenever it refers to the project. This surfaces quite clearly that, apart from the "Getting Started" sections, we really only have one real example. Once we have more, we can create an "Example" section like many other sub-projects have. This will be addressed in https://github.com/ray-project/ray/issues/21838.	2022-01-27 13:14:36 -08:00
Max Pumperla	b34099e764	[docs] landing page (fixes #21750) (#21859) Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-01-26 17:14:25 -08:00
Max Pumperla	f9b71a8bf6	[docs] new structure (#21776) This PR consolidates both #21667 and #21759 (look there for features), but improves on them in the following way: - [x] we reverted renaming of existing projects `tune`, `rllib`, `train`, `cluster`, `serve`, `raysgd` and `data` so that links won't break. I think my consolidation efforts with the `ray-` prefix were a little overeager in that regard. It's better like this. Only the creation of `ray-core` was a necessity, and some files moved into the `rllib` folder, so that should be relatively benign. - [x] Additionally, we added Algolia `docsearch`, screenshot below. This is _much_ better than our current search. Caveat: there's a sphinx dependency that needs to be replaced (`sphinx-tabs`) by another, newer one (`sphinx-panels`), as the former prevents loading of the `algolia.js` library. Will follow-up in the next PR (hoping this one doesn't get re-re-re-re-reverted).	2022-01-21 15:42:05 -08:00
xwjiang2010	9af8f11191	Revert "[docs] Clean up doc structure (first part) (#21667)" (#21763) This reverts commit `38e46c9fb3`.	2022-01-20 15:30:56 -08:00
Max Pumperla	38e46c9fb3	[docs] Clean up doc structure (first part) (#21667)	2022-01-20 16:19:04 +01:00
Jiajun Yao	4fc5b11c68	Simple block dataset groupBy (#19435)	2021-10-19 19:53:13 -07:00
Amog Kamsetty	f6f2435b91	[SGD] Sgd v2 Dataset Integration (#17626) * wip * wip * wip * draft * disable tf autosharding * wip * wip * wip * wip * add example * wip * wip * wip * use dataset.split * add unit tests * add linear example * concatenate tensors and fix example * WIP tune example * add tensorflow example * wip * random_shuffle_each_window * fault tolerance test * GPU, examples, CI * formatting * fix * Update python/ray/util/sgd/v2/tests/test_trainer.py Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> * wip * type hints * wip * update user guide * fix * fix immediate issues * update example * update * fix tune gpu test * fix resources for smoke test - 1 CPU for dataset tasks * update tests, docs, examples * Apply suggestions from code review Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com> * address comments * add warning * fix tests * minor doc updates * update example in doc * configure tests * Update doc/source/raysgd/v2/user_guide.rst Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com> * Update python/ray/data/dataset.py Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> * fix docstring Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com> Co-authored-by: matthewdeng <matt@anyscale.com> Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>	2021-10-12 14:03:10 -07:00
Clark Zinzow	d22f838795	[Datasets] Delineate between ref and raw APIs for the Pandas/Arrow integrations. (#18992)	2021-10-01 13:08:25 -07:00
Clark Zinzow	b30c41759d	[Datasets] Adds tensor column support (tensors-in-tables) via Pandas/Arrow extension types/arrays. (#18301)	2021-09-08 10:09:01 -07:00
Clark Zinzow	c0598de82a	[Datasets] Port write APIs to use file-based datasources. (#18135)	2021-08-27 15:24:54 -07:00
Clark Zinzow	aee7ba2510	[Datasets] Add from_numpy() and to_numpy() APIs (#18146)	2021-08-27 13:33:11 -07:00
Eric Liang	d4f9d3620e	Move ray.data out of experimental (#17560)	2021-08-04 13:31:10 -07:00
Eric Liang	e812691909	Support top-level tensor values in dataset (#17439)	2021-08-01 22:45:21 -07:00
Eric Liang	cd13059691	[dataset] Implement random_shuffle() and split(equal=True) (#17448)	2021-07-30 09:51:21 -07:00
Eric Liang	7ed62ea0ad	Initial implementation of Dataset pipelining and docs (#17309)	2021-07-28 21:12:01 -07:00
Eric Liang	3d764d7b4b	[data] Fix the ObjectRef type in the dataset docs (#17111) * fix reft * remove exp * fix	2021-07-15 09:50:37 -07:00
Eric Liang	38bddc3f2b	First cut at dataset documentation (#16956)	2021-07-14 23:27:13 -07:00
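
Commit 015181ab9a describes random access as partitioning records by a sort key and locating them by binary search. The single-process sketch below illustrates only that lookup idea in plain Python. It is not Ray's implementation: `SortedBlockIndex`, its `get` and `multiget` methods, and the `num_blocks` parameter are hypothetical names chosen for this example, and it has no actors or zero-copy blocks.

```python
from bisect import bisect_left


class SortedBlockIndex:
    """Hypothetical illustration: records sorted by a key and split into blocks."""

    def __init__(self, records, key, num_blocks):
        ordered = sorted(records, key=lambda r: r[key])
        # Ceiling division so every record lands in one of num_blocks blocks.
        size = max(1, -(-len(ordered) // num_blocks))
        self._blocks = [ordered[i:i + size] for i in range(0, len(ordered), size)]
        # Per-block sorted key lists, plus the largest key of each block.
        self._block_keys = [[r[key] for r in block] for block in self._blocks]
        self._upper_bounds = [keys[-1] for keys in self._block_keys]

    def get(self, value):
        # First binary search: find the block whose key range can hold value.
        b = bisect_left(self._upper_bounds, value)
        if b == len(self._blocks):
            return None
        # Second binary search: find the record inside that block.
        keys = self._block_keys[b]
        i = bisect_left(keys, value)
        if i < len(keys) and keys[i] == value:
            return self._blocks[b][i]
        return None

    def multiget(self, values):
        # Batched lookup; a missing key yields None at that position.
        return [self.get(v) for v in values]


records = [{"id": i, "v": i * i} for i in [5, 3, 9, 1, 7]]
index = SortedBlockIndex(records, key="id", num_blocks=2)
found = index.get(7)
missing = index.get(4)
batch = index.multiget([1, 9, 2])
```

Each lookup costs two binary searches, one over block boundaries and one inside the chosen block, which is what makes point reads cheap once the data is sorted.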

20 commits
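
Commit 53c4c7b1be notes that deriving from `collections.abc.Mapping` means only `__getitem__`, `__iter__`, and `__len__` need to be implemented. A minimal sketch of that pattern follows. `DictBackedRow` is a hypothetical stand-in written for this example, not Ray's `TableRow`, and its list-backed storage is an assumption.

```python
from collections.abc import Mapping


class DictBackedRow(Mapping):
    """Hypothetical read-only row view; not Ray's TableRow."""

    def __init__(self, columns, values):
        self._columns = list(columns)
        self._values = list(values)

    def __getitem__(self, key):
        try:
            return self._values[self._columns.index(key)]
        except ValueError:
            # Mapping's mixins (get, __contains__) rely on KeyError.
            raise KeyError(key) from None

    def __iter__(self):
        return iter(self._columns)

    def __len__(self):
        return len(self._columns)


row = DictBackedRow(["id", "name"], [7, "a"])
as_dict = dict(row)             # works because Mapping supplies keys()
has_name = "name" in row        # __contains__ comes from the mixin
fallback = row.get("missing")   # get() comes from the mixin
```

The three abstract methods are the only ones written by hand. `keys`, `items`, `values`, `get`, `__contains__`, and `__eq__` are inherited from `Mapping`.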