hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
SangBin Cho	b7ab0555c4	[Link Check] Fix the broken link check from the AIR doc (#27632 ) The original link doesn't exist. https://docs.ray.io/en/master/_images/air-ecosystem.svg I fixed it by linking the raw github file link. This should have the exactly same flow as before. I tried finding a link to this image file, but I couldn't. I also couldn't find an easy way to add only a link (without embedding an image). Please lmk if you prefer other option	2022-08-08 06:36:04 -07:00
Cheng Su	aeb2346804	[AIR] Replace references of `to_torch` with `iter_torch_batches` (#27574 )	2022-08-07 20:14:12 -07:00
Eric Liang	9b467e3954	[docs] Improve the "Why Ray" and "Why AIR" sections of the docs (#27480 )	2022-08-05 18:42:45 -07:00
Richard Liaw	4629a3a649	[air/docs] Update Trainer documentation (#27481 ) Co-authored-by: xwjiang2010 <xwjiang2010@gmail.com> Co-authored-by: Kai Fricke <kai@anyscale.com> Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com> Co-authored-by: Eric Liang <ekhliang@gmail.com>	2022-08-05 11:21:19 -07:00
Bill Chambers	73bc572405	[AIR/docs] Adding Source Libraries (#27518 ) Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-08-04 15:56:40 -07:00
Bill Chambers	19dc19a2c5	Fix Ray Air Docs Install (#27501 )	2022-08-04 10:47:10 -07:00
Richard Liaw	b2cd34cc5c	[air] Remove checkpoint user guide and update key concepts and docstring (#27455 )	2022-08-04 08:55:26 -07:00
xwjiang2010	8d5c07b781	[air/train/docs] Add trainer user guide and update trainer docs (#27389 ) This PR adds a user guide to AIR for using Ray Train. It provides a high level overview of the trainers and removes redundant sections. The main file to review is here: doc/source/ray-air/trainer.rst. Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com> Signed-off-by: Richard Liaw <rliaw@berkeley.edu> Signed-off-by: Kai Fricke <kai@anyscale.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu> Co-authored-by: Kai Fricke <kai@anyscale.com>	2022-08-04 13:59:50 +01:00
Eric Liang	67a306f92f	[docs] Update colors and styling of ray diagrams (#27474 )	2022-08-03 16:49:25 -07:00
Eric Liang	340f0960d6	[docs] Improve the AIR introductory page (#27347 )	2022-08-03 16:04:04 -07:00
xwjiang2010	ff2b728e9a	[air] add tuner user guide (#26837 ) Co-authored-by: Kai Fricke <kai@anyscale.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-08-03 09:43:42 -07:00
Jules S. Damji	4045ba4841	[DOC Ray AIR] minor editorial tweaks for clarity and usage (#27128 ) Co-authored-by: Jules Damji <jules@anyscale.com>	2022-08-01 21:09:04 -07:00
Dmitri Gekhtman	6efca71c35	[docs][kubernetes] XGBoost ML example (#27313 ) Adds a guide on running an XGBoost-Ray workload using KubeRay. Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>	2022-08-01 19:30:41 -07:00
Eric Liang	f7ae8923f6	[docs] Reorganize the tensor data support docs; general editing (#26952 ) Why are these changes needed? Editing pass over the tensor support docs for clarity: Make heavy use of tabbed guides to condense the content Rewrite examples to be more organized around creating vs reading tensors Use doc_code for testing	2022-08-01 17:31:41 -07:00
Clark Zinzow	df124d0ad5	[AIR - Datasets] Hide tensor extension from UDFs. (#27019 ) We previously added automatic tensor extension casting on Datasets transformation outputs to allow the user to not have to worry about tensor column casting; however, this current state creates several issues: 1. Not all tensors are supported, which means that we’ll need to have an opaque object dtype (i.e. ndarray of ndarray pointers) fallback for the Pandas-only case. Known unsupported tensor use cases: a. Heterogeneous-shaped (i.e. ragged) tensors b. Struct arrays 2. UDFs will expect a NumPy column and won’t know what to do with our TensorArray type. E.g., torchvision transforms don’t respect the array protocol (which they should), and instead only support Torch tensors and NumPy ndarrays; passing a TensorArray column or a TensorArrayElement (a single item in the TensorArray column) fails. Implicit casting with object dtype fallback on UDF outputs can make the input type to downstream UDFs nondeterministic, where the user won’t know if they’ll get a TensorArray column or an object dtype column. 3. The tensor extension cast fallback warning spams the logs. This PR: 1. Adds automatic casting of tensor extension columns to NumPy ndarray columns for Datasets UDF inputs, meaning the UDFs will never have to see tensor extensions and that the UDF input column types will be consistent and deterministic; this fixes both (2) and (3). 2. No longer implicitly falls back to an opaque object dtype when TensorArray casting fails (e.g. for ragged tensors), and instead raises an error; this fixes (4) but removes our support for (1). 3. Adds a global enable_tensor_extension_casting config flag, which is True by default, that controls whether we perform this automatic casting. Turning off the implicit casting provides a path for (1), where the tensor extension can be avoided if working with ragged tensors in Pandas land. Turning off this flag also allows the user to explicitly control their tensor extension casting, if they want to work with it in their UDFs in order to reap the benefits of less data copies, more efficient slicing, stronger column typing, etc.	2022-07-28 10:37:45 -07:00
Kai Fricke	3924a4b7cc	[air/train] Rename BaseWorkerMixin, only log info torch loop for rank 0 (#27098 ) This PR - only prints train_loop info strings (e.g. `train_loop_utils.py:298 -- Moving model to device: cpu`) for rank 0 workers for torch - renames `BaseWorkerMixin` to `RayTrainWorker` as the name comes up often in output and is more meaningful Signed-off-by: Kai Fricke <kai@anyscale.com>	2022-07-27 20:11:59 +01:00
matthewdeng	113c4d7fab	[air][data] move train_test_split to ray.data.Dataset (#27065 )	2022-07-27 09:53:37 -07:00
Balaji Veeramani	89f7f2a567	[Datasets] Add `size` parameter to `ImageFolderDatasource` (#26975 ) If you read a folder with differently-sized images, `ImageFolderDatasource` errors. This PR fixes the issue by resizing images to a user-specified size.	2022-07-26 14:57:38 -07:00
Balaji Veeramani	8bc836d9fb	[AIR] Remove `CustomStatefulPreprocessor` (#26981 )	2022-07-26 10:10:57 -07:00
Balaji Veeramani	55988992b9	[AIR] Rename `limit` parameter as `max_categories` (#26977 )	2022-07-26 10:10:40 -07:00
Jules S. Damji	193e824bc1	[AIR DOC] minor tweaks to checkpoint user guide for clarity and consistency subheadings (#26937 ) Co-authored-by: Jules Damji <jules@anyscale.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-07-25 14:21:29 -07:00
Jiao	5315f1e643	[AIR] Enable other notebooks previously marked with # REGRESSION (#26896 ) Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-07-25 13:40:21 -07:00
matthewdeng	df638b3f0f	[Datasets] Automatically cast tensor columns when building Pandas blocks. (#26924 ) This PR just applies the changes from the following PRs: [Datasets] Automatically cast tensor columns when building Pandas blocks. #26684 reverted by Revert "[Datasets] Automatically cast tensor columns when building Pandas blocks." #26921 [AIR - Datasets] Fix TensorDtype construction from string and fix example. #26904 This fixes the test failures introduced in the originally reverted PRs.	2022-07-25 12:12:10 -07:00
Eric Liang	008eecfbff	[docs] Update the AIR data ingest guide (#26909 )	2022-07-24 09:59:29 -07:00
Kai Fricke	1f32cb95db	[air/tune] Add top-level imports for Tuner, TuneConfig, move CheckpointConfig (#26882 )	2022-07-22 20:17:06 -07:00
Eric Liang	36c46e9686	[docs] Improve AIR table of contents titles (#26858 )	2022-07-22 17:17:49 -07:00
Clark Zinzow	a29baf93c8	[Datasets] Add `.iter_torch_batches()` and `.iter_tf_batches()` APIs. (#26689 ) This PR adds .iter_torch_batches() and .iter_tf_batches() convenience APIs, which takes care of ML framework tensor conversion, the narrow tensor waste for the .iter_batches() call ("numpy" format), and unifies batch formats around two options: a single tensor for simple/pure-tensor/single-column datasets, and a dictionary of tensors for multi-column datasets.	2022-07-22 10:09:36 -07:00
Eric Liang	9272bcbbca	[docs] Add ecosystem map to AIR guide (#26859 )	2022-07-21 19:06:47 -07:00
Jiao	db027d86af	[P0][AIR] Fix train to serve notebooks (#26821 ) Co-authored-by: Simon Mo <simon.mo@hey.com>	2022-07-21 18:04:13 -07:00
Jules S. Damji	6db2536971	[RayAIR] Minor tweaks to the why ray air for clarity (#26680 )	2022-07-21 10:21:26 -07:00
Balaji Veeramani	ac1d21027d	[AIR] Add framework-specific checkpoints (#26777 )	2022-07-20 19:33:27 -07:00
Richard Liaw	9f0d35b97c	[air/docs] add tensorflow benchmarks into table (#26800 )	2022-07-20 17:12:40 -07:00
Eric Liang	d6f29eb9ca	[docs] Mark pipelined prediction as experimental for now (#26792 )	2022-07-20 15:31:19 -07:00
xwjiang2010	e7957f4a3e	[air] update offline/online rl example and enable them. (#26786 )	2022-07-20 14:06:03 -07:00
Richard Liaw	6563c2762d	[air] add pytorch benchmark number (#26719 )	2022-07-19 09:51:13 -07:00
Richard Liaw	7e62e1187c	[air/benchmark] Torch benchmarks for 4x4 (#26692 ) Add benchmark data for 4x4 GPU setup. Signed-off-by: Richard Liaw <rliaw@berkeley.edu> Co-authored-by: Jimmy Yao <jiahaoyao.math@gmail.com> Co-authored-by: Kai Fricke <kai@anyscale.com>	2022-07-19 17:06:37 +01:00
Sumanth Ratna	759966781f	[air] Allow users to use instances of `ScalingConfig` (#25712 ) Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>	2022-07-18 15:46:58 -07:00
matthewdeng	6670708010	[air] add placement group max CPU to data benchmark (#26649 ) Set experimental `_max_cpu_fraction_per_node` to prevent deadlock. This should technically be a no-op with the SPREAD strategy.	2022-07-18 10:34:40 -07:00
Jiao	98a07920d3	[AIR][CUJ] Make distributing training benchmark at silver tier (#26640 )	2022-07-17 22:07:09 -07:00
Jules S. Damji	55368402ee	added summary why and when to use bulk vs streaming data ingest (#26637 )	2022-07-17 18:46:58 -07:00
Clark Zinzow	864af14f41	[Datasets] [Local Shuffle - 1/N] Add local shuffling option. (#26094 ) Co-authored-by: Eric Liang <ekhliang@gmail.com> Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Co-authored-by: Matthew Deng <matt@anyscale.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-07-17 16:21:14 -07:00
Eric Liang	400330e9c0	[air] Add _max_cpu_fraction_per_node to ScalingConfig and documentation (#26634 )	2022-07-16 21:55:51 -07:00
Amog Kamsetty	3a345a470c	[AIR/Docs] Add Predictor Docs (#25833 )	2022-07-16 21:14:21 -07:00
Jiao	77e2ef2eb6	[AIR] Update Torch benchmarks with documentation (#26631 ) Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-07-16 17:58:21 -07:00
Eric Liang	0855bcb77e	[air] Use SPREAD strategy by default and don't special case it in benchmarks (#26633 )	2022-07-16 17:37:06 -07:00
Antoni Baum	fb6f3cf708	[AIR/Docs] Small improvements to Train user guide (#26577 ) Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>	2022-07-16 16:51:17 -07:00
Eric Liang	6217138eb0	[docs] Move AIR benchmarks to top level (#26632 )	2022-07-16 15:34:31 -07:00
Richard Liaw	799311b2f7	[air/docs] update examples to remove pandas again (#26598 )	2022-07-16 08:40:44 -07:00
matthewdeng	e3a096f412	[air] add bulk ingest benchmarks (#26618 )	2022-07-15 22:01:23 -07:00
Richard Liaw	5ad4e75831	[air] Add initial benchmark section (#26608 )	2022-07-15 15:33:48 -07:00

116 commits