Kai Fricke
c680837289
[air/train/release/2.0.0] Rename BaseWorkerMixin, only log info torch loop for rank 0 ( #27228 )
...
Following up from #27098 , this PR renames the baseworker mixin and declutters training output by only logging for rank 0 actors.
Signed-off-by: Kai Fricke <kai@anyscale.com>
2022-07-29 09:04:02 +01:00
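The rank-0-only logging described in the commit above can be illustrated with a minimal sketch. The helper name `log_if_rank_zero` and its `world_rank` argument are hypothetical and are not taken from the Ray codebase; the sketch only shows the general pattern of suppressing duplicate messages from non-zero ranks.

```python
import logging

logger = logging.getLogger("train_sketch")  # hypothetical logger name


def log_if_rank_zero(world_rank: int, message: str) -> bool:
    """Emit an info-level message only on the rank 0 worker.

    Returns True if the message was logged, False if it was suppressed.
    """
    if world_rank != 0:
        return False
    logger.info(message)
    return True


# Simulate four workers reporting the same event; only rank 0 logs it.
logged_by = [
    rank for rank in range(4) if log_if_rank_zero(rank, "Moving model to device")
]
```

With N workers this reduces N identical lines to one, which appears to be the decluttering the commit refers to.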
Clark Zinzow
22ca30cd92
[Cherry-pick] [AIR - Datasets] Hide tensor extension from UDFs. ( #27196 )
2022-07-28 13:59:19 -07:00
scv119
3edfc78ee2
update version number to 2.0.0rc0
2022-07-27 18:43:27 +00:00
matthewdeng
113c4d7fab
[air][data] move train_test_split to ray.data.Dataset ( #27065 )
2022-07-27 09:53:37 -07:00
Balaji Veeramani
89f7f2a567
[Datasets] Add `size` parameter to `ImageFolderDatasource` ( #26975 )
...
If you read a folder with differently-sized images, `ImageFolderDatasource` errors. This PR fixes the issue by resizing images to a user-specified size.
2022-07-26 14:57:38 -07:00
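Why a fixed size helps can be shown with a small standalone sketch: images of different shapes cannot be stacked into one batch until they are resized to a common shape. The function `resize_nearest` below is a hypothetical nearest-neighbor helper operating on nested lists; it is not the resizing code used by `ImageFolderDatasource`.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values


def resize_nearest(image: Image, size: Tuple[int, int]) -> Image:
    """Resize an image to (height, width) with nearest-neighbor sampling."""
    new_h, new_w = size
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


# Two images with different shapes (2x2 and 3x3).
images = [
    [[1, 2], [3, 4]],
    [[5, 6, 7], [8, 9, 10], [11, 12, 13]],
]

# After resizing to a user-specified size, every image has the same shape.
resized = [resize_nearest(img, (2, 2)) for img in images]
shapes = {(len(img), len(img[0])) for img in resized}
```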
Balaji Veeramani
8bc836d9fb
[AIR] Remove `CustomStatefulPreprocessor` ( #26981 )
2022-07-26 10:10:57 -07:00
Balaji Veeramani
55988992b9
[AIR] Rename `limit` parameter as `max_categories` ( #26977 )
2022-07-26 10:10:40 -07:00
Jules S. Damji
193e824bc1
[AIR DOC] minor tweaks to checkpoint user guide for clarity and consistency subheadings ( #26937 )
...
Co-authored-by: Jules Damji <jules@anyscale.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-07-25 14:21:29 -07:00
Jiao
5315f1e643
[AIR] Enable other notebooks previously marked with # REGRESSION ( #26896 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-07-25 13:40:21 -07:00
matthewdeng
df638b3f0f
[Datasets] Automatically cast tensor columns when building Pandas blocks. ( #26924 )
...
This PR just applies the changes from the following PRs:
[Datasets] Automatically cast tensor columns when building Pandas blocks. #26684
reverted by Revert "[Datasets] Automatically cast tensor columns when building Pandas blocks." #26921
[AIR - Datasets] Fix TensorDtype construction from string and fix example. #26904
This fixes the test failures introduced in the originally reverted PRs.
2022-07-25 12:12:10 -07:00
Eric Liang
008eecfbff
[docs] Update the AIR data ingest guide ( #26909 )
2022-07-24 09:59:29 -07:00
Kai Fricke
1f32cb95db
[air/tune] Add top-level imports for Tuner, TuneConfig, move CheckpointConfig ( #26882 )
2022-07-22 20:17:06 -07:00
Eric Liang
36c46e9686
[docs] Improve AIR table of contents titles ( #26858 )
2022-07-22 17:17:49 -07:00
Clark Zinzow
a29baf93c8
[Datasets] Add `.iter_torch_batches()` and `.iter_tf_batches()` APIs. ( #26689 )
...
This PR adds `.iter_torch_batches()` and `.iter_tf_batches()` convenience APIs, which take care of ML framework tensor conversion and the narrow tensor waste for the `.iter_batches()` call ("numpy" format), and which unify batch formats around two options: a single tensor for simple/pure-tensor/single-column datasets, and a dictionary of tensors for multi-column datasets.
2022-07-22 10:09:36 -07:00
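The two batch formats described in the commit above can be sketched in plain Python. The helper `unify_batch` is hypothetical and uses lists of floats as stand-ins for framework tensors; it illustrates only the single-tensor versus dictionary rule, not Ray's actual conversion code.

```python
from typing import Dict, List, Union

Column = List[float]  # stand-in for a torch or TensorFlow tensor


def unify_batch(columns: Dict[str, Column]) -> Union[Column, Dict[str, Column]]:
    """Return a bare column for single-column batches, else a dict of columns."""
    if len(columns) == 1:
        # Single-column dataset: hand back the one "tensor" directly.
        return next(iter(columns.values()))
    # Multi-column dataset: keep a mapping from column name to "tensor".
    return dict(columns)


single = unify_batch({"value": [1.0, 2.0, 3.0]})
multi = unify_batch({"x": [1.0, 2.0], "y": [0.0, 1.0]})
```

Here `single` is a bare list, while `multi` keeps both columns addressable by name.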
Eric Liang
9272bcbbca
[docs] Add ecosystem map to AIR guide ( #26859 )
2022-07-21 19:06:47 -07:00
Jiao
db027d86af
[P0][AIR] Fix train to serve notebooks ( #26821 )
...
Co-authored-by: Simon Mo <simon.mo@hey.com>
2022-07-21 18:04:13 -07:00
Jules S. Damji
6db2536971
[RayAIR] Minor tweaks to the why ray air for clarity ( #26680 )
2022-07-21 10:21:26 -07:00
Balaji Veeramani
ac1d21027d
[AIR] Add framework-specific checkpoints ( #26777 )
2022-07-20 19:33:27 -07:00
Richard Liaw
9f0d35b97c
[air/docs] add tensorflow benchmarks into table ( #26800 )
2022-07-20 17:12:40 -07:00
Eric Liang
d6f29eb9ca
[docs] Mark pipelined prediction as experimental for now ( #26792 )
2022-07-20 15:31:19 -07:00
xwjiang2010
e7957f4a3e
[air] update offline/online rl example and enable them. ( #26786 )
2022-07-20 14:06:03 -07:00
Richard Liaw
6563c2762d
[air] add pytorch benchmark number ( #26719 )
2022-07-19 09:51:13 -07:00
Richard Liaw
7e62e1187c
[air/benchmark] Torch benchmarks for 4x4 ( #26692 )
...
Add benchmark data for 4x4 GPU setup.
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Jimmy Yao <jiahaoyao.math@gmail.com>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2022-07-19 17:06:37 +01:00
Sumanth Ratna
759966781f
[air] Allow users to use instances of `ScalingConfig` ( #25712 )
...
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2022-07-18 15:46:58 -07:00
matthewdeng
6670708010
[air] add placement group max CPU to data benchmark ( #26649 )
...
Set experimental `_max_cpu_fraction_per_node` to prevent deadlock.
This should technically be a no-op with the SPREAD strategy.
2022-07-18 10:34:40 -07:00
Jiao
98a07920d3
[AIR][CUJ] Make distributing training benchmark at silver tier ( #26640 )
2022-07-17 22:07:09 -07:00
Jules S. Damji
55368402ee
added summary why and when to use bulk vs streaming data ingest ( #26637 )
2022-07-17 18:46:58 -07:00
Clark Zinzow
864af14f41
[Datasets] [Local Shuffle - 1/N] Add local shuffling option. ( #26094 )
...
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
Co-authored-by: Matthew Deng <matt@anyscale.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-07-17 16:21:14 -07:00
Eric Liang
400330e9c0
[air] Add _max_cpu_fraction_per_node to ScalingConfig and documentation ( #26634 )
2022-07-16 21:55:51 -07:00
Amog Kamsetty
3a345a470c
[AIR/Docs] Add Predictor Docs ( #25833 )
2022-07-16 21:14:21 -07:00
Jiao
77e2ef2eb6
[AIR] Update Torch benchmarks with documentation ( #26631 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-07-16 17:58:21 -07:00
Eric Liang
0855bcb77e
[air] Use SPREAD strategy by default and don't special case it in benchmarks ( #26633 )
2022-07-16 17:37:06 -07:00
Antoni Baum
fb6f3cf708
[AIR/Docs] Small improvements to Train user guide ( #26577 )
...
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2022-07-16 16:51:17 -07:00
Eric Liang
6217138eb0
[docs] Move AIR benchmarks to top level ( #26632 )
2022-07-16 15:34:31 -07:00
Richard Liaw
799311b2f7
[air/docs] update examples to remove pandas again ( #26598 )
2022-07-16 08:40:44 -07:00
matthewdeng
e3a096f412
[air] add bulk ingest benchmarks ( #26618 )
2022-07-15 22:01:23 -07:00
Richard Liaw
5ad4e75831
[air] Add initial benchmark section ( #26608 )
2022-07-15 15:33:48 -07:00
Jiao
647e12b6c7
[AIR] Fix convert_existing_pytorch_code_to_ray_air notebook ( #26523 )
2022-07-14 14:30:55 -07:00
Amog Kamsetty
6595bd6e2d
[AIR] Introduce better scoring API for `BatchPredictor` ( #26451 )
...
Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>
As discussed offline, allow configurability for feature columns and keep columns in BatchPredictor for better scoring UX on test datasets.
2022-07-14 11:26:12 -07:00
Richard Liaw
a0ce3c111b
[air/data] Concatenator preprocessor ( #26526 )
2022-07-14 10:26:14 -07:00
Eric Liang
5f18c67ba3
Fix LINT ( #26554 )
...
Signed-off-by: Eric Liang <ekhliang@gmail.com>
2022-07-13 23:28:02 -07:00
Jiao
15dbc0362a
[AIR][Docs] Fix torch_image_example ( #26453 )
2022-07-13 21:59:24 -07:00
Eric Liang
31c8c908f9
[docs] Improve AIR API ref organization ( #26530 )
2022-07-13 18:05:17 -07:00
Antoni Baum
5ed10ef921
[AIR/CI] Fix Hugging Face notebook example ( #26475 )
2022-07-13 09:16:42 -07:00
Eric Liang
4c04c8d92c
[doc] Rename toc entry for libraries back to "Ray Libraries" ( #26485 )
2022-07-12 14:23:36 -07:00
Richard Liaw
92efc85b3b
[air/docs] checkpoints ( #25901 )
2022-07-11 20:40:23 -07:00
Richard Liaw
1abe908c22
[air/docs] improve consistency of getting started ( #26247 )
2022-07-11 20:16:37 -07:00
Antoni Baum
65ea710e30
[Docs] Update Train user guide to use the new APIs ( #26091 )
2022-07-11 15:10:10 -07:00
Richard Liaw
5892a76a44
[air/tune] Documentation testing fixes ( #26409 )
2022-07-09 19:47:21 -07:00
Amog Kamsetty
cc43bcccb4
[AIR] Update TensorflowPredictor to new API ( #26215 )
...
Updates TensorflowPredictor to use the new _predict_pandas API.
Also as agreed upon offline, removes the extra configurations from TensorflowPredictor (column selection, concatenation) in favor of having this be done via a Preprocessor.
2022-07-08 13:04:49 -07:00