mirror of https://github.com/vale981/ray synced 2025-03-17 16:46:39 -04:00

11 commits

Author SHA1 Message Date
Eric Liang
e15a419028
Enable stage fusion by default for dataset pipelines ()
This PR enables stage fusion for dataset pipelines. This also requires:
1. Removing the num_cpus=0.5 default for the read stage, to enable fusion of the read stage.
2. Removing spread_resource_prefix (not supported for now).
2022-02-23 17:34:05 -08:00
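The fusion idea in this commit can be illustrated with a minimal pure-Python sketch. It assumes, without confirmation from this log, that adjacent stages are fused only when their resource requests match, which would explain why the `num_cpus=0.5` read default had to go. The `Stage` class and `fuse_stages` function are hypothetical stand-ins, not Ray's implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


# Hypothetical stand-in for a pipeline stage: a per-block transform plus the
# resource request it would be scheduled with (e.g. {"num_cpus": 0.5}).
@dataclass
class Stage:
    name: str
    fn: Callable[[Any], Any]
    resources: Dict[str, float] = field(default_factory=dict)


def _compose(f: Callable[[Any], Any], g: Callable[[Any], Any]) -> Callable[[Any], Any]:
    # Apply f, then g, to the same block inside a single task.
    return lambda block: g(f(block))


def fuse_stages(stages: List[Stage]) -> List[Stage]:
    """Greedily merge adjacent stages whose resource requests are identical.

    Assumption: stages with different resource requests are never fused, so a
    read stage pinned to num_cpus=0.5 stays separate from default-request stages.
    """
    fused: List[Stage] = []
    for stage in stages:
        if fused and fused[-1].resources == stage.resources:
            prev = fused[-1]
            fused[-1] = Stage(
                name=f"{prev.name}->{stage.name}",
                fn=_compose(prev.fn, stage.fn),
                resources=prev.resources,
            )
        else:
            fused.append(stage)
    return fused
```

With a default-request read followed by two map stages, all three collapse into one task per block; giving the read stage `{"num_cpus": 0.5}` leaves it as a separate stage.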
Jiajun Yao
baa14d695a
Round robin during spread scheduling ()
- Separate spread scheduling from the default hybrid scheduling (i.e. SpreadScheduling != HybridScheduling(threshold=0)): they are already separated in the API layer and have different end goals, so it makes sense to separate their implementations and evolve them independently.
- Simple round robin for spread scheduling: this is only a starting implementation and can be optimized later.
- Prefer not to spill back tasks that are waiting for args since the pull is already in progress.
2022-02-18 15:05:35 -08:00
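A "simple round robin" over nodes could look roughly like the sketch below. It is a hypothetical Python illustration only. The actual scheduler is not shown in this log, and the class name, string node IDs, and the caller-supplied set of feasible nodes are all assumptions.

```python
from typing import List, Optional, Set


class RoundRobinSpreadPicker:
    """Hypothetical round-robin node picker for spread scheduling.

    Assumptions: nodes are identified by string IDs, and the caller passes in
    the subset of nodes that currently have enough free resources for the task.
    """

    def __init__(self, node_ids: List[str]):
        self.node_ids = list(node_ids)
        self.next_index = 0  # position where the next scan starts

    def pick(self, feasible: Set[str]) -> Optional[str]:
        n = len(self.node_ids)
        for offset in range(n):
            idx = (self.next_index + offset) % n
            node = self.node_ids[idx]
            if node in feasible:
                # Resume after the chosen node so consecutive tasks land on
                # different nodes whenever possible.
                self.next_index = (idx + 1) % n
                return node
        return None  # no node can currently fit the task
```

With three nodes that are all feasible, four consecutive picks return `a`, `b`, `c`, `a`. Infeasible nodes are skipped without resetting the rotation.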
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black ()
See  and  for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Jiajun Yao
cea80b1a5b
Don't advertise cpus on gpu nodes for pipelined ingestion tests ()
* Don't advertise cpus on gpu nodes for pipelined ingestion tests
2022-01-27 09:17:01 -08:00
Antoni Baum
7ce22b72ed
[datasets] Expand to_torch's functionality ()
Expands the `to_torch` method for Datasets with:
* The ability to output a list/dict of feature tensors instead of a single one (by setting `feature_columns` to a list of lists or a dict of lists)
* The ability to choose whether the label tensor is unsqueezed
* The ability to pass `None` as the label (for prediction).

Furthermore, this changes how the `feature_column_dtypes` argument works. Previously, it took a list of dtypes, one per feature. However, because the features were concatenated into a single tensor in the end, only one dtype mattered (the largest one). Now, this argument expects a single dtype, which is applied to the features tensor (or a list/dict of dtypes if `feature_columns` is a list of lists/dict of lists).

Unit tests for all cases are included.

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2022-01-03 09:03:50 -08:00
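The output shapes described above can be illustrated with a pure-Python sketch in which nested lists stand in for tensors. `batch_to_features` and its `cast` argument are hypothetical names, not the real `to_torch` signature. For brevity, a single `cast` is applied to every feature group instead of a per-group list/dict of dtypes.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Union

Row = Dict[str, Any]
Columns = Union[List[str], List[List[str]], Dict[str, List[str]]]


def _stack(rows: Sequence[Row], cols: List[str], cast: Callable[[Any], Any]) -> List[List[Any]]:
    # One "tensor" per column group, shaped (batch, len(cols)).
    return [[cast(row[c]) for c in cols] for row in rows]


def batch_to_features(
    rows: Sequence[Row],
    feature_columns: Columns,
    label_column: Optional[str] = None,
    unsqueeze_label: bool = True,
    cast: Callable[[Any], Any] = float,  # stand-in for a single feature dtype
):
    if isinstance(feature_columns, dict):
        # Dict of lists -> dict of feature "tensors".
        features = {k: _stack(rows, cols, cast) for k, cols in feature_columns.items()}
    elif feature_columns and isinstance(feature_columns[0], list):
        # List of lists -> list of feature "tensors".
        features = [_stack(rows, cols, cast) for cols in feature_columns]
    else:
        # Flat list of column names -> one concatenated feature "tensor".
        features = _stack(rows, feature_columns, cast)

    if label_column is None:
        label = None  # no label, e.g. for prediction
    elif unsqueeze_label:
        label = [[row[label_column]] for row in rows]  # shape (batch, 1)
    else:
        label = [row[label_column] for row in rows]  # shape (batch,)
    return features, label
```

Passing `[["a"], ["b", "c"]]` yields two feature groups per batch, and passing a dict keeps the user's group names as keys.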
Jiajun Yao
9776e21842
Revert "Round robin during spread scheduling ()" ()
This reverts commit 60388b2834.
2021-12-30 10:33:06 +09:00
Jiajun Yao
60388b2834
Round robin during spread scheduling () 2021-12-22 20:27:34 -08:00
Amog Kamsetty
9796ae56d5
[Train][Data] Change usages of iter_datasets to iter_epochs () 2021-11-17 18:05:51 -08:00
Chen Shen
9dba5e0ead
[dataset][nightly-test] fix pipeline ingest test () 2021-10-18 11:31:24 +01:00
Eric Liang
86cbe3e833
[data] Add support for repeating and re-windowing a DatasetPipeline () 2021-10-06 20:13:43 -07:00
Chen Shen
7c99aae033
[dataset][nightly-test] add pipelined ingestion/training nightly test 2021-09-23 20:39:03 -07:00