Mirror of https://github.com/vale981/ray, synced 2025-03-06 10:31:39 -05:00
[Datasets] Add example of using map_batches to filter (#24202)

The documentation says:

> Consider using .map_batches() for better performance (you can implement filter by dropping records).

but there aren't any examples of how to do so. This commit adds one.
This commit is contained in:
parent 0b6505e8c6
commit ebe2929d4c
1 changed file with 10 additions and 0 deletions
@@ -250,6 +250,16 @@ class Dataset(Generic[T]):
         ...     compute=ActorPoolStrategy(2, 8), # doctest: +SKIP
         ...     num_gpus=1) # doctest: +SKIP

+        You can use ``map_batches`` to efficiently filter records.
+
+        >>> import ray
+        >>> ds = ray.data.range(10000) # doctest: +SKIP
+        >>> ds.count() # doctest: +SKIP
+        10000
+        >>> ds = ds.map_batches(lambda batch: [x for x in batch if x % 2 == 0]) # doctest: +SKIP # noqa: #501
+        >>> ds.count() # doctest: +SKIP
+        5000
+
         Time complexity: O(dataset size / parallelism)

         Args:
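The idea behind the added docstring example can be illustrated without a Ray cluster: a dataset is processed batch by batch, and a "filter" is simply a batch mapping that returns fewer records than it received. The sketch below is a minimal, hypothetical stand-in for `Dataset.map_batches` (the real API runs the function in parallel over distributed blocks); the helper name and the batch layout are assumptions for illustration only.

```python
# Minimal sketch of filtering via a batch-wise map (hypothetical stand-in
# for ray.data.Dataset.map_batches; no Ray required).

def map_batches(batches, fn):
    # Apply fn to every batch; fn may return fewer records than it got,
    # which is how a filter is expressed as a map.
    return [fn(batch) for batch in batches]

# 10000 records split into batches of 100, mirroring ray.data.range(10000).
batches = [list(range(i, i + 100)) for i in range(0, 10000, 100)]

# Drop odd records, exactly as in the docstring example's lambda.
filtered = map_batches(batches, lambda batch: [x for x in batch if x % 2 == 0])

print(sum(len(b) for b in filtered))  # 5000, matching the diff's ds.count()
```

Because the function receives a whole batch rather than one record, the per-record overhead of a `filter()`-style call is amortized over the batch, which is the performance benefit the documentation alludes to.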
|