Description of proposed changes
This PR implements batch filtering and resampling. I'm making this PR because I have been using these features in my own work for some time and thought they would be useful to others. Also, filtering has been requested in #158 and #162.
In both operations, the user provides a function that either accepts or rejects a batch (for filtering) or assigns a sample weight to each batch (for resampling). Both functions take the dataset and the dict of slice objects, so the user can write them strategically to minimize computation on dask arrays. The changes are all in `BatchGenerator`, since it seemed like `BatchSchema` is primarily intended as a representation of windowing parameters.

I was not able to get the `asv` tests to work in my development environment, but there is no change to the original behavior if `resample_fn` and `filter_fn` are not provided, so I do not expect there to be a performance penalty. Filtering and resampling happen independently, but you could approximate doing both in one shot by having `resample_fn` return 0 for invalid batches. That would be a little faster than "checking" each batch twice in two separate functions.
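For reviewers, here is a rough sketch of what the two callbacks could look like. It is illustrative only and does not use the actual `BatchGenerator` code. The dataset is a plain dict standing in for an xarray Dataset, the `temp` variable and `time` dimension are made-up names, and drawing with replacement via `random.choices` is an assumption about how weights might be consumed, not a description of this PR's implementation.

```python
import random

# Hypothetical stand-in for a dataset: one 1-D variable along a "time" dimension.
# None marks a missing value. Real code would receive an xarray Dataset instead.
data = {"temp": [1.0, 2.0, None, 4.0, 5.0, 6.0, 7.0, 8.0]}

# Each batch is described by a dict mapping a dimension name to a slice object.
batch_slices = [{"time": slice(i, i + 2)} for i in range(0, 8, 2)]


def filter_fn(ds, slices):
    """Accept a batch only if its window contains no missing values."""
    window = ds["temp"][slices["time"]]
    return all(v is not None for v in window)


def resample_fn(ds, slices):
    """Weight a batch by its mean; return 0 for invalid batches."""
    window = ds["temp"][slices["time"]]
    if any(v is None for v in window):
        return 0.0
    return sum(window) / len(window)


# Filtering: keep only the batches that filter_fn accepts.
kept = [s for s in batch_slices if filter_fn(data, s)]

# Resampling (assumed here to be with replacement): draw batches in
# proportion to their weights. A weight of 0 means the batch is never drawn.
weights = [resample_fn(data, s) for s in batch_slices]
rng = random.Random(0)
resampled = rng.choices(batch_slices, weights=weights, k=len(batch_slices))
```

Because `resample_fn` returns 0 for the batch containing a missing value, that batch is excluded from `resampled` without a separate `filter_fn` pass, which is the one-shot approximation described above.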