Commits


Alessandro Molina authored and GitHub committed f67009aac21
ARROW-16616: [Python] Add lazy Dataset.filter() method (#13409)

Expose a `Dataset.filter` method that applies a filter to the dataset without actually loading it in memory.

Addresses what was discussed in https://github.com/apache/arrow/pull/13155#discussion_r875076518

- [x] Update documentation
- [x] Ensure the filtered dataset preserves the filter when writing it back
- [x] Ensure the filtered dataset preserves the filter when joining
- [x] Ensure the filtered dataset preserves the filter when applying standard `Dataset.something` methods
- [x] Allow extending the filter by adding more conditions subsequently, as in `dataset(filter=X).filter(filter=Y).scanner(filter=Z)` (related to https://github.com/apache/arrow/pull/13409#discussion_r914281876)
- [x] Refactor to use only the `Dataset` class instead of `FilteredDataset`, as discussed with @jorisvandenbossche
- [x] Add support in `replace_schema`
- [x] Raise an error in `get_fragments` if a filter is set
- [x] Verify support in `UnionDataset`

Lead-authored-by: Alessandro Molina <amol@turbogears.org>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Alessandro Molina <amol@turbogears.org>