Commits

Antoine Pitrou authored 6cdb80c6cfd
ARROW-14940: [C++] Speed up CSV parser with long CSV cells

Some CSV files may have long cells (values), for example if containing arbitrary texts or even things like timestamps. We can speed up parsing such CSV files by filtering multiple bytes at once for state-changing characters such as delimiters.

This PR adds two kinds of bulk filters:
- a very simple heuristic Bloom filter
- a precise filter using SSE4.2 packed compare

Given that negative filter matches have a non-trivial cost, the bulk filters are enabled only if the average cell length exceeds a given threshold.

Closes #11828 from pitrou/ARROW-14940-csv-bulk-filter

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
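
To illustrate the heuristic Bloom filter idea described in the commit message, here is a minimal sketch. It is not the code from this commit. The names `MakeFilter`, `MayBeSpecial`, `SkipPlainBytes` and `ShouldUseBulkFilter`, the 8-byte block size, the choice of the low 6 bits as the hash, and the threshold parameter are all assumptions made for illustration. Each state-changing character sets one bit in a 64-bit mask; a block of bytes is skipped only if none of its bytes maps to a set bit. False positives are possible, false negatives are not.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: build a 64-bit mask with one bit per special character.
// The bit index is the low 6 bits of the byte value (an assumed hash).
inline uint64_t MakeFilter(const std::string& specials) {
  uint64_t filter = 0;
  for (char c : specials) {
    filter |= uint64_t{1} << (static_cast<unsigned char>(c) & 63);
  }
  return filter;
}

// True if the byte may be special. Two bytes sharing their low 6 bits
// collide, so this can report a false positive but never a false negative.
inline bool MayBeSpecial(uint64_t filter, char c) {
  return ((filter >> (static_cast<unsigned char>(c) & 63)) & 1) != 0;
}

// Returns the number of leading bytes (a multiple of 8) that certainly
// contain no special character. A caller would resume its regular
// byte-by-byte parsing from that offset.
inline size_t SkipPlainBytes(const std::string& data, uint64_t filter) {
  const size_t kBlock = 8;  // assumed block size
  size_t pos = 0;
  while (pos + kBlock <= data.size()) {
    bool hit = false;
    for (size_t i = 0; i < kBlock; ++i) {
      hit |= MayBeSpecial(filter, data[pos + i]);
    }
    if (hit) break;  // fall back to precise parsing for this block
    pos += kBlock;
  }
  return pos;
}

// Hypothetical gating rule: use the bulk filter only when the observed
// average cell length exceeds a caller-chosen threshold.
inline bool ShouldUseBulkFilter(size_t num_bytes, size_t num_cells,
                                double threshold) {
  if (num_cells == 0) return false;
  return static_cast<double>(num_bytes) / static_cast<double>(num_cells) >
         threshold;
}
```

With a filter built from `,`, `"`, `\r` and `\n`, a long run of digits is skipped eight bytes at a time, while a letter such as `l` (which shares its low 6 bits with `,`) triggers a false positive and forces the slow path. This is why such a filter only pays off when cells are long enough for whole blocks to be skipped often.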