Commits


Antoine Pitrou authored and Wes McKinney committed 1f79fafaec0
ARROW-3407: [C++] Add UTF8 handling to CSV conversion

CSV conversion now has distinct paths for string and binary columns. String columns are UTF8-validated by default, but it can be disabled by setting the `check_utf8` option in `ConvertOptions`.

CSV type inference now first attempts string conversion and falls back on binary if UTF8 validation fails (if it's not disabled).

As for performance, on pure ASCII columns single-threaded reading slows down by ~10% (which can be avoided by setting `check_utf8` to false). Multi-threaded reading does not seem affected here.

Based on PR #2916.

Author: Antoine Pitrou <antoine@python.org>

Closes #2924 from pitrou/ARROW-3407-csv-utf8-conversion and squashes the following commits:

26a812c5c <Antoine Pitrou> ARROW-3407: Add UTF8 handling to CSV conversion
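
The inference fallback described in this commit message can be illustrated with a minimal standard-library Python sketch. It is not Arrow's C++ implementation: the function names `is_valid_utf8` and `infer_column_type` are hypothetical, and the `check_utf8` parameter only mirrors the option named in the message. It assumes a column arrives as a list of raw byte cells.

```python
from typing import List


def is_valid_utf8(cell: bytes) -> bool:
    """Return True if the raw cell decodes cleanly as UTF-8."""
    try:
        cell.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def infer_column_type(cells: List[bytes], check_utf8: bool = True) -> str:
    """Hypothetical helper: pick "string" or "binary" for a column.

    With validation enabled, the column is a string column only if every
    cell is valid UTF-8; otherwise it falls back to binary. With validation
    disabled, the cells are trusted and the column is treated as string.
    """
    if not check_utf8:
        # Validation skipped: no per-cell cost, but invalid bytes go unnoticed.
        return "string"
    if all(is_valid_utf8(cell) for cell in cells):
        return "string"
    return "binary"


if __name__ == "__main__":
    print(infer_column_type([b"abc", "caf\u00e9".encode("utf-8")]))
    print(infer_column_type([b"abc", b"\xff\xfe"]))
    print(infer_column_type([b"abc", b"\xff\xfe"], check_utf8=False))
```

Skipping the check trades safety for speed, which matches the trade-off the message describes for pure ASCII columns in single-threaded reads.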