Will Jones authored and Jonathan Keane committed 62db4b6a254
ARROW-14644: [C++][R] open_dataset doesn't ignore BOM in csv file

`arrow::csv::StreamingReader` already handles byte order marks (BOMs). However, #7896 introduced `arrow::dataset::GetColumnNames`, which is called before the reader is instantiated and lacked BOM handling. This PR adds BOM handling to that method.

Without it, the first column name parsed by `arrow::dataset::GetColumnNames` still contained the BOM (e.g. `"<BOM>a"` instead of `"a"`). As a result, that column failed the check on line 120 of the code linked below and was not added to `convert_options.include_columns`.

https://github.com/apache/arrow/blob/9cf4275a19c994879172e5d3b03ade9a96a10721/cpp/src/arrow/dataset/file_csv.cc#L117-L122

Closes #11892 from wjones127/ARROW-14644_skip_BOM_in_CSV_file_with_open_dataset

Lead-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Dragos Moldovan-Grünfeld <dragos.mold@gmail.com>
Signed-off-by: Jonathan Keane <jkeane@gmail.com>
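To illustrate the failure mode, here is a minimal, simplified sketch. It is not Arrow's `GetColumnNames`. `StripUtf8Bom` and `ParseHeader` are hypothetical helpers, and the header split ignores CSV quoting and escaping. It shows how a leading UTF-8 BOM (bytes `EF BB BF`) ends up glued to the first column name unless it is skipped before the header is parsed.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not Arrow's API): returns `text` with a leading
// UTF-8 byte order mark (EF BB BF) removed, if one is present.
std::string_view StripUtf8Bom(std::string_view text) {
  constexpr std::string_view kUtf8Bom = "\xEF\xBB\xBF";
  if (text.substr(0, kUtf8Bom.size()) == kUtf8Bom) {
    text.remove_prefix(kUtf8Bom.size());
  }
  return text;
}

// Hypothetical, simplified stand-in for reading CSV column names: takes the
// first line and splits it on commas (no quoting support). When `skip_bom`
// is false, a BOM stays attached to the first name, which is the bug above.
std::vector<std::string> ParseHeader(std::string_view csv, bool skip_bom) {
  if (skip_bom) csv = StripUtf8Bom(csv);
  std::string_view header = csv.substr(0, csv.find('\n'));
  std::vector<std::string> names;
  std::size_t start = 0;
  while (true) {
    std::size_t comma = header.find(',', start);
    // substr clamps the length when comma == npos (last field).
    names.emplace_back(header.substr(start, comma - start));
    if (comma == std::string_view::npos) break;
    start = comma + 1;
  }
  return names;
}
```

Note that a literal such as `"\xEF\xBB\xBF" "a,b\n1,2\n"` must be split after the BOM: writing `"\xBFa"` would make the compiler read `a` as part of the hex escape. With `skip_bom = false` the first name is `"\xEF\xBB\xBFa"`, so a lookup of `"a"` in a set of requested columns would miss it, mirroring why the column was dropped from `include_columns`.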