Commits

Wes McKinney authored ff2ee42092c
PARQUET-1422: [C++] Use common Arrow IO interfaces throughout codebase

This is a long-overdue unification of platform code that wasn't possible until after the monorepo merge that occurred last year. This should also permit us to take a more consistent approach with regard to asynchronous IO.

A backwards compatibility layer is provided for the now deprecated `parquet::RandomAccessSource` and `parquet::OutputStream` classes.

Some incidental changes were required to get things to work:

* ARROW-5428: Adding a "read extent" option to BufferedInputStream to limit the extent of bytes read from the underlying raw stream
* `arrow::io::InputStream::Peek` needed to have its API changed to return Status, because of the next point
* `arrow::io::BufferedInputStream::Peek` will expand the buffer if a Peek is requested that is larger than the buffer. The idea is that it should be possible to "look ahead" in the stream without altering the stream position. This is needed as part of finding the next data header (which can be large or small depending on statistics size, etc.) in a Parquet stream
* Added a `[]` operator to `Buffer` to facilitate testing
* Some continued "flattening" of the "parquet/util" directory to be simpler

Some outstanding questions:

* The Parquet reader and writer classes assumed exclusive ownership of the file handles, and they are closed when the Parquet file is closed. Arrow files are shared, and so calling `Close` is not appropriate. I've attempted to preserve this logic by having Close called in the destructors of the wrapper classes in `parquet/deprecated_io.h`.

An issue I ran into:

* Changes in https://github.com/apache/arrow/commit/d82ac407fab1d4b28669b8f7a940f88d39dfd874 introduced a unit test with meaningful trailing whitespace, which my editor strips away. I've commented out the offending test and will have to open a JIRA about fixing it.

Author: Wes McKinney <wesm+git@apache.org>

Closes #4404 from wesm/parquet-use-arrow-io and squashes the following commits:

f010a8ec5 <Wes McKinney> Add missing PARQUET_EXPORT macros
50f7b921d <Wes McKinney> Add missing PARQUET_EXPORT
3b27ac262 <Wes McKinney> Follow changes in c_glib, fix Doxygen warning
7c1ae55c3 <Wes McKinney> ReadableFile::Peek now returns NotImplemented
cc7789e8f <Wes McKinney> Fix unit tests
b6e173922 <Wes McKinney> Allow unbounded peeks in BufferedInputStream
cd2a3cd70 <Wes McKinney> Add unit tests for legacy Parquet input/output wrappers
e03f07d65 <Wes McKinney> remove outdated comment
4c40bf2e1 <Wes McKinney> Adapt Python bindings
769974a6e <Wes McKinney> Tests passing again
1886de841 <Wes McKinney> column_writer more similar to before
7efc1aca6 <Wes McKinney> Fix one bug
30f1f4d62 <Wes McKinney> Get things compiling again, but tests are broken
4efb4e707 <Wes McKinney> Implement expanding-peek logic, change signature of InputStream::Peak to be able to return Status
db1877e8c <Wes McKinney> More progress toward compilation, port over parquet::BufferedInputStream unit tests
b05a71213 <Wes McKinney> More refactoring
66be1af04 <Wes McKinney> Port more code, add basic wrapper implementation for legacy IO interfaces
59143ddec <Wes McKinney> Start a bit of refactoring/consolidation in prep for using Arrow IO interfaces
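
For readers unfamiliar with the expanding-peek behaviour described in the commit message, the idea can be sketched roughly as follows. This is a toy illustration only, not the Arrow implementation: `PeekableReader`, its members, and the in-memory `std::string` standing in for the raw stream are all hypothetical names, and error reporting via `Status` is left out for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical toy class (not the Arrow API): a buffered reader over an
// in-memory "raw stream" whose Peek grows the buffer on demand.
class PeekableReader {
 public:
  PeekableReader(std::string raw, std::size_t buffer_size)
      : raw_(std::move(raw)), buffer_size_(buffer_size) {}

  // Return up to nbytes starting at the current position without advancing.
  // If the request exceeds what is buffered, more bytes are pulled from the
  // raw source so the whole request fits. The returned view is only valid
  // until the next call to Peek or Read.
  std::string_view Peek(std::size_t nbytes) {
    if (nbytes > buffer_.size() - buffer_pos_) {
      Fill(nbytes);
    }
    const std::size_t available = buffer_.size() - buffer_pos_;
    return std::string_view(buffer_).substr(buffer_pos_,
                                            std::min(nbytes, available));
  }

  // Consume up to nbytes and advance the logical stream position.
  std::string Read(std::size_t nbytes) {
    std::string out(Peek(nbytes));
    buffer_pos_ += out.size();
    return out;
  }

  // Logical position: bytes pulled from the raw source minus the bytes
  // that are buffered but not yet consumed.
  std::size_t Tell() const {
    return raw_pos_ - (buffer_.size() - buffer_pos_);
  }

 private:
  // Discard consumed bytes, then top the buffer up to at least `want` bytes
  // (or the default buffer size, whichever is larger), bounded by the data
  // remaining in the raw source.
  void Fill(std::size_t want) {
    assert(buffer_pos_ <= buffer_.size());
    buffer_.erase(0, buffer_pos_);
    buffer_pos_ = 0;
    const std::size_t target = std::max(want, buffer_size_);
    if (buffer_.size() < target) {
      const std::size_t to_read =
          std::min(target - buffer_.size(), raw_.size() - raw_pos_);
      buffer_.append(raw_, raw_pos_, to_read);
      raw_pos_ += to_read;
    }
  }

  std::string raw_;          // stands in for the underlying raw stream
  std::size_t raw_pos_ = 0;  // how far the raw stream has been read
  std::size_t buffer_size_;  // default buffer capacity
  std::string buffer_;       // buffered bytes
  std::size_t buffer_pos_ = 0;  // offset of the next unconsumed byte
};
```

With a 4-byte default buffer, a call such as `reader.Peek(10)` enlarges the buffer to hold ten bytes while `reader.Tell()` still reports the same position. That is the "look ahead without altering the stream position" property a Parquet reader needs when it does not yet know how large the next data header is.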