Commits

Wes McKinney authored 7aefa50a440
ARROW-3325: [Python][Parquet] Add "read_dictionary" argument to parquet.read_table, ParquetDataset to enable direct-to-DictionaryArray reads

I also added support to `pyarrow.table` to invoke `Table.from_arrays` if a list or tuple of arrays is passed. This makes for more natural code IMHO.

Using this option with heavily compressed data results in far less memory use and much better performance. See example benchmarks: https://gist.github.com/wesm/450d85e52844aee685c0680111cbb1d7

Closes #4999 from wesm/ARROW-3325 and squashes the following commits:

2ca388149 <Wes McKinney> Improve docstring for read_dictionary parameter, add to ParquetDataset
ee73d7b41 <Wes McKinney> Add missing PARQUET_EXPORT
0f450d53e <Wes McKinney> Clean up FileReaderBuilder. Add simple Python docs
8e2b70b1a <Wes McKinney> Expand read_dictionary with ParquetDataset test for multiple files
7237e6958 <Wes McKinney> Fix C++ and Python unit tests
9d503516f <Wes McKinney> Read Parquet fields directly as DictionaryArray in parquet.read_table and ParquetDataset
85f9b7206 <Wes McKinney> Initial threading of read_dictionary parameter, not terribly satisfying

Authored-by: Wes McKinney <wesm+git@apache.org>
Signed-off-by: Wes McKinney <wesm+git@apache.org>