Commits

Wes McKinney authored 61a54f8a619
ARROW-509: [Python] Add support for multithreaded Parquet reads

I'm getting very nice speedups on a Parquet file storing a ~4.5 GB dataset:

```
In [1]: import pyarrow.parquet as pq

In [2]: %time table = pq.read_table('/home/wesm/data/airlines_parquet/4345e5eef217aa1b-c8f16177f35fd983_1150363067_data.0.parq')
CPU times: user 8.21 s, sys: 468 ms, total: 8.68 s
Wall time: 8.68 s

In [3]: %time table = pq.read_table('/home/wesm/data/airlines_parquet/4345e5eef217aa1b-c8f16177f35fd983_1150363067_data.0.parq', nthreads=4)
CPU times: user 8.84 s, sys: 4.28 s, total: 13.1 s
Wall time: 3.91 s

In [4]: %time table = pq.read_table('/home/wesm/data/airlines_parquet/4345e5eef217aa1b-c8f16177f35fd983_1150363067_data.0.parq', nthreads=8)
CPU times: user 13.3 s, sys: 1.15 s, total: 14.4 s
Wall time: 2.86 s
```

This requires a bugfix in parquet-cpp that will come soon in a patch for PARQUET-836

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #301 from wesm/ARROW-509 and squashes the following commits:

9816689 [Wes McKinney] Update docs slightly, flake8 warning
239b086 [Wes McKinney] Add support for nthreads option in parquet::arrow, unit tests
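
To put the reported wall times in perspective, the sketch below computes the speedup and per-thread efficiency for each run. The three timings are copied from the `%time` output above. Treating the call without `nthreads` as a one-thread baseline is an assumption, and the `speedup_table` helper is hypothetical, not part of pyarrow.

```python
# Wall-clock times in seconds, copied from the %time output in the commit message.
# Assumption: the read without nthreads is treated as the 1-thread baseline.
WALL_TIMES = {1: 8.68, 4: 3.91, 8: 2.86}  # thread count -> wall time


def speedup_table(wall_times, baseline_threads=1):
    """Return {threads: (speedup, efficiency)} relative to the baseline run."""
    baseline = wall_times[baseline_threads]
    table = {}
    for threads, seconds in sorted(wall_times.items()):
        speedup = baseline / seconds      # how many times faster than the baseline
        efficiency = speedup / threads    # fraction of ideal linear scaling
        table[threads] = (speedup, efficiency)
    return table


if __name__ == "__main__":
    for threads, (speedup, efficiency) in speedup_table(WALL_TIMES).items():
        print(f"{threads} thread(s): {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

On these numbers, four threads give roughly a 2.2x speedup and eight threads roughly 3.0x, so the gain per added thread shrinks as the thread count grows.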