Commits

Wes McKinney authored 208e79812b5
ARROW-1594: [Python] Multithreaded conversions to Arrow in from_pandas

This results in nice speedups when column conversions do not require the GIL to be held:

```python
In [5]: import numpy as np

In [6]: import pandas as pd

In [7]: import pyarrow as pa

In [8]: NROWS = 1000000

In [9]: NCOLS = 50

In [10]: arr = np.random.randn(NCOLS, NROWS).T

In [11]: arr[::5] = np.nan

In [12]: df = pd.DataFrame(arr)

In [13]: %timeit rb = pa.RecordBatch.from_pandas(df, nthreads=1)
10 loops, best of 3: 179 ms per loop

In [14]: %timeit rb = pa.RecordBatch.from_pandas(df, nthreads=4)
10 loops, best of 3: 59.7 ms per loop
```

This introduces a dependency on the `futures` backport of `concurrent.futures` for Python 2.7 (PSF license).

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #1186 from wesm/multithreaded-from-pandas and squashes the following commits:

a3072f0e [Wes McKinney] Only install futures on py2
c30e4735 [Wes McKinney] Add heuristic to use threadpool conversion only if nrows > ncols * 100
5a692085 [Wes McKinney] Only install concurrent.futures backport on py2, test serialize_pandas with nthreads
0afab342 [Wes McKinney] Add nthreads argument to serialize_pandas, make default for serialize/deserialize consistent
15841d13 [Wes McKinney] Default to cpu_count() for nthreads in from_pandas to conform with to_pandas default
6a58c038 [Wes McKinney] Add nthreads argument to RecordBatch/Table.from_pandas. Use concurrent.futures for parallel processing
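The pattern the commit describes — fan each column conversion out to a `concurrent.futures` thread pool, but only when the table is tall enough for per-column work to dominate thread overhead (the `nrows > ncols * 100` heuristic) — can be sketched in plain Python. The `convert_column` function below is a hypothetical stand-in for pyarrow's real per-column conversion, not its actual API:

```python
from concurrent.futures import ThreadPoolExecutor


def convert_column(col):
    # Stand-in for the real conversion of one pandas column to an Arrow
    # array; the speedup in the commit comes from this step releasing
    # the GIL so threads can run concurrently.
    return [x * 2 for x in col]


def convert_columns(columns, nthreads=4):
    ncols = len(columns)
    nrows = len(columns[0]) if columns else 0
    # Heuristic from the commit: fall back to a serial loop unless
    # nrows > ncols * 100, since thread overhead would otherwise
    # outweigh the parallel gain on small tables.
    if nthreads == 1 or nrows <= ncols * 100:
        return [convert_column(c) for c in columns]
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        # executor.map preserves column order in the results
        return list(pool.map(convert_column, columns))


columns = [list(range(1000)) for _ in range(3)]
result = convert_columns(columns, nthreads=4)
```

This mirrors the shape of the change rather than its implementation: pyarrow's version dispatches C++-level conversions, while this sketch only demonstrates the pool-plus-heuristic control flow.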