Commit f4500255220 — authored by Wes McKinney, committed by Philipp Moritz
ARROW-1381: [Python] Use FixedSizeBufferWriter in SerializedPyObject.to_buffer

With this setup:

```
import numpy as np
import pyarrow as pa

objects = [np.random.randn(500, 500) for i in range(400)]
serialized = pa.serialize(objects)
```

before this change I have:

```
In [3]: %timeit buf = serialized.to_buffer()
201 ms ± 1.87 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

and after:

```
In [4]: %timeit buf = serialized.to_buffer()
81.1 ms ± 233 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

I added an `nthreads` option, but note that when the objects are small, multithreading makes things slower because of the overhead of launching threads. The 1MB threshold in `arrow/io/memory.cc` may be too small; we might do some benchmarking to find a better default crossover point for switching between parallel and serial memcpy:

```
In [2]: %timeit buf = serialized.to_buffer(nthreads=4)
134 ms ± 1.38 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

cc @pcmoritz @robertnishihara

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #1017 from wesm/ARROW-1381 and squashes the following commits:

fbd0028 [Wes McKinney] Add unit test for SerializedPyObject.to_buffer
ab85230 [Wes McKinney] Add nthreads option for turning on multithreaded memcpy
db12072 [Wes McKinney] Use FixedSizeBufferWriter in SerializedPyObject.to_buffer