Commits

Wes McKinney authored 712b9d2c98d
ARROW-1784: [Python] Enable zero-copy serialization, deserialization of pandas.DataFrame via components

This patch adds a serialization path for pandas.DataFrame (and Series) that decomposes the internal BlockManager into a dictionary structure that can be serialized to the zero-copy component representation from ARROW-1783, and then reconstructed similarly. The impact of this is that when a DataFrame has no data that requires pickling, the reconstruction is zero-copy. I will post some benchmarks to illustrate the impact of this. The performance improvements are pretty remarkable, nearly 1000x speedup on a large DataFrame.

As some follow-up work, we will need to do more efficient serialization of the different pandas Index types. We should create a new JIRA for this.

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #1390 from wesm/ARROW-1784 and squashes the following commits:

21adbe7d [Wes McKinney] Do not test with IntervalIndex in pandas < 0.21, since manylinux1 is pinned at 0.20.1
939c02bb [Wes McKinney] Add pandas serialization test for periods, intervals
4b4c776c [Wes McKinney] Code comment, add more serialization docs for pandas / component serialization
1ac073c3 [Wes McKinney] Complete component-based serializer for pandas.DataFrame
6b01746d [Wes McKinney] Begin refactoring
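
The component idea described in the message can be illustrated with a small, stdlib-only sketch. This is not the pyarrow or pandas implementation: `to_components` and `from_components` are hypothetical helpers, a plain dict of columns stands in for a DataFrame, and `array.array` stands in for a NumPy-backed block. It only shows why reconstruction can avoid copying when a column exposes a raw buffer, and why columns that need pickling cannot.

```python
import pickle
from array import array


def to_components(frame):
    """Split a dict of columns into small metadata plus a list of buffers.

    Hypothetical helper: numeric columns contribute their memory directly,
    everything else falls back to pickling (which copies).
    """
    meta, buffers = [], []
    for name, column in frame.items():
        if isinstance(column, array):
            # Numeric column: remember the type code, expose the memory as-is.
            meta.append((name, "buffer", column.typecode))
            buffers.append(memoryview(column))
        else:
            # Arbitrary Python objects: serialize by value.
            meta.append((name, "pickle", None))
            buffers.append(memoryview(pickle.dumps(column)))
    return {"meta": meta, "buffers": buffers}


def from_components(components):
    """Rebuild the columns; buffer-backed columns share the original memory."""
    frame = {}
    for (name, kind, typecode), buf in zip(components["meta"], components["buffers"]):
        if kind == "buffer":
            # cast() reinterprets the same bytes; nothing is copied.
            frame[name] = buf.cast("B").cast(typecode)
        else:
            frame[name] = pickle.loads(bytes(buf))
    return frame


frame = {"x": array("d", [1.0, 2.0, 3.0]), "label": ["a", "b", "c"]}
restored = from_components(to_components(frame))

# Writing to the original numeric column is visible through the restored
# view, which shows the two share memory. The pickled column is an
# independent copy.
frame["x"][0] = 42.0
print(restored["x"][0], restored["label"])
```

In this toy version the cost of rebuilding a buffer-backed column does not depend on its length, which is the property the commit message attributes to DataFrames with no data that requires pickling.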