Wes McKinney authored 2c5b412c286 on 19 Jul 2017
ARROW-1167: [Python] Support chunking string columns in Table.from_pandas

This resolves the error when converting the dataset from ARROW-1167, which takes up only 4.5 GB in memory but has a single column holding over 2 GB of binary data.
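
As a rough illustration of the chunking idea (not the actual C++ conversion code in this patch), the sketch below splits a sequence of binary values into groups whose total size stays under a byte budget. The function name `chunk_binary_values` and the constant `MAX_CHUNK_BYTES` are hypothetical and chosen for this example only. The budget reflects the fact that a BinaryArray uses 32-bit offsets, so one array can address at most about 2 GB of value data.

```python
from typing import Iterable, List

# Assumption for illustration: a BinaryArray uses signed 32-bit offsets,
# so the value data in a single chunk must stay below 2**31 bytes.
MAX_CHUNK_BYTES = 2**31 - 1


def chunk_binary_values(
    values: Iterable[bytes], max_bytes: int = MAX_CHUNK_BYTES
) -> List[List[bytes]]:
    """Group values into chunks whose total byte size is <= max_bytes."""
    chunks: List[List[bytes]] = []
    current: List[bytes] = []
    current_bytes = 0
    for value in values:
        if len(value) > max_bytes:
            # A single oversized value cannot fit in any chunk.
            raise ValueError("single value exceeds the per-chunk byte limit")
        # Start a new chunk when adding this value would exceed the budget.
        if current and current_bytes + len(value) > max_bytes:
            chunks.append(current)
            current = []
            current_bytes = 0
        current.append(value)
        current_bytes += len(value)
    if current:
        chunks.append(current)
    return chunks


# Small budget so the splitting behaviour is visible without large allocations.
example = chunk_binary_values([b"aaaa", b"bb", b"cccc", b"d"], max_bytes=6)
```

With a 6-byte budget the four values end up in two chunks, which is the same shape of result as a ChunkedArray holding two BinaryArray pieces.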

The unit test for this is not run in CI because it allocates a large amount of memory, but it can be run with

```
py.test pyarrow --large_memory
```

cc @jeffknupp

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #867 from wesm/ARROW-1167 and squashes the following commits:

dae62326 [Wes McKinney] cpplint
dcdec91a [Wes McKinney] Support ChunkedArray outputs of Array.from_pandas
150e9fc9 [Wes McKinney] Produced ChunkedArray when exceeding 2GB in a single BinaryArray column
707555f8 [Wes McKinney] Split up pandas_convert, make PandasObjectsToArrow return ChunkedArray to accommodate large string data