Joris Van den Bossche authored and Wes McKinney committed 7f4165c4757 05 Nov 2019
ARROW-2428: [Python] Support pandas ExtensionArray in Table.to_pandas conversion

Prototype for https://issues.apache.org/jira/browse/ARROW-2428

What does this PR do?

- Based on the pandas_metadata (stored when creating a Table from a pandas DataFrame), we infer which columns originally had a pandas extension dtype and convert them with a custom conversion (based on a `__from_arrow__` method defined on the pandas extension dtype)
- The user can also explicitly specify, with the `extension_column` keyword, which columns should be converted to an extension dtype
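
The dispatch described in the two points above can be illustrated with a toy model. This is only a sketch in plain Python, not the pyarrow implementation: `MaskedIntDtype`, `REGISTRY`, `to_frame`, and the simplified metadata layout are all hypothetical names invented for illustration, and the "columns" are plain lists rather than Arrow arrays.

```python
import json


class MaskedIntDtype:
    """Hypothetical stand-in for a pandas extension dtype."""

    name = "masked_int"

    def __from_arrow__(self, column):
        # A real extension dtype would receive an Arrow array here;
        # in this sketch the column is a plain list with None for missing.
        values = [0 if v is None else v for v in column]
        mask = [v is None for v in column]
        return {"dtype": self.name, "values": values, "mask": mask}


# Hypothetical registry: dtype name recorded in the metadata -> dtype object.
REGISTRY = {MaskedIntDtype.name: MaskedIntDtype()}


def to_frame(columns, metadata):
    """Convert columns, using __from_arrow__ where the metadata names a
    registered extension dtype, and a default conversion otherwise."""
    recorded = {c["name"]: c["dtype"] for c in json.loads(metadata)["columns"]}
    result = {}
    for name, column in columns.items():
        dtype = REGISTRY.get(recorded.get(name))
        if dtype is not None and hasattr(dtype, "__from_arrow__"):
            result[name] = dtype.__from_arrow__(column)
        else:
            result[name] = list(column)
    return result


# Simplified, assumed metadata layout (the real pandas_metadata has more fields).
metadata = json.dumps({"columns": [
    {"name": "a", "dtype": "masked_int"},
    {"name": "b", "dtype": "float64"},
]})
columns = {"a": [1, None, 3], "b": [0.5, 1.5, 2.5]}
frame = to_frame(columns, metadata)
```

Here column `a` goes through the custom `__from_arrow__` conversion because the metadata records an extension dtype for it, while column `b` falls back to the default path.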

This only covers [use case 1 discussed in the issue](https://issues.apache.org/jira/browse/ARROW-2428?focusedCommentId=16914231&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16914231): automatic roundtrip for pandas DataFrames that have extension dtypes.
So it does not yet provide, for example, a way to do this if the Arrow Table has no pandas metadata (i.e. it did not originate from a pandas DataFrame).

Closes #5512 from jorisvandenbossche/ARROW-2428-arrow-pandas-conversion and squashes the following commits:

dc8abac17 <Joris Van den Bossche> Avoid pandas_dtype check for known numpy dtypes
9572641a5 <Joris Van den Bossche> clean-up, remove extension_column kwarg in to_pandas, add docs
6f6b6f6f7 <Joris Van den Bossche> Also support arrow ExtensionTypes via to_pandas_dtype (without having pandas metadata)
e2b4b6257 <Joris Van den Bossche> ARROW-2428:  Support pandas ExtensionArray in Table.to_pandas conversion

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>