

Joris Van den Bossche authored and Neal Richardson committed 61c6b954453
ARROW-7569: [Python] Add API to map Arrow types to pandas ExtensionDtypes in to_pandas conversions

See https://issues.apache.org/jira/browse/ARROW-7569 and https://issues.apache.org/jira/browse/ARROW-2428 for context. https://github.com/apache/arrow/pull/5512 only covered the first two cases described in ARROW-2428; this PR also tries to cover the third case.

This PR adds a `types_mapping` keyword to `Table.to_pandas` that specifies which pandas ExtensionDtypes to use for built-in Arrow types during the conversion. One specific use case is converting Arrow integer types to pandas' nullable integer dtype (or to one of the other custom nullable dtypes in pandas) instead of to a numpy integer dtype. For example:

```
table.to_pandas(types_mapping={pa.int64(): pd.Int64Dtype()})
```

constructs the pandas nullable dtype directly, instead of first converting the int columns to a numpy dtype (possibly float).

More tests are still needed. One important concern is that using a pyarrow type instance as the dict key might not work easily for parametrized types (e.g. timestamp with a resolution / timezone).

Closes #6189 from jorisvandenbossche/ARROW-7569-to-pandas-types-mapping and squashes the following commits:

cb82f5c21 <Joris Van den Bossche> expand tests
1d9c37ca1 <Joris Van den Bossche> simplify (remove unused extension_columns arg)
b61b1f5ac <Joris Van den Bossche> dict -> function
f3464b15a <Joris Van den Bossche> ARROW-7569: Add API to map Arrow types to pandas ExtensionDtypes for to_pandas conversions

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>