Commits


Joris Van den Bossche authored and GitHub committed 5181c24f6d7
GH-43683: [Python] Use pandas StringDtype when enabled (pandas 3+) (#44195)

### Rationale for this change

With the [PDEP-14](https://pandas.pydata.org/pdeps/0014-string-dtype.html) proposal, pandas is planning to introduce a dedicated string dtype to replace the current object dtype for string data. It will become the default in pandas 3.0, and can be enabled with an option in the upcoming pandas 2.3 (`pd.options.future.infer_string = True`). To prepare for that, we should start using that string dtype in `to_pandas()` conversions when that option is enabled.

### What changes are included in this PR?

- If pandas >= 3.0 is used or the pandas option is enabled, ensure that `to_pandas()` calls use the default string dtype of pandas for string-like columns (string, large_string, string_view)

### Are these changes tested?

They are tested in the pandas-nightly crossbow build. There is still one failure, caused by a bug on the pandas side (https://github.com/pandas-dev/pandas/issues/59879).

### Are there any user-facing changes?

**This PR includes breaking changes to public APIs.** Depending on the version of pandas, `to_pandas()` will change to use pandas' string dtype instead of object dtype. This is a breaking user-facing change, but it essentially just follows the equivalent change in default dtype on the pandas side.

* GitHub Issue: #43683

Lead-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
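
For illustration only, the rule described above (pandas >= 3.0, or the `future.infer_string` option enabled) can be sketched as follows. This is not the code from the PR: `expects_string_dtype` and `demo` are hypothetical names introduced here, and the version parsing is deliberately simplistic. `demo` assumes pandas with the `future.infer_string` option and a pyarrow build that includes this change; its result depends on the installed versions, so no output is shown.

```python
def expects_string_dtype(pandas_version: str, infer_string: bool) -> bool:
    """Hypothetical helper mirroring the rule in the PR description.

    Returns True when to_pandas() would be expected to produce pandas'
    string dtype for string-like columns: either pandas is 3.0 or newer,
    or the pandas option `future.infer_string` is enabled.
    """
    # Assumption: the major version is the first dot-separated component,
    # e.g. "3.0.0.dev0+123" -> 3.
    major = int(pandas_version.split(".")[0])
    return major >= 3 or infer_string


def demo():
    """Convert a small Arrow table with the pandas option enabled.

    Imports are local so the helper above stays usable without
    pandas or pyarrow installed.
    """
    import pandas as pd
    import pyarrow as pa

    table = pa.table({"name": pa.array(["a", "b", None], type=pa.string())})
    # Temporarily enable the opt-in string dtype described in PDEP-14.
    with pd.option_context("future.infer_string", True):
        df = table.to_pandas()
    # Expected to be pandas' string dtype with a pyarrow that includes
    # this change, and object dtype with older pyarrow versions.
    return df["name"].dtype
```

Calling `demo()` before and after upgrading pyarrow is one way to see the user-facing change: the column dtype switches from object to pandas' string dtype.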