Commits

Wes McKinney authored 25b4a46805a
ARROW-4324: [Python] Triage broken type inference logic in presence of a mix of NumPy dtype-having objects and other scalar values

In investigating the innocuous bug report from ARROW-4324 I stumbled on a pile of hacks and flawed design around type inference

```
import numpy as np
import pyarrow as pa

test_list = [np.dtype('int32').type(10), np.dtype('float32').type(0.5)]
test_array = pa.array(test_list)

# Expected
# test_array
# <pyarrow.lib.DoubleArray object at 0x7f009963bf48>
# [
#   10,
#   0.5
# ]

# Got
# test_array
# <pyarrow.lib.Int32Array object at 0x7f009963bf48>
# [
#   10,
#   0
# ]
```

It turns out there are several issues:

* There was a kludge around handling the `numpy.nan` value, which is a PyFloat, not a NumPy float64 scalar
* Type inference assumed "NaN is null", which should not be hard-coded, so I added a flag to switch between pandas semantics and non-pandas
* Mixing NumPy scalar values and non-NumPy scalars (like our evil friend `numpy.nan`) caused the output type to be simply incorrect. For example `[np.float16(1.5), 2.5]` would yield `pa.float16()` output type. Yuck

I inserted some hacks to force what I believe to be the correct behavior and fixed a couple of unit tests that actually exhibited buggy behavior before (see within). I don't have time to do the "right thing" right now, which is to more or less rewrite the hot path of `arrow/python/inference.cc`, so at least this gets the unit tests asserting what is correct so that refactoring will be more productive later.
Author: Wes McKinney <wesm+git@apache.org>

Closes #4527 from wesm/ARROW-4324 and squashes the following commits:

e396958b0 <Wes McKinney> Add unit test for passing pandas Series with from_pandas=False
754468a5d <Wes McKinney> Set from_pandas to None by default in pyarrow.array so that user wishes can be respected
e1b839339 <Wes McKinney> Remove outdated unit test, add Python unit test that shows behavior from ARROW-2240 that's been changed
4bc8c8193 <Wes McKinney> Triage type inference logic in presence of a mix of NumPy dtype-having objects and other typed values, pending more serious refactor in ARROW-5564