Tao He authored and Antoine Pitrou committed 171c3f709bb
ARROW-6054: [Python] Fix the type erasure bug when serializing structured type ndarray.

Fix the type erasure bug when serializing structured NumPy arrays. Without this patch, we could see something like:

```python
In [1]: import pyarrow as pa

In [2]: import numpy as np

In [3]: x = np.array([(1, "a"), (2, "bb")], dtype=np.dtype([('x', 'int32'), ('y', '<U4')]))

In [4]: y = pa.deserialize(pa.serialize(x).to_buffer())

In [5]: y
Out[5]:
array([b'\x01\x00\x00\x00\x61\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00',
       b'\x02\x00\x00\x00\x62\x00\x00\x00\x62\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'],
      dtype='|V20')

In [6]: x.dtype
Out[6]: dtype([('x', '<i4'), ('y', '<U4')])
```

Note that the dtype of the deserialized result `y` is lost, which is really annoying. The reason is that `x.dtype.str` is `'|V20'`, an opaque 20-byte void type, rather than the structured type itself. We therefore need `dtype.descr`, which keeps the field names and types.

After this PR, we get something like:

```python
In [1]: import pyarrow as pa

In [2]: import numpy as np

In [3]: x = np.array([(1, "a"), (2, "bb")], dtype=np.dtype([('x', 'int32'), ('y', '<U4')]))

In [4]: y = pa.deserialize(pa.serialize(x).to_buffer())

In [5]: y
Out[5]: array([(1, 'a'), (2, 'bb')], dtype=[('x', '<i4'), ('y', '<U4')])

In [6]: y.dtype
Out[6]: dtype([('x', '<i4'), ('y', '<U4')])
```

I didn't see any existing test that checks the `dtype` when testing serialization of `numpy` arrays, so I didn't add a test case for this PR. If a test case is needed, please let me know.

Closes #4953 from sighingnow/fix-structured-dtype-serialization and squashes the following commits:

f1778f616 <Tao He> Fix the backwards compatiblity issue of `descr_to_dtype`.
cfd67a6f9 <Tao He> Use `dtype_to_descr` and `dtype_to_descr` of numpy, and add tests.
7eea409cf <Tao He> Use `dtype.descr` rather than `str` to avoid type erasion.

Authored-by: Tao He <linzhu.ht@alibaba-inc.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
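The following minimal sketch uses only NumPy, not pyarrow, to show why `dtype.str` loses the structure while `dtype.descr` keeps it. It is not the patch's code. The hypothetical `roundtrip` helper assumes that a serializer ships the raw bytes plus a dtype spec and that the receiver rebuilds the array with `np.frombuffer`.

```python
import numpy as np

# Structured array from the example in the description.
x = np.array([(1, "a"), (2, "bb")],
             dtype=np.dtype([('x', 'int32'), ('y', '<U4')]))

# Hypothetical helper: send raw bytes plus a dtype spec, then rebuild the
# array on the "receiving" side, roughly what a serializer has to do.
def roundtrip(arr, dtype_spec):
    buf = arr.tobytes()
    return np.frombuffer(buf, dtype=np.dtype(dtype_spec))

# dtype.str collapses the structured type into an opaque void type:
# x.dtype.str == '|V20' (4 bytes of int32 + 16 bytes of '<U4').
naive = roundtrip(x, x.dtype.str)
assert naive.dtype.names is None      # field names and types are gone
assert naive.dtype.itemsize == 20     # only the raw record size survives

# dtype.descr keeps the field list: [('x', '<i4'), ('y', '<U4')],
# so np.dtype() can rebuild the original structured dtype from it.
fixed = roundtrip(x, x.dtype.descr)
assert fixed.dtype == x.dtype
print(fixed.tolist())
```

`np.dtype(descr)` is enough for this unpadded example. Aligned or padded structured dtypes are different, because their `descr` contains unnamed padding entries such as `('', '|V3')`. For those, `numpy.lib.format.descr_to_dtype` (available since NumPy 1.17) is the more careful inverse of `numpy.lib.format.dtype_to_descr`.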