Wes McKinney authored ccbf6446bcc on 30 Sep 2017
ARROW-838: [Python] Expand pyarrow.array to handle NumPy arrays not originating in pandas

This unifies the ingest path for 1D data into `pyarrow.array`. I added the argument `from_pandas` to turn null sentinel checking on or off:

```
In [8]: arr = np.random.randn(10000000)

In [9]: arr[::3] = np.nan

In [10]: arr2 = pa.array(arr)

In [11]: arr2.null_count
Out[11]: 0

In [12]: %timeit arr2 = pa.array(arr)
The slowest run took 5.43 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 68.4 µs per loop

In [13]: arr2 = pa.array(arr, from_pandas=True)

In [14]: arr2.null_count
Out[14]: 3333334

In [15]: %timeit arr2 = pa.array(arr, from_pandas=True)
1 loop, best of 3: 228 ms per loop
```

When the data is contiguous, the conversion is always zero-copy. However, when `from_pandas=True` and no null mask is passed, a null bitmap is constructed and populated, which accounts for the slower timing above.
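To make the cost of the `from_pandas=True` path concrete, here is a minimal pure-Python sketch of what "constructing and populating a null bitmap" involves. It is not the code in this patch (the real conversion is implemented in C++); the helper name `build_validity_bitmap` is hypothetical, and the sketch only assumes Arrow's documented layout of one validity bit per value, least-significant bit first, with NaN treated as the null sentinel.

```python
import math


def build_validity_bitmap(values):
    """Hypothetical helper: scan values once, marking NaN entries as null.

    Returns (bitmap, null_count). Bit i of the bitmap is 1 when values[i]
    is valid, using least-significant-bit-first ordering within each byte.
    """
    bitmap = bytearray((len(values) + 7) // 8)  # all bits start as 0 (null)
    null_count = 0
    for i, v in enumerate(values):
        if isinstance(v, float) and math.isnan(v):
            null_count += 1
        else:
            bitmap[i // 8] |= 1 << (i % 8)  # set the validity bit
    return bytes(bitmap), null_count


# Same pattern as the session above (every third value is NaN), but tiny.
values = [float("nan") if i % 3 == 0 else float(i) for i in range(10)]
bitmap, null_count = build_validity_bitmap(values)
print(null_count)                         # 4
print([format(b, "08b") for b in bitmap])  # ['10110110', '00000001']
```

The full pass over every element is the part that cannot be zero-copy, which is consistent with the gap between the two `%timeit` results.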

This also permits sequence reads into integers smaller than int64:

```
In [17]: pa.array([1, 2, 3, 4], type='i1')
Out[17]:
<pyarrow.lib.Int8Array object at 0x7ffa1c1c65e8>
[
  1,
  2,
  3,
  4
]
```

Oh, I also added NumPy-like string type aliases:

```
In [18]: pa.int32() == 'i4'
Out[18]: True
```

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #1146 from wesm/expand-py-array-method and squashes the following commits:

1570e525 [Wes McKinney] Code review comments
d3bbb3c3 [Wes McKinney] Handle type aliases in cast, too
797f0151 [Wes McKinney] Allow null checking to be skipped with from_pandas=False in pyarrow.array
f2802fc7 [Wes McKinney] Cleaner codepath for numpy->arrow conversions
587c575a [Wes McKinney] Add direct types sequence converters for more data types
cf40b767 [Wes McKinney] Add type aliases, some unit tests
7b530e4b [Wes McKinney] Consolidate both sequence and ndarray/Series/Index conversion in pyarrow.Array