Commits

Wes McKinney authored ccbf6446bcc
ARROW-838: [Python] Expand pyarrow.array to handle NumPy arrays not originating in pandas

This unifies the ingest path for 1D data into `pyarrow.array`. I added the argument `from_pandas` to turn null sentinel checking on or off:

```
In [8]: arr = np.random.randn(10000000)

In [9]: arr[::3] = np.nan

In [10]: arr2 = pa.array(arr)

In [11]: arr2.null_count
Out[11]: 0

In [12]: %timeit arr2 = pa.array(arr)
The slowest run took 5.43 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 68.4 µs per loop

In [13]: arr2 = pa.array(arr, from_pandas=True)

In [14]: arr2.null_count
Out[14]: 3333334

In [15]: %timeit arr2 = pa.array(arr, from_pandas=True)
1 loop, best of 3: 228 ms per loop
```

When the data is contiguous, the conversion is always zero-copy, but when `from_pandas=True` and no null mask is passed, a null bitmap is constructed and populated.

This also permits sequence reads into integers smaller than int64:

```
In [17]: pa.array([1, 2, 3, 4], type='i1')
Out[17]:
<pyarrow.lib.Int8Array object at 0x7ffa1c1c65e8>
[
  1,
  2,
  3,
  4
]
```

Oh, I also added NumPy-like string type aliases:

```
In [18]: pa.int32() == 'i4'
Out[18]: True
```

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #1146 from wesm/expand-py-array-method and squashes the following commits:

1570e525 [Wes McKinney] Code review comments
d3bbb3c3 [Wes McKinney] Handle type aliases in cast, too
797f0151 [Wes McKinney] Allow null checking to be skipped with from_pandas=False in pyarrow.array
f2802fc7 [Wes McKinney] Cleaner codepath for numpy->arrow conversions
587c575a [Wes McKinney] Add direct types sequence converters for more data types
cf40b767 [Wes McKinney] Add type aliases, some unit tests
7b530e4b [Wes McKinney] Consolidate both sequence and ndarray/Series/Index conversion in pyarrow.Array
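
To picture the extra work behind `from_pandas=True` in the message above (scanning for NaN sentinels and populating a null bitmap), here is a minimal pure-Python sketch. It is an illustration only, not pyarrow's actual code: the helper name `build_validity_bitmap` is hypothetical, and the sketch assumes the Arrow convention of a validity bitmap with least-significant-bit-first ordering, where a set bit marks a non-null slot.

```python
import math


def build_validity_bitmap(values):
    """Hypothetical helper: return (bitmap, null_count) for a sequence of floats.

    NaN is treated as the null sentinel. Bit i, counted from the least
    significant bit of each byte, is set to 1 when values[i] is valid.
    """
    bitmap = bytearray((len(values) + 7) // 8)  # one bit per element, rounded up
    null_count = 0
    for i, v in enumerate(values):
        if isinstance(v, float) and math.isnan(v):
            null_count += 1  # leave the bit at 0 to mark a null
        else:
            bitmap[i // 8] |= 1 << (i % 8)  # mark slot i as valid
    return bytes(bitmap), null_count


# Same pattern as the transcript (every third element is NaN), on 10 elements.
values = [float("nan") if i % 3 == 0 else float(i) for i in range(10)]
bitmap, null_count = build_validity_bitmap(values)
print(null_count)
print([format(b, "08b") for b in bitmap])
```

The per-element scan is the reason the `from_pandas=True` path cannot stay zero-cost: the data buffer can still be shared, but every value has to be inspected once to fill in the bitmap.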