
Wes McKinney authored 8452071180c
ARROW-1199: [C++] Implement mutable POD struct for Array data

This patch adds a new internal data structure that is a self-contained representation of the memory and metadata inside an Arrow array. This class is designed for easy internal data manipulation, analytical data processing, and data transport to and from IPC messages.

For example, we could reinterpret an int64 array as float64 without copying any memory. Because the value bytes are reused as-is, this is a bit-level reinterpretation rather than a numeric conversion:

```c++
// GetMyData() stands in for any function that returns an Int64Array
Int64Array arr = GetMyData();
std::shared_ptr<internal::ArrayData> new_data = arr.data()->ShallowCopy();
new_data->type = arrow::float64();
Float64Array double_arr(new_data);
```

This object is also useful in an analytics setting where memory may be reused. For example, if we had a group of operations all returning doubles, say:

```
Log(Sqrt(Expr(arr)))
```

then the low-level implementation of each of these functions could have the signature:

```c++
void Log(const ArrayData& values, ArrayData* out);
```

As another example, a function may consume one or more memory buffers in an input array and replace them with newly allocated data, changing the output data type as well.

I did quite a bit of refactoring and code simplification that was enabled by this patch. I note that IPC loading of very wide record batches is about 15% slower, while for smaller record batches performance is about the same in microbenchmarks. This code path could possibly be made faster with some performance analysis work.
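The following is a standalone sketch, not Arrow's code, of the three usage patterns described above. The types `TypeId`, `Buffer`, and `ArrayData` and the helpers `GetValue`, `SetValue`, `MakeInt64`, `PrepareOutput`, `UnaryDoubleKernel`, and `CastInt64ToDouble` are hypothetical simplifications invented for illustration. It shows a shallow copy that shares buffers while retagging the type, a cast kernel that replaces the values buffer and changes the output type, and `Sqrt`/`Log` kernels using the `(const ArrayData& values, ArrayData* out)` signature that reuse the output's memory when nothing else references it.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for illustration only; NOT Arrow's classes.
enum class TypeId { INT64, DOUBLE };

struct Buffer {
  std::vector<uint8_t> bytes;
};

// Plain, mutable struct: logical type, length, null count, shared buffers.
struct ArrayData {
  TypeId type = TypeId::INT64;
  int64_t length = 0;
  int64_t null_count = 0;
  // buffers[0]: validity bitmap (nullptr means no nulls); buffers[1]: values.
  std::vector<std::shared_ptr<Buffer>> buffers;

  // New metadata object that shares, rather than copies, the buffers.
  std::shared_ptr<ArrayData> ShallowCopy() const {
    return std::make_shared<ArrayData>(*this);
  }
};

// Read/write element i of a fixed-width values buffer; memcpy avoids
// strict-aliasing problems when viewing raw bytes as int64_t or double.
template <typename T>
T GetValue(const Buffer& buf, int64_t i) {
  T v;
  std::memcpy(&v, buf.bytes.data() + i * sizeof(T), sizeof(T));
  return v;
}

template <typename T>
void SetValue(Buffer* buf, int64_t i, T v) {
  std::memcpy(buf->bytes.data() + i * sizeof(T), &v, sizeof(T));
}

// Builds a non-null int64 array.
ArrayData MakeInt64(const std::vector<int64_t>& values) {
  ArrayData data;
  data.length = static_cast<int64_t>(values.size());
  auto buf = std::make_shared<Buffer>();
  buf->bytes.resize(values.size() * sizeof(int64_t));
  for (int64_t i = 0; i < data.length; ++i) SetValue<int64_t>(buf.get(), i, values[i]);
  data.buffers = {nullptr, buf};
  return data;
}

// Copies metadata from `values`, shares its validity bitmap, and sizes out's
// values buffer. out's old values buffer is reused only if no other
// ArrayData still references it, so shared memory is never overwritten.
void PrepareOutput(const ArrayData& values, TypeId type, size_t width, ArrayData* out) {
  assert(&values != out && values.buffers.size() == 2);
  std::shared_ptr<Buffer> out_buf;
  if (out->buffers.size() > 1) out_buf = std::move(out->buffers[1]);
  if (!out_buf || out_buf.use_count() != 1) out_buf = std::make_shared<Buffer>();
  out_buf->bytes.resize(values.length * width);
  out->type = type;
  out->length = values.length;
  out->null_count = values.null_count;
  out->buffers = {values.buffers[0], out_buf};
}

// Replaces the int64 values buffer with newly computed doubles and changes
// the output type (a numeric conversion, unlike the zero-copy retag).
void CastInt64ToDouble(const ArrayData& values, ArrayData* out) {
  assert(values.type == TypeId::INT64);
  PrepareOutput(values, TypeId::DOUBLE, sizeof(double), out);
  for (int64_t i = 0; i < values.length; ++i) {
    double v = static_cast<double>(GetValue<int64_t>(*values.buffers[1], i));
    SetValue<double>(out->buffers[1].get(), i, v);
  }
}

// Elementwise double kernel; null slots are computed too, but the shared
// validity bitmap still marks them as null.
template <typename Fn>
void UnaryDoubleKernel(const ArrayData& values, ArrayData* out, Fn fn) {
  assert(values.type == TypeId::DOUBLE);
  PrepareOutput(values, TypeId::DOUBLE, sizeof(double), out);
  for (int64_t i = 0; i < values.length; ++i) {
    SetValue<double>(out->buffers[1].get(), i, fn(GetValue<double>(*values.buffers[1], i)));
  }
}

void Sqrt(const ArrayData& values, ArrayData* out) {
  UnaryDoubleKernel(values, out, [](double x) { return std::sqrt(x); });
}

void Log(const ArrayData& values, ArrayData* out) {
  UnaryDoubleKernel(values, out, [](double x) { return std::log(x); });
}
```

A chain such as `CastInt64ToDouble(x, &dbl); Sqrt(dbl, &tmp); Log(tmp, &result);` can keep `tmp` and `result` alive across batches so later calls write into already-allocated memory. Gating reuse on `use_count() == 1` is just one possible policy, chosen here so a kernel never mutates memory still visible through a shallow copy; Arrow's own kernels may manage this differently.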
Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #824 from wesm/array-data-internals and squashes the following commits:

f1acbae1 [Wes McKinney] MSVC fixes
dcdf2b29 [Wes McKinney] Fix glib per C++ API changes
d0a8ee2b [Wes McKinney] Fix logic error in UnsafeSetNotNull
d17f886c [Wes McKinney] Construct dictionary indices in ctor
bba42530 [Wes McKinney] Set correct type when creating BinaryArray
ba3b2992 [Wes McKinney] Various fixes, Python fixes, add Array operator<< to std::ostream for debugging
0b8af24a [Wes McKinney] Write field metadata directly into output object
05058638 [Wes McKinney] Fix up cmake
75bc6b4f [Wes McKinney] Delete cruft from array/loader.h and consolidate in arrow/ipc
24df1b97 [Wes McKinney] Review comments, add some doxygen comments
6e2e5720 [Wes McKinney] Preallocate vector of shared_ptr
05b806b2 [Wes McKinney] Tests passing again
5bdd6a99 [Wes McKinney] bug fixes
7894496e [Wes McKinney] Some fixes
bf91a75a [Wes McKinney] Refactor to use shared_ptr, not yet working
130f0c1a [Wes McKinney] Use std::move instead of std::forward
a9b4031b [Wes McKinney] Add move constructors to reduce unnecessary copying
475a3db6 [Wes McKinney] Bug fixes, test suite passing again
16918279 [Wes McKinney] Array internals refactoring to use POD struct for all buffers, auxiliary metadata