Kazuaki Ishizaki authored and Antoine Pitrou committed 858f45e59c4
ARROW-8797: [C++] Read RecordBatch in a different endian

This PR adds a test for receiving a RecordBatch with a different endianness (e.g. receiving a RecordBatch with a big-endian schema on a little-endian platform). The changes are:

1. Introduce an `Endianness` enum class to represent endianness.
2. Add a new flag `endianness` to `arrow::schema` to represent the endianness of the Arrays in a `RecordBatch`.
3. Eagerly convert non-native-endian data to native-endian data in a batch if `IpcReadOption.use_native_endian = true` (`true` by default).
4. Add golden arrow files for the integration test in both endians, plus a test script.

Regarding 3., the other possible choices were:

- Lazily convert non-native-endian data to native-endian data for a column in each RecordBatch.
  - Pros: avoids conversion for columns that will not be read.
  - Cons: complex management of the endianness of each column; inconsistent endianness between the schema and the column data.
- Convert non-native-endian data to native-endian data when each element is read.
  - Pros: keeps the original schema without a batch conversion.
  - Cons: 1) each RecordBatch may need an additional field indicating whether the endian conversion is necessary; 2) test cases must be updated to accept different endianness between the expected and actual schemas.

This PR uses the simplest approach (schemas always report native endianness), which eagerly converts all of the columns in a batch.

TODO
- [x] Support converting the endianness of each element for primitive types
- [x] Support converting the endianness of each element for complex types
- [x] Support converting the endianness of each element for all types in streams
- [x] Add golden arrow files in both endians

For creating this PR, @kou helped me greatly (prototype of `arrow::schema` and teaching me about RecordBatch).

Closes #7507 from kiszk/ARROW-8797

Lead-authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>