Commits


Kevin Gurney authored and GitHub committed 382230dd8a6
GH-36072: [MATLAB] Add MATLAB `arrow.tabular.RecordBatch` class (#36190)

### Rationale for this change

Now that the MATLAB interface supports some basic `arrow.array.Array` types, it would be helpful to start building out the tabular types (e.g. `RecordBatch` and `Table`) in parallel. This pull request contains a basic implementation of `arrow.tabular.RecordBatch` (name subject to change).

### What changes are included in this PR?

1. Added new `arrow.tabular.RecordBatch` class that can be constructed from a MATLAB `table`.
2. Added new test class `tRecordBatch`.

### Are these changes tested?

Yes.

1. Added new test class `tRecordBatch` containing basic tests for the `arrow.tabular.RecordBatch` class.

### Are there any user-facing changes?

Yes.

1. Added new class `arrow.tabular.RecordBatch`.

**Example**:

```matlab
>> matlabTable = table(uint64([1,2,3]'), [true false true]', [0.1, 0.2, 0.3]', VariableNames=["UInt64", "Boolean", "Float64"])

matlabTable =

  3x3 table

    UInt64    Boolean    Float64
    ______    _______    _______

      1        true        0.1
      2        false       0.2
      3        true        0.3

>> arrowRecordBatch = arrow.tabular.RecordBatch(matlabTable)

arrowRecordBatch =

UInt64: [ 1, 2, 3 ]
Boolean: [ true, false, true ]
Float64: [ 0.1, 0.2, 0.3 ]

>> convertedMatlabTable = table(arrowRecordBatch)

convertedMatlabTable =

  3x3 table

    UInt64    Boolean    Float64
    ______    _______    _______

      1        true        0.1
      2        false       0.2
      3        true        0.3

>> isequal(matlabTable, convertedMatlabTable)

ans =

  logical

   1
```

2. Added properties `NumColumns` and `ColumnNames` to `arrow.tabular.RecordBatch`:

**Example**:

```matlab
>> arrowRecordBatch.NumColumns

ans =

  int32

   3

>> arrowRecordBatch.ColumnNames

ans =

  1x3 string array

    "UInt64"    "Boolean"    "Float64"
```

3. Added `column(i)` method to `arrow.tabular.RecordBatch` to retrieve the `i`th column of a `RecordBatch` as an `arrow.array.Array`.
**Example**:

```matlab
>> arrowUInt64Array = arrowRecordBatch.column(1)

arrowUInt64Array =

[ 1, 2, 3 ]

>> class(arrowUInt64Array)

ans =

    'arrow.array.UInt64Array'

>> arrowBooleanArray = arrowRecordBatch.column(2)

arrowBooleanArray =

[ true, false, true ]

>> class(arrowBooleanArray)

ans =

    'arrow.array.BooleanArray'

>> arrowFloat64Array = arrowRecordBatch.column(3)

arrowFloat64Array =

[ 0.1, 0.2, 0.3 ]

>> class(arrowFloat64Array)

ans =

    'arrow.array.Float64Array'
```

4. Added `toMATLAB` and `table` conversion methods to convert from a `RecordBatch` to a MATLAB `table`.

### Future Directions

1. Implement C++ logic for `toMATLAB` when the Arrow memory for a `RecordBatch` did not originate from a MATLAB array (e.g. it was read from a Parquet file or came from somewhere else).
2. Add more supported construction interfaces (e.g. `arrow.tabular.RecordBatch(array1, ..., arrayN)`, `arrow.tabular.RecordBatch.fromArrays(arrays)`, etc.).
3. Create an `arrow.tabular.Schema` class. Expose this as a public property on the `RecordBatch` class. Create related `arrow.type.Field` and `arrow.type.Type` classes.
4. Create an `arrow.tabular.Table` and related `arrow.array.ChunkedArray` class.
5. Add more `arrow.array.Array` types (e.g. `StringArray`, `TimestampArray`, `Time64Array`).
6. Create a basic workflow example of serializing a `RecordBatch` to disk using an I/O function (e.g. Parquet writing).

### Notes

1. Thanks @sgilmore10 for your help with this pull request!
2. While writing the tests for `RecordBatch`, we stumbled upon a set of accidentally committed diff markers in `tUInt64Array.m`. We removed these diff markers in this PR to unblock the `RecordBatch` tests. Unfortunately, this was not caught earlier because MATLAB was simply ignoring the test file `tUInt64Array.m`, which had a syntax error in it.
We could choose to explicitly list out all test files in the MATLAB CI workflows to try and avoid similar situations in the future, but this might get unwieldy to maintain over time as we add more tests. We are happy to hear any suggestions from other community members related to this topic.

* Closes: #36072

Lead-authored-by: Kevin Gurney <kgurney@mathworks.com>
Co-authored-by: Kevin Gurney <kevin.p.gurney@gmail.com>
Co-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>