Commits


Sarah Gilmore authored and GitHub committed d261a824f4e
GH-42146: [MATLAB] Add IPC `RecordBatchFileReader` and `RecordBatchFileWriter` MATLAB classes (#42201)

### Rationale for this change

To enable initial IPC I/O support in the MATLAB interface, we should add a `RecordBatchFileReader` class and a `RecordBatchFileWriter` class.

### What changes are included in this PR?

1. Added a new `arrow.io.ipc.RecordBatchFileWriter` class.
2. Added a new `arrow.io.ipc.RecordBatchFileReader` class.

**Example**

```matlab
>> city = ["Boston" "Seattle" "Denver" "Juno" "Anchorage" "Chicago"]';
>> daylength = duration(["15:17:01" "15:59:16" "14:59:14" "19:21:23" "14:18:24" "15:13:39"])';
>> matlabTable = table(city, daylength, VariableNames=["City", "DayLength"]);
>> recordBatch1 = arrow.recordBatch(matlabTable(1:4, :));
>> recordBatch2 = arrow.recordBatch(matlabTable(5:end, :));

>> writer = arrow.io.ipc.RecordBatchFileWriter("daylight.arrow", recordBatch1.Schema);
>> writer.writeRecordBatch(recordBatch1);
>> writer.writeRecordBatch(recordBatch2);
>> writer.close();

>> reader = arrow.io.ipc.RecordBatchFileReader("daylight.arrow")

reader = 

  RecordBatchFileReader with properties:

    NumRecordBatches: 2
              Schema: [1×1 arrow.tabular.Schema]

>> reader.Schema

ans = 

  Arrow Schema with 2 fields:

    City: String | DayLength: Time64

>> rb1 = reader.read(1);
>> isequal(rb1, recordBatch1)

ans = 

  logical

   1

>> rb2 = reader.read(2);
>> isequal(rb2, recordBatch2)

ans = 

  logical

   1
```

### Are these changes tested?

Yes. Added two new test files:

1. `arrow/matlab/test/io/ipc/tRecordBatchFileWriter.m`
2. `arrow/matlab/test/io/ipc/tRecordBatchFileReader.m`

### Are there any user-facing changes?

Yes. Users can now serialize `RecordBatch`es and `Table`s to files using the Arrow IPC data format, and read `RecordBatch`es back in from Arrow IPC data files.

### Future Directions

1. Add `RecordBatchStreamWriter` and `RecordBatchStreamReader`.
2. Expose options for [controlling](https://github.com/apache/arrow/blob/main/cpp/src/arrow/ipc/options.h) IPC reading and writing in MATLAB.
3. Add more methods to `RecordBatchFileReader` to read in multiple record batches at once and to import the data as an Arrow `Table`.

* GitHub Issue: #42146

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Sarah Gilmore <sgilmore@mathworks.com>
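
Until a multi-batch read method exists (future direction 3), every record batch in a file can be collected with a loop over `NumRecordBatches`. The following is a minimal sketch, not part of this PR. The file name `sketch.arrow` and the variable names are hypothetical, and the sketch assumes that `table(recordBatch)` converts an Arrow `RecordBatch` back to a MATLAB `table`, as it does elsewhere in the MATLAB interface.

```matlab
% Build a small table and split it into two record batches.
city = ["Boston" "Seattle" "Denver" "Chicago"]';
daylength = duration(["15:17:01" "15:59:16" "14:59:14" "15:13:39"])';
matlabTable = table(city, daylength, VariableNames=["City", "DayLength"]);
recordBatch1 = arrow.recordBatch(matlabTable(1:2, :));
recordBatch2 = arrow.recordBatch(matlabTable(3:end, :));

% Write both batches to an Arrow IPC file (hypothetical file name).
writer = arrow.io.ipc.RecordBatchFileWriter("sketch.arrow", recordBatch1.Schema);
writer.writeRecordBatch(recordBatch1);
writer.writeRecordBatch(recordBatch2);
writer.close();

% Read every batch back, one index at a time.
reader = arrow.io.ipc.RecordBatchFileReader("sketch.arrow");
numBatches = double(reader.NumRecordBatches);
batches = cell(1, numBatches);
for ii = 1:numBatches
    batches{ii} = reader.read(ii);
end

% Assumption: table(rb) converts a RecordBatch to a MATLAB table.
% Stack the per-batch tables into a single table.
tables = cellfun(@(rb) table(rb), batches, UniformOutput=false);
combined = vertcat(tables{:});
```

If the round trip preserves types as in the example above, `combined` should hold the same four rows as `matlabTable`.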