Kevin Gurney authored and Wes McKinney committed f5045c9d9d3
ARROW-3897: [MATLAB] Add MATLAB support for writing numeric datatypes to a Feather file

**Writing Feather Files**

- Currently the MATLAB interface to Feather supports reading numeric datatypes (`double`, `single`, `uint*` and `int*`) from Feather files using the `featherread` function.
- This pull request adds a `featherwrite` function to serialize a MATLAB `table` containing numeric datatypes to a Feather file.

**Validity (Null) Bitmap Support**

- Previously, there was a `TODO` to add support for interpreting validity (null) bitmaps when reading from a Feather file.
- This pull request adds support for reading missing values from Feather files. Because MATLAB has no built-in representation for missing values of integer type, `featherread` casts integer columns that contain missing values to `double` and sets the missing entries to `NaN`.

**Testing**

- Removed the binary Feather files used for testing and replaced them with temporary files generated by `featherwrite` during test execution.
- Added test cases to improve code coverage.

**Notes**

- A preliminary code review was performed in https://github.com/mathworks/arrow/pull/6. I saw the discussion on https://github.com/apache/arrow/pull/4321 about conducting code reviews directly in apache/arrow, following https://apache.org/theapacheway/. I'll be sure to do code reviews directly in apache/arrow moving forward.
- This pull request contains a lot of changes. My apologies for this. Our intent was to refactor the reading and writing code and improve code coverage, but we should have isolated these changes into separate patches/JIRA issues. We'll be more thoughtful about breaking our submissions into small, single-purpose patches in the future.
- This pull request includes contributions from @rdmello and @sihuiliu. Thanks!
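The integer-to-`double` conversion described under Validity (Null) Bitmap Support can be sketched in C++ roughly as follows. This is a minimal illustration, not the mex code from this pull request. It assumes the Arrow columnar convention for validity bitmaps (bit i stored in byte i / 8 at position i % 8, least-significant bit first, with a set bit meaning "valid"), and `IsValid` and `IntColumnToDoubleWithNaN` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: returns true if bit i of an Arrow-style validity
// bitmap is set. Assumes LSB-first numbering: bit i lives in byte i / 8
// at position i % 8, and a set bit means the entry is valid (not null).
inline bool IsValid(const std::uint8_t* bitmap, std::size_t i) {
  return (bitmap[i / 8] >> (i % 8)) & 1;
}

// Hypothetical sketch (not the actual featherread mex code): convert an
// integer column plus its validity bitmap into doubles, writing NaN for
// null entries. A null bitmap pointer is treated as "no nulls".
template <typename IntT>
std::vector<double> IntColumnToDoubleWithNaN(const IntT* values,
                                             const std::uint8_t* validity,
                                             std::size_t length) {
  std::vector<double> out(length);
  for (std::size_t i = 0; i < length; ++i) {
    if (validity != nullptr && !IsValid(validity, i)) {
      // Missing entry: MATLAB integers have no missing value, so use NaN.
      out[i] = std::numeric_limits<double>::quiet_NaN();
    } else {
      out[i] = static_cast<double>(values[i]);
    }
  }
  return out;
}
```

Routing every entry through `double` is what makes `NaN` usable as a missing-value marker, but `int64`/`uint64` values whose magnitude exceeds 2^53 cannot be represented exactly as `double`.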
Author: Kevin Gurney <kevin.p.gurney@gmail.com>
Author: rdmello <rylan.dmello@mathworks.com>
Author: Sihui <sihui.liu@mathworks.com>

Closes #4328 from kevingurney/ARROW-3897 and squashes the following commits:

5e5686537 <rdmello> Use mxLogical instead of uint8_t in bit-packing/unpacking operations.
ef3288da8 <rdmello> Use uint8_t* instead of bool* in bit-packing and unpacking code.
35143c22c <rdmello> Use BitmapReader in VisitBits to improve performance.
baaaf5291 <rdmello> Avoid MSVC build failures by removing implicit conversions of std::array<T,N>::iterator to uint8_t*
d029ac802 <rdmello> Resolve formatting issue in bit-util-test.cc.
d57c338f9 <Kevin Gurney> Remove use of const qualifiers with primitive type function arguments.
a796bbfa9 <Kevin Gurney> Remove unnecessary explicit use of arrow::.
74668da4a <Kevin Gurney> Replace mlarrow namespace with arrow::matlab namespace.
ff8d220e6 <rdmello> Use AllocateResizableBuffer in order to handle allocation issues better
2e4f7ddb7 <Kevin Gurney> Remove use of Feather file version information in Feather reading and writing code.
6d311bcf6 <rdmello> Use GenerateBitsUnrolled for better performance when bit-packing validity bitmap
0105f69f0 <rdmello> Address lint errors and code review feedback.
16ead0a5c <rdmello> Add VisitBits and VisitBitsUnrolled to read from a bitmap.
ce8628b62 <Kevin Gurney> Remove unnecessary use of arrow::Status lvalue in FeatherWriter::WriteVariables. Remove unnecessary use of ARROW_RETURN_NOT_OK in FeatherWriter::Open.
39d4707c9 <Kevin Gurney> Remove unnecessary arrow::Array lvalue in WriteNumericData.
0f11c9790 <Kevin Gurney> Use std::make_shared in WriteNumericData.
684f283b6 <Kevin Gurney> Return nullptr from WriteVariableData.
3d6cb2c44 <Kevin Gurney> Change FeatherWriter::WriteMetadata return type to void.
63757e491 <Kevin Gurney> Use RETURN_NOT_OK in FeatherReader::Open.
a027dd616 <Kevin Gurney> Refactor featherwrite MATLAB table conversion code into utility function mlarrow.util.table2mlarrow.
845394a85 <Kevin Gurney> Refactor invalid MATLAB table variable name handling code into utility mlarrow.util.makeValidMATLABTableVariableNames. Add test case for handling invalid MATLAB table variable names.
53c55fb35 <rdmello> Adding unicode conversion utility for column names and refactoring bit-packing/unpacking code
ce3faad5f <Kevin Gurney> Move createMetadataStruct and createVariableStruct into mlarrow.util package for use in featherwrite code.
2d6a1f275 <Kevin Gurney> Factor out negation used in unpacked validity bitmap array check.
b35303263 <Kevin Gurney> Break up long line in NumericDatatypesWithNaNRow test case into multiple lines for improved readability.
0454b13ab <Kevin Gurney> Replace all uses of typedef with using.
80cbce34f <Kevin Gurney> Clean up and refactor tfeather.m. Add tfeathermex.m numeric nulls round trip test. Create test utilities directory.
bed64e15c <Kevin Gurney> Modify tfeather.m to make all tests fail if MEX files are not on the MATLAB path
3cdb22197 <rdmello> Fixing FEATHERWRITE test failure for 0-by-n inputs.
4eeb11a53 <Sihui> Added featherwrite tests, removed binary files and modified old tests to roundtrip.
55ac75260 <rdmello> Adding support for writing the nulls (validity) bitmap for numeric types to a Feather file from MATLAB
5ef69316a <Kevin Gurney> Implement validity (null) bitmap support for featherread numeric types
e92a21159 <Kevin Gurney> Update README.md to include example code for writing a MATLAB table to a Feather file
268222027 <Kevin Gurney> Implement featherwrite MATLAB code for numeric datatypes support
c86a1d36d <rdmello> Add support for writing numeric datatypes to Feather files.
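Several of the squashed commits concern bit-packing the validity bitmap on the write path. The idea can be sketched as the plain loop below. It is a hypothetical illustration (`PackValidityBitmap` is not a name from this change); according to the commit list, the actual code uses Arrow helpers such as `GenerateBitsUnrolled` and `AllocateResizableBuffer` instead. It assumes one `bool` validity flag per element and the Arrow bitmap layout (least-significant bit first, 1 = valid). How `featherwrite` derives those flags from a MATLAB `table` is not described here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: pack one validity flag per element into an
// Arrow-style bitmap. Assumes LSB-first numbering (element i maps to
// bit i % 8 of byte i / 8) and that a set bit means "valid".
std::vector<std::uint8_t> PackValidityBitmap(const bool* valid,
                                             std::size_t length) {
  // One bit per element, rounded up to whole bytes; starts all-null.
  std::vector<std::uint8_t> bitmap((length + 7) / 8, 0);
  for (std::size_t i = 0; i < length; ++i) {
    if (valid[i]) {
      bitmap[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
    }
  }
  return bitmap;
}
```

Bits past `length` in the final byte stay zero, so a 9-element column yields a 2-byte bitmap whose second byte uses only its lowest bit.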