Commits


Kenta Murata authored and Wes McKinney committed b8aeb79e94a
ARROW-854: [Format] Add tentative SparseTensor format

I'm interested in making a language-agnostic sparse tensor format. I believe Apache Arrow is one of the suitable places to do this, so let me propose my idea here.

First of all, in my investigation I found that there is no common memory layout for sparse tensor representations. This means we need some kind of conversion to share sparse tensors among different systems, even if the data format is logically the same. This is the same situation as with dataframes, which is why I believe Apache Arrow is the suitable place.

There are many formats for representing a sparse tensor. Most of them are specialized for matrices, which have two dimensions. There are few formats for general sparse tensors with more than two dimensions.

I think the COO format is a suitable starting point because COO can handle any number of dimensions, and many systems support it. In my investigation, the systems that support COO are SciPy, dask, pydata/sparse, TensorFlow, and PyTorch.

Additionally, the CSR format for matrices may also be good to support from the start. The reason is that CSR is efficient for extracting row slices, which may be important for extracting samples from tidy data, and it is supported by SciPy, MXNet, and R's Matrix library.

I add my prototype definition of the SparseTensor format in this pull request. I designed this prototype format to be extensible so that we can support additional sparse formats. I think we will need to support at least one additional sparse tensor format for more than two dimensions in addition to COO, so we will need this extensibility.

Author: Kenta Murata <mrkn@mrkn.jp>

Closes #2546 from mrkn/sparse_tensor_proposal and squashes the following commits:

148bff822 <Kenta Murata> make format
d57e56fc6 <Kenta Murata> Merge sparse_tensor_format.h into sparse_tensor.h
880bbc4eb <Kenta Murata> Rename too-verbose function name
c83ea6aaf <Kenta Murata> Add type aliases of sparse tensor types
90e8b3166 <Kenta Murata> Rename sparse tensor classes
07a651863 <Kenta Murata> Use substitution instead of constructor call
37a0a14c6 <Kenta Murata> Remove needless function declaration
97e85bd35 <Kenta Murata> Use std::make_shared
3dd434c83 <Kenta Murata> Capitalize member function name
6ef6ad065 <Kenta Murata> Apply code formatter
6f291581e <Kenta Murata> Mark APIs for sparse tensor as EXPERIMENTAL
ff3ea71c5 <Kenta Murata> Rename length to non_zero_length in SparseTensor
f78230344 <Kenta Murata> Return Status::IOError instead of DCHECK if message header type is not matched
7e814de36 <Kenta Murata> Put EXPERIMENTAL markn in comments
357860d8c <Kenta Murata> Fix typo in comments
43d8eea44 <Kenta Murata> Fix coding style
99b1d1d4d <Kenta Murata> Add missing ARROW_EXPORT specifiers
401ae8023 <Kenta Murata> Fix SparseCSRIndex::ToString and add tests
9e457acd3 <Kenta Murata> Remove needless virtual specifiers
3b1db7d32 <Kenta Murata> Add SparseTensorBase::Equals
d6a8c3805 <Kenta Murata> Unify Tensor.fbs and SparseTensor.fbs
b3a62ebfa <Kenta Murata> Fix format
6bc9e296f <Kenta Murata> Support IPC read and write of SparseTensor
1d9042709 <Kenta Murata> Fix format
51a83bfee <Kenta Murata> Add SparseTensorFormat
93c03adad <Kenta Murata> Add SparseIndex::ToString()
021b46be0 <Kenta Murata> Add SparseTensorBase
ed3984dd4 <Kenta Murata> Add SparseIndex::format_type
4251b4d08 <Kenta Murata> Add SparseCSRIndex
433c9b441 <Kenta Murata> Change COO index matrix to column-major in a format description
392a25b7c <Kenta Murata> Implement SparseTensor and SparseCOOIndex
b24f3c342 <Kenta Murata> Insert additional padding in sparse tensor format
c508db086 <Kenta Murata> Write sparse tensor format in IPC.md
2b50040f5 <Kenta Murata> Add an example of the CSR format in comment
76c56dd35 <Kenta Murata> Make indptr of CSR a buffer
d7e653f17 <Kenta Murata> Add an example of COO format in comment
866b2c13a <Kenta Murata> Add header comments in SparseTensor.fbs
aa9b8a4d0 <Kenta Murata> Add SparseTensor.fbs in FBS_SRC
1f16ffed8 <Kenta Murata> Fix syntax error in SparseTensor.fbs
c3bc6edfa <Kenta Murata> Add tentative SparseTensor format
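
For readers unfamiliar with the two layouts named in the commit message, the following is a minimal pure-Python sketch of what COO and CSR store for a small matrix, and why CSR makes row slicing cheap. It is an illustration only: the matrix, the function names (to_coo, to_csr, csr_row), and the list-based storage are assumptions made here, and none of it reflects Arrow's actual SparseTensor memory layout or API.

```python
# Small dense matrix invented for this illustration (not from the commit).
dense = [
    [0, 5, 0],
    [0, 0, 0],
    [7, 0, 9],
    [0, 2, 0],
]


def to_coo(matrix):
    """COO: one [row, col] coordinate per non-zero value.

    The same idea extends to N dimensions by storing N coordinates per value.
    """
    indices, values = [], []
    for r, row in enumerate(matrix):
        for c, v in enumerate(row):
            if v != 0:
                indices.append([r, c])
                values.append(v)
    return indices, values


def to_csr(matrix):
    """CSR: row pointers (indptr), column indices, and values.

    indptr[r] and indptr[r + 1] delimit the non-zeros belonging to row r.
    """
    indptr, col_indices, values = [0], [], []
    for row in matrix:
        for c, v in enumerate(row):
            if v != 0:
                col_indices.append(c)
                values.append(v)
        indptr.append(len(values))
    return indptr, col_indices, values


def csr_row(indptr, col_indices, values, r):
    """Extract row r as (column, value) pairs using two indptr lookups."""
    start, end = indptr[r], indptr[r + 1]
    return list(zip(col_indices[start:end], values[start:end]))


coo_indices, coo_values = to_coo(dense)
indptr, col_indices, csr_values = to_csr(dense)
row_2 = csr_row(indptr, col_indices, csr_values, 2)
```

In this sketch, extracting a row from CSR touches only the slice between two row pointers, whereas the same query against COO would need to scan (or search) the coordinate list. That is the row-slicing advantage the commit message refers to.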