Benjamin Kietzman authored and Wes McKinney committed 9c19bb65c12
ARROW-694: [C++] Initial parser interface for reading JSON into RecordBatches (abandoning https://github.com/apache/arrow/pull/3206)

Adds [`json` sub project](https://github.com/apache/arrow/pull/3592/files#diff-2443c7d7b39b992ea580f0fbd387284a) with:

- BlockParser, which parses Buffers of JSON-formatted data into a StructArray with minimal conversion:
  * true/false and null fields are stored in BooleanArray and NullArray, respectively
  * strings are stored as indices into a single StringArray
  * numbers are not converted; their string representations are stored alongside string values
  * nested fields are stored as ListArray or StructArray of their parsed (unconverted) children
- Three approaches to handling unexpected fields:
  1. Error on an unexpected field
  2. Ignore unexpected fields
  3. Infer the type of unexpected fields and add them to the schema
- [Convenience interface](https://github.com/apache/arrow/pull/3592/files#diff-d043a0249cc485b08d93767d2075bd83R124) for parsing a single chunk of JSON data into a RecordBatch with fully converted columns
- Chunker to process a stream of unchunked data for use by BlockParser (not currently used)

Author: Benjamin Kietzman <bengilgit@gmail.com>
Author: Wes McKinney <wesm+git@apache.org>

Closes #3592 from bkietz/ARROW-694-json-reader-WIP and squashes the following commits:

e42e5d730 <Wes McKinney> Add arrow_dependencies to arrow_flight.so dependencies to fix race condition with Flatbuffers
d67aff63e <Benjamin Kietzman> adding more comments to parser.cc
554b595d9 <Benjamin Kietzman> adding explanatory comments to json/parser.cc
0d0caa991 <Benjamin Kietzman> Add ARROW_PREDICT_* to conditions in parser.cc
d7d0a2eb6 <Benjamin Kietzman> fix doc error in chunker.h
29c23128a <Wes McKinney> Disable arrow-json-chunker-test
76d0431e2 <Wes McKinney> cmake-format
83150ec13 <Wes McKinney> Restore BinaryBuilder::UnsafeAppend for const std::string&
05fd9d189 <Benjamin Kietzman> add json project back (merge error)
69541ea87 <Benjamin Kietzman> correct test util includes
be720c812 <Benjamin Kietzman> Use compound statements in if()
baac46735 <Benjamin Kietzman> use const shared_ptr<T>& instead of move
6e86078f1 <Benjamin Kietzman> Move ParseOne to reader.cc
1bebda861 <Benjamin Kietzman> clean up Chunker's stream usage
0d6d92026 <Benjamin Kietzman> disabling chunker test
c1a7f4bd3 <Benjamin Kietzman> check status for generating list elements
292672aee <Benjamin Kietzman> add inline tag
f92f8508c <Benjamin Kietzman> add Status return to Generate
e3346485e <Benjamin Kietzman> remove misplaced const
9679b6f76 <Benjamin Kietzman> fix format issue, use SFINAE to detect StringConverter default constructibility
aaaf9e7e8 <Benjamin Kietzman> remove bitfields
10e4f0dc4 <Benjamin Kietzman> add missing virtual destructor
92ffc640d <Benjamin Kietzman> adding ParseOne interface for dead simple parsing
330615b95 <Benjamin Kietzman> adding first draft of parsing with type inference
69d7c5c00 <Benjamin Kietzman> Rewrite parser to defer conversion of strings and numbers
b9d5c3d2d <Benjamin Kietzman> adding Chunker implementation and tests
2677a575d <Benjamin Kietzman> use recommended loop style
e39a4a9e0 <Benjamin Kietzman> add (failing) test for '-0' consistency
0f3a3bc0f <Benjamin Kietzman> Added trivial parser benchmark and data generator
6472738b3 <Benjamin Kietzman> Refactored type inferrence
cb6a313d7 <Benjamin Kietzman> adding first draft of type inferrence to BlockParser
cc3698a44 <Benjamin Kietzman> refactoring Schema::GetFieldIndex to return int
17176f975 <Benjamin Kietzman> first sketch of JSON parser
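The BlockParser layout described in this commit defers conversion: string and number tokens are kept as raw text in one shared pool, and each column holds only indices into that pool. The following is a minimal standalone C++ sketch of that idea. It assumes a plain `std::vector<std::string>` in place of Arrow's StringArray. `StringPool`, `RawKind`, `RawColumn`, `AppendToken`, and `ConvertToDoubles` are hypothetical names, not part of this commit or of Arrow's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical sketch (illustrative names, not Arrow's API): every string and
// number token in a block is copied verbatim into one shared pool.
struct StringPool {
  std::vector<std::string> values;

  std::int32_t Add(std::string_view raw) {
    values.emplace_back(raw);
    return static_cast<std::int32_t>(values.size() - 1);
  }
};

enum class RawKind { kString, kNumber };

// A column stores only indices into the shared pool, never converted values.
struct RawColumn {
  RawKind kind;
  std::vector<std::int32_t> indices;  // row i -> pool.values[indices[i]]
};

// Parsing step: no conversion happens; the token text is just recorded.
void AppendToken(StringPool& pool, RawColumn& column, std::string_view token) {
  column.indices.push_back(pool.Add(token));
}

// Conversion step, run later once the target type is known.
std::vector<double> ConvertToDoubles(const StringPool& pool, const RawColumn& column) {
  assert(column.kind == RawKind::kNumber);
  std::vector<double> out;
  out.reserve(column.indices.size());
  for (std::int32_t i : column.indices) {
    out.push_back(std::stod(pool.values[static_cast<std::size_t>(i)]));
  }
  return out;
}
```

Because numbers stay as text until a conversion step runs, one plausible benefit is that a column's final type can be chosen after the whole block has been seen, for example when unexpected fields are inferred.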
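The three approaches to unexpected fields can be pictured as a policy that is consulted whenever a field name is missing from the schema. This is a hypothetical C++ sketch under simplifying assumptions. The schema is modeled as a `std::map` from field name to a type label, inference looks only at the first character of the raw value, and errors are thrown for brevity, whereas Arrow code reports errors through `Status`. `UnexpectedFieldPolicy`, `InferJsonType`, and `HandleField` are illustrative names only.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical names modeling the three behaviors listed in the commit.
enum class UnexpectedFieldPolicy { kError, kIgnore, kInferType };

// Very coarse inference from the first character of a raw JSON value.
std::string InferJsonType(const std::string& raw) {
  assert(!raw.empty());
  switch (raw.front()) {
    case 'n': return "null";
    case 't':
    case 'f': return "boolean";
    case '"': return "string";
    case '[': return "list";
    case '{': return "struct";
    default: return "number";
  }
}

// Returns true if the field should be parsed, false if it should be skipped.
// Under kInferType the schema is extended with the inferred type.
bool HandleField(std::map<std::string, std::string>& schema, const std::string& name,
                 const std::string& raw_value, UnexpectedFieldPolicy policy) {
  if (schema.count(name) != 0) return true;  // expected field: always parse
  switch (policy) {
    case UnexpectedFieldPolicy::kError:
      throw std::runtime_error("unexpected field: " + name);
    case UnexpectedFieldPolicy::kIgnore:
      return false;
    case UnexpectedFieldPolicy::kInferType:
      schema[name] = InferJsonType(raw_value);
      return true;
  }
  return false;  // unreachable; silences missing-return warnings
}
```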
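The commit does not describe how the Chunker finds block boundaries. As a sketch under a stated assumption only: for newline-delimited JSON in which no record contains a literal newline, a stream can be cut at the last newline of each read. That way every block handed to a parser holds complete records, and the trailing partial record is carried into the next read. `SplitAtLastNewline` and `ChunkStream` are hypothetical, and the actual Chunker may work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical sketch, not Arrow's Chunker. Assumes newline-delimited JSON
// in which no record contains a literal '\n'.
// Splits `buffer` into {whole, partial}: `whole` ends at the last '\n' and so
// holds only complete records; `partial` is the incomplete trailing record.
std::pair<std::string_view, std::string_view> SplitAtLastNewline(std::string_view buffer) {
  const std::size_t last = buffer.rfind('\n');
  if (last == std::string_view::npos) {
    return {std::string_view{}, buffer};  // no complete record yet
  }
  return {buffer.substr(0, last + 1), buffer.substr(last + 1)};
}

// Feeds successive reads through the splitter, carrying the partial tail into
// the next read, and returns the blocks that would be handed to a parser.
std::vector<std::string> ChunkStream(const std::vector<std::string>& reads) {
  std::vector<std::string> blocks;
  std::string carry;
  for (const std::string& piece : reads) {
    const std::string buffer = carry + piece;
    const auto [whole, partial] = SplitAtLastNewline(buffer);
    if (!whole.empty()) blocks.emplace_back(whole);
    carry.assign(partial);  // copy before `buffer` goes out of scope
  }
  if (!carry.empty()) blocks.push_back(carry);  // last record had no trailing '\n'
  return blocks;
}
```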