Commits


Paul Taylor authored and Wes McKinney committed b3a3a743866
ARROW-1693: [JS] Expand JavaScript implementation, build system, fix integration tests

This PR adds a workaround for reading the metadata layout for C++ dictionary-encoded vectors. I added tests that validate against the C++/Java integration suite. In order to make the new tests pass, I had to update the generated flatbuffers format and add a few types the JS version didn't have yet (Bool, Date32, and Timestamp). It also uses the new `isDelta` flag on DictionaryBatches to determine whether the DictionaryBatch vector should replace or append to the existing dictionary.

I also added a script for generating test arrow files from the C++ and Java implementations, so we don't break the tests when updating the format in the future. I saved the generated Arrow files alongside the tests because I didn't see a way to pipe the JSON test data through the C++/Java json-to-arrow commands without writing to a file. If I missed something and we can do it all in-memory, I'd be happy to make that change!

This PR is marked WIP because I added an [integration test](https://github.com/apache/arrow/commit/6e98874d9f4bfae7758f8f731212ae7ceb3f1321#diff-18c6be12406c482092d4b1f7bd70a8e1R22) that validates that the JS reader reads C++ and Java files the same way, but unfortunately it doesn't. While debugging, I noticed a number of other differences in the buffer layout metadata between the C++ and Java versions. If we go ahead with @jacques-n's [comment in ARROW-1693](https://issues.apache.org/jira/browse/ARROW-1693?focusedCommentId=16244812&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16244812) and remove/ignore the metadata, this test should pass too.
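The replace-or-append behavior of the `isDelta` flag described above can be illustrated with a minimal sketch. This is not the code from this PR: the `DictionaryBatchLike` interface, the `applyDictionaryBatch` function, and the `Map`-based dictionary store are hypothetical names chosen for illustration, and plain arrays stand in for Arrow vectors.

```typescript
// Hypothetical shape for illustration only; not the Arrow JS API.
interface DictionaryBatchLike<T> {
  id: number;       // id of the dictionary that encoded fields refer to
  isDelta: boolean; // true: append to the existing dictionary; false: replace it
  values: T[];      // plain array standing in for an Arrow vector
}

// Apply one dictionary batch to an in-memory store of dictionaries.
function applyDictionaryBatch<T>(
  dictionaries: Map<number, T[]>,
  batch: DictionaryBatchLike<T>
): void {
  const existing = dictionaries.get(batch.id);
  if (batch.isDelta && existing !== undefined) {
    // Delta batch: keep the old entries and add the new ones at the end,
    // so previously written indices stay valid.
    dictionaries.set(batch.id, existing.concat(batch.values));
  } else {
    // Non-delta batch (or first batch for this id): replace wholesale.
    dictionaries.set(batch.id, batch.values.slice());
  }
}
```

In this sketch, a delta batch that arrives before any dictionary exists for its id is treated as the initial dictionary; that is an assumption made here, not something the PR description states.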
cc @TheNeuralBit

Author: Paul Taylor <paul.e.taylor@me.com>
Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #1294 from trxcllnt/generate-js-test-files and squashes the following commits:

f907d5a7 [Paul Taylor] fix aggressive closure-compiler mangling in the ES5 UMD bundle
57c7df45 [Paul Taylor] remove arrow files from perf tests
5972349c [Paul Taylor] update performance tests to use generated test data
14be77f4 [Paul Taylor] fix Date64Vector TypedArray, enable datetime integration tests
5660eb34 [Wes McKinney] Use openjdk8 for integration tests, jdk7 for main Java CI job
019e8e24 [Paul Taylor] update closure compiler with full support for ESModules, and remove closure-compiler-scripts
48111290 [Paul Taylor] Add support for reading Arrow buffers < MetadataVersion 4
c72134a5 [Paul Taylor] compile JS source in integration tests
c83a700d [Wes McKinney] Hack until ARROW-1837 resolved. Constrain unsigned integers max to signed max for bit width
fd3ed475 [Wes McKinney] Uppercase hex values
224e041c [Wes McKinney] Remove hard-coded file name to prevent primitive JSON file from being clobbered
0882d8e9 [Paul Taylor] separate JS unit tests from integration tests in CI
1f6a81b4 [Paul Taylor] add missing mkdirp for test json data
19136fbf [Paul Taylor] remove test data files in favor of auto-generating them in CI
9f195682 [Paul Taylor] Generate test files when the test run if they don't exist
0cdb74e0 [Paul Taylor] Add a cli arg to integration_test.py generate test JSON files for JS
cc744564 [Paul Taylor] resolve LICENSE.txt conflict
33916230 [Paul Taylor] move js license to top-level license.txt
d0b61f49 [Paul Taylor] add validate package script back in, make npm-release.sh suitable for ASF release process
7e3be574 [Paul Taylor] Copy license.txt and notice.txt into target dirs from arrow root.
c8125d2d [Paul Taylor] Update readme to reflect new Table.from signature
49ac3398 [Paul Taylor] allow unrecognized cli args in gulpfile
3c52587e [Paul Taylor] re-enable node_js job in travis
cb142f11 [Paul Taylor] add npm release script, remove unused package scripts
d51793dd [Paul Taylor] run tests on src folder for accurate jest coverage statistics
c087f482 [Paul Taylor] generate test data in build scripts
1d814d00 [Paul Taylor] excise test data csvs
14d48964 [Paul Taylor] stringify Struct Array cells
1f004968 [Paul Taylor] rename FixedWidthListVector to FixedWidthNumericVector
be73c918 [Paul Taylor] add BinaryVector, change ListVector to always return an Array
02fb3006 [Paul Taylor] compare iterator results in integration tests
e67a66a1 [Paul Taylor] remove/ignore test snapshots (getting too big)
de7d96a3 [Paul Taylor] regenerate test arrows from master
a6d3c83e [Paul Taylor] enable integration tests
44889fbe [Paul Taylor] report errors generating test arrows
fd68d510 [Paul Taylor] always increment validity buffer index while reading
562eba7d [Paul Taylor] update test snapshots
d4399a8a [Paul Taylor] update integration tests, add custom jest vector matcher
8d44dcd7 [Paul Taylor] update tests
6d2c03d4 [Paul Taylor] clean arrows folders before regenerating test data
4166a9ff [Paul Taylor] hard-code reader to Arrow spec and ignore field layout metadata
c60305d6 [Paul Taylor] refactor: flatten vector folder, add more types
ba984c61 [Paul Taylor] update dependencies
5eee3eaa [Paul Taylor] add integration tests to compare how JS reads cpp vs. java arrows
d4ff57aa [Paul Taylor] update test snapshots
407b9f5b [Paul Taylor] update reader/table tests for new generated arrows
85497069 [Paul Taylor] update cli args to execute partial test runs for debugging
eefc256d [Paul Taylor] remove old test arrows, add new generated test arrows
0cd31ab9 [Paul Taylor] add generate-arrows script to tests
3ff71384 [Paul Taylor] Add bool, date, time, timestamp, and ARROW-1693 workaround in reader
4a34247c [Paul Taylor] export Row type
141194e7 [Paul Taylor] use fieldNode.length as vector length
c45718e7 [Paul Taylor] support new DictionaryBatch isDelta flag
9d8fef97 [Paul Taylor] split DateVector into Date32 and Date64 types
8592ff3c [Paul Taylor] update generated format flatbuffers