Commits


ptaylor authored and Brian Hulette committed 1442fb61d3e
ARROW-4552: [JS] Add high-level Table and Column convenience methods

This PR closes the following JIRAs:

* [ARROW-4552](https://issues.apache.org/jira/browse/ARROW-4552) - Add Table and Schema `assign(other)` implementations
* [ARROW-2764](https://issues.apache.org/jira/browse/ARROW-2764) - Easy way to create a new Table with an additional column
* [ARROW-4553](https://issues.apache.org/jira/browse/ARROW-4553) - Implement Schema/Field/DataType comparators
* [ARROW-4554](https://issues.apache.org/jira/browse/ARROW-4554) - Implement logic for combining Vectors with different lengths/chunksizes
* [ARROW-4555](https://issues.apache.org/jira/browse/ARROW-4555) - Add high-level Table and Column creation methods
* [ARROW-4557](https://issues.apache.org/jira/browse/ARROW-4557) - Add Table/Schema/RecordBatch `selectAt(...indices)` method

I extracted a few more high-level helper methods I've had lying around for creating, selecting, or manipulating Tables/Columns/Schemas/RecordBatches.

1. We currently have a `table.select(...colNames)` implementation, so I also added a `table.selectAt(...colIndices)` method to complement it. Super handy when you have duplicate column names.
2. I added a basic `table.assign(otherTable)` impl. I added logic to compare Schemas/Fields/DataTypes in order to de-dupe reliably, which lives in the [`TypeComparator` Visitor](https://github.com/trxcllnt/arrow/blob/a67bd562cf6c4860bdce027981df859398e41b6d/js/src/visitor/typecomparator.ts#L83). I expose this via `compareTo()` methods on the Schema, Field, and DataType for ease of use. Bonus: the Writer [can now discern](https://github.com/trxcllnt/arrow/blob/a67bd562cf6c4860bdce027981df859398e41b6d/js/src/ipc/writer.ts#L129) between RecordBatches of the same stream whose Schemas aren't reference-equal.
3. I've also added logic to distribute Vectors of different lengths (or different internal chunk sizes) evenly across RecordBatches, to support a nearly zero-copy `Table#assign()` impl. I say nearly zero-copy, because there's a bit of allocation/copying to backfill null bitmaps if chunks don't exactly line up. But this also means [it's a bit easier](https://github.com/trxcllnt/arrow/blob/a67bd562cf6c4860bdce027981df859398e41b6d/js/test/unit/table-tests.ts#L178) now to create Tables or RecordBatches from values in-memory whose lengths may not exactly line up:

   ```ts
   const table = Table.new(
       Column.new('foo', IntVector.from(arange(new Int32Array(10)))),
       Column.new('bar', FloatVector.from(arange(new Float32Array(100))))
   );
   ```
4. And lastly, I added [some more tests](https://github.com/trxcllnt/arrow/blob/js/high-level-table-column-fns/js/test/unit/table/serialize-tests.ts#L38) to ensure various combinations of select/slice/concat/assign can round-trip through IPC and back again.

   ```ts
   const table1 = Table.new(
       Column.new('a', Int32Vector.from(i32s)),
       Column.new('b', Float32Vector.from(f32s)),
       Column.new('c', Float64Vector.from(f64s))
   );
   const table2 = Table.new(
       Column.new('d', Utf8Vector.from(strs)),
       Column.new('d', BoolVector.from(bools)),
       Column.new('d', Int32Vector.from(i32s)),
   );
   const table3 = table1.select('b', 'c').assign(table2.selectAt(0, 1));
   console.log(table3.schema.fields)
   // > [
   // >   ('b', Float32),
   // >   ('c', Float64),
   // >   ('d', Utf8),
   // >   ('d', Bool)
   // > ]
   ```

(cc: @domoritz)

Author: ptaylor <paul.e.taylor@me.com>

Closes #3634 from trxcllnt/js/high-level-table-column-fns and squashes the following commits:

9943d9c2 <ptaylor> fix lint
4b8fb547 <ptaylor> add a test for table and recordbatch with a single column
17580639 <ptaylor> add Table.new docstring
bfbcc8b6 <ptaylor> cleanup/rename Table + Schema + RecordBatch from -> new, cleanup argument extraction util fns
5b6d938d <ptaylor> cleanup
98c8e525 <ptaylor> add initial RecordBatch.new and select tests
dc801434 <ptaylor> remove Table.fromVectors in favor of Table.new
73b8af7d <ptaylor> fix Int64Vector typings
83de5ed0 <ptaylor> guard against out-of-bounds selections
a67bd562 <ptaylor> clean up: eliminate more getters in favor of read-only properties
7a8daada <ptaylor> clean up/speed up: move common argument flattening methods into a utility file
41aa9024 <ptaylor> Add more tests to ensure Tables can serialize through various slice, concat, assign steps
07a2c964 <ptaylor> add basic Table#assign tests
e4a5d870 <ptaylor> split out the generated data validators for reuse
99e88883 <ptaylor> add Table and Schema assign() impls
0ac786c7 <ptaylor> add selectAt() method to Table, Schema, and RecordBatch for selecting columns by index
cf6f97ac <ptaylor> add TypeComparator visitor so we can compare Schemas, Fields, and DataTypes
b2153aa1 <ptaylor> ensure the Vector map types always fall back to BaseVector
9d8f4938 <ptaylor> cleanup: use the specialized typed array casting functions
85d0e001 <ptaylor> fix uniform chunk distribution when the new chunks are longer than the current chunks
8218f404 <ptaylor> Ensure Chunked#slice() range end is correct when there's only a single chunk
3f16c813 <ptaylor> fix typo
c9eeb056 <ptaylor> fix lint
933b5312 <ptaylor> Narrow the signature of Schema#fields to Field<T>, cleanup
bdb23b83 <ptaylor> ensure uniform chunk lengths in RecordBatch.from()
9d1f2ade <ptaylor> add Table.new() convenience method for creating Tables from Columns or , name | Field] arguments
0407cd74 <ptaylor> add Column.new() convenience method for creating Columns with string names and polymorphic chunk types
db390311 <ptaylor> add public Field#clone impl for convenience
97c349e8 <ptaylor> add nullable and metadata getters to the Column class
5dfc1007 <ptaylor> make the abstract Vector a type alias to trick TS into letting us override static methods
6ddfaf83 <ptaylor> narrow the FloatVector.from() return signatures
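
The commit message above says that selecting by index helps when column names are duplicated, and that fields are compared structurally rather than by reference. The following is a minimal sketch of those two ideas only. It does not use the Arrow library: `SketchField`, `fieldsEqual`, `select`, and `selectAt` are hypothetical stand-ins written for illustration, and skipping out-of-range indices is an assumption of this sketch, not a statement about how Arrow guards such selections.

```typescript
// Hypothetical, simplified stand-in for an Arrow Field (not the library's type).
interface SketchField {
  name: string;
  type: string; // e.g. "Utf8" or "Bool"; a plain string here for brevity
  nullable: boolean;
}

// Structural comparison: two fields are equal when every property matches,
// even when they are different object references.
function fieldsEqual(a: SketchField, b: SketchField): boolean {
  return a.name === b.name && a.type === b.type && a.nullable === b.nullable;
}

// Select by name: in this sketch, every field with a matching name is returned,
// so duplicate names cannot be told apart.
function select(fields: SketchField[], ...names: string[]): SketchField[] {
  return fields.filter((f) => names.indexOf(f.name) !== -1);
}

// Select by position: each index picks exactly one field.
// Out-of-range indices are skipped (an assumption of this sketch).
function selectAt(fields: SketchField[], ...indices: number[]): SketchField[] {
  return indices
    .filter((i) => Math.floor(i) === i && i >= 0 && i < fields.length)
    .map((i) => fields[i]);
}

// Three columns that share the name "d", as in the commit message's table2.
const fields: SketchField[] = [
  { name: "d", type: "Utf8", nullable: true },
  { name: "d", type: "Bool", nullable: true },
  { name: "d", type: "Int32", nullable: true },
];

const byName = select(fields, "d");        // all three fields named "d"
const byIndex = selectAt(fields, 0, 1, 7); // the first two fields; index 7 is skipped

console.log(byName.map((f) => f.type));
console.log(byIndex.map((f) => f.type));
```

Selecting by name cannot isolate the `Utf8` and `Bool` columns here, while `selectAt(fields, 0, 1)` can, which mirrors the `table2.selectAt(0, 1)` call in the commit message's example.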