Commits


Adam Reeve authored and GitHub committed 8b2ab4d8200
GH-18547: [Java] Support re-emitting dictionaries in ArrowStreamWriter (#35920)

### Rationale for this change

This allows writing IPC streams where dictionary values change between record batches.

### What changes are included in this PR?

* Add new abstract `void ensureDictionariesWritten(DictionaryProvider provider, Set<Long> dictionaryIdsUsed)` to the base `ArrowWriter` class
* Move existing logic that only writes dictionaries once into the `ArrowFileWriter` class
* Implement replacement dictionary writing in `ArrowStreamWriter` by keeping copies of previously written dictionaries

### Are these changes tested?

Yes, I've added a new unit test for this

### Are there any user-facing changes?

Yes, `ArrowStreamWriter` will now write replacement dictionaries when dictionary values change between batches.

**This PR includes breaking changes to public APIs.**

`ArrowWriter` has a new abstract `ensureDictionariesWritten` method. This will only affect users directly inheriting from `ArrowWriter` rather than `ArrowFileWriter` or `ArrowStreamWriter`. There's also a behaviour change to `ArrowWriter`, where previously dictionaries were read from a `DictionaryProvider` on construction, but this is now delayed until the first batch is written.

* Closes: #18547

Authored-by: Adam Reeve <adreeve@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
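
The replacement-dictionary behaviour described in this commit can be illustrated with a minimal sketch. This is not the Arrow implementation: `ReemittingWriterSketch` is a hypothetical class, a plain `Map<Long, List<String>>` stands in for `DictionaryProvider`, and the `emitted` list stands in for the IPC stream. Only the method name `ensureDictionariesWritten` and the idea of keeping copies of previously written dictionaries come from the description above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a stream writer; it does not use the Arrow API. */
final class ReemittingWriterSketch {
  // Copies of the dictionary values last written, keyed by dictionary id.
  private final Map<Long, List<String>> previouslyWritten = new HashMap<>();

  // Messages "written" to the stream, recorded in order for illustration.
  final List<String> emitted = new ArrayList<>();

  /** Emits each used dictionary that is new or whose values differ from the stored copy. */
  void ensureDictionariesWritten(Map<Long, List<String>> provider, Set<Long> dictionaryIdsUsed) {
    for (long id : dictionaryIdsUsed) {
      List<String> current = provider.get(id);
      if (current == null) {
        throw new IllegalArgumentException("Dictionary not found for id " + id);
      }
      if (current.equals(previouslyWritten.get(id))) {
        continue; // unchanged since the last write, so nothing to emit
      }
      emitted.add("dictionary " + id + " " + current);
      // Store a copy so later mutations of the provider's list can be detected.
      previouslyWritten.put(id, new ArrayList<>(current));
    }
  }

  /** Dictionaries are checked lazily, just before each batch is written. */
  void writeBatch(Map<Long, List<String>> provider, Set<Long> dictionaryIdsUsed, String name) {
    ensureDictionariesWritten(provider, dictionaryIdsUsed);
    emitted.add("batch " + name);
  }
}
```

In this sketch, writing two batches against an unchanged dictionary emits the dictionary once, while mutating the dictionary before a third batch causes it to be emitted again ahead of that batch. A writer that only ever writes each dictionary once, as the description says of `ArrowFileWriter`, would skip the comparison and track only which ids have been written.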