Commits


liyafan82 authored and Micah Kornfield committed 149efd9441b
ARROW-5917: [Java] Redesign the dictionary encoder

The current dictionary encoder implementation (org.apache.arrow.vector.dictionary.DictionaryEncoder) has heavy performance overhead, which prevents it from being useful in practice:

* There are repeated conversions between Java objects and bytes (e.g., via vector.getObject).
* There is an unnecessary memory copy (the vector data must be copied into the hash table).
* The hash table cannot be reused for encoding multiple vectors (other data structures and results cannot be reused either).
* The output vector should not be created or managed by the encoder (just as in the out-of-place sorter).
* The hash table requires that the hashCode and equals methods be implemented appropriately, but this is not guaranteed.

We plan to implement a new encoder in the algorithm module and gradually deprecate the current one.

Closes #4994 from liyafan82/fly_0712_encode and squashes the following commits:

8b699a8f7 <liyafan82> Redesign the dictionary encoder

Authored-by: liyafan82 <fan_li_ya@foxmail.com>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
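To make the design points in the list above concrete, here is a minimal sketch of an encoder shaped along those lines. It is not the code from this commit and does not use any Arrow API: it works on plain `int[]` arrays instead of Arrow vectors, and the class name `IntDictionaryEncoderSketch` and its `encode` method are hypothetical. Values are hashed directly as primitives (no object conversion and no reliance on `hashCode`/`equals`), the hash table stores only dictionary indices rather than a copy of the data, the table is built once and reused across inputs, and the caller supplies the output buffer.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not the Arrow API): encodes int values against a fixed
 * dictionary using an open-addressing hash table over primitive ints.
 */
public final class IntDictionaryEncoderSketch {
  private static final int EMPTY = -1;

  private final int[] dictionary; // referenced, not copied; caller must not mutate it
  private final int[] slots;      // each slot holds a dictionary index, or EMPTY
  private final int mask;

  public IntDictionaryEncoderSketch(int[] dictionary) {
    this.dictionary = dictionary;
    // Power-of-two capacity with load factor <= 0.5, so probing always finds an empty slot.
    int capacity = 2;
    while (capacity < dictionary.length * 2) {
      capacity <<= 1;
    }
    this.slots = new int[capacity];
    Arrays.fill(slots, EMPTY);
    this.mask = capacity - 1;
    for (int i = 0; i < dictionary.length; i++) {
      int slot = find(dictionary[i]);
      if (slots[slot] == EMPTY) {
        slots[slot] = i; // for duplicate values, the first index wins
      }
    }
  }

  // Multiplicative hashing computed straight from the primitive value.
  private static int hash(int value) {
    int h = value * 0x9E3779B9;
    return h ^ (h >>> 16);
  }

  // Linear probing: returns the slot holding value, or the empty slot where it would go.
  private int find(int value) {
    int slot = hash(value) & mask;
    while (slots[slot] != EMPTY && dictionary[slots[slot]] != value) {
      slot = (slot + 1) & mask;
    }
    return slot;
  }

  /** Writes the dictionary index of each input value into output, which the caller owns. */
  public void encode(int[] input, int[] output) {
    if (output.length < input.length) {
      throw new IllegalArgumentException("output too small");
    }
    for (int i = 0; i < input.length; i++) {
      int index = slots[find(input[i])];
      if (index == EMPTY) {
        throw new IllegalArgumentException("value not in dictionary: " + input[i]);
      }
      output[i] = index;
    }
  }

  public static void main(String[] args) {
    IntDictionaryEncoderSketch encoder = new IntDictionaryEncoderSketch(new int[] {10, 20, 30});
    int[] codes = new int[4];
    encoder.encode(new int[] {30, 10, 10, 20}, codes);
    System.out.println(Arrays.toString(codes)); // [2, 0, 0, 1]
    // The same hash table and output buffer are reused for a second input.
    encoder.encode(new int[] {20, 20, 30, 10}, codes);
    System.out.println(Arrays.toString(codes)); // [1, 1, 2, 0]
  }
}
```

Keeping the output buffer outside the encoder mirrors the out-of-place sorter point in the list: the caller decides allocation and lifetime, and the encoder only fills it.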