Commits

Wes McKinney authored c06b7654bcc
ARROW-62: Clarify null bitmap interpretation, indicate bit-endianness, add null count, remove non-nullable physical distinction

As the initial scribe for the Arrow format, I made a mistake in what the null bits mean (1 for not-null, 0 for null). I also addressed ARROW-56 (bit-numbering) here.

Database systems are split on this subject. PostgreSQL for example does it this way:

http://www.postgresql.org/docs/9.5/static/storage-page-layout.html

> In this list of bits, a 1 bit indicates not-null, a 0 bit is a null. When the bitmap is not present, all columns are assumed not-null.

Since the Drill implementation predates the Arrow project, I think it's safe to go with this.

This patch also includes ARROW-76 which adds a "null count" to the memory layout indicating the actual number of nulls in an array. This also strikes the "non-nullable" distinction from the memory layout as there is no semantic difference between arrays with null count 0 and a non-nullable array. Instead, users may choose to set `nullable=false` in the schema metadata and verify that Arrow memory conforms to the schema.

Author: Wes McKinney <wesm@apache.org>

Closes #34 from wesm/ARROW-62 and squashes the following commits:

8c92926 [Wes McKinney] Add to README about what the format documents are
1f6fe03 [Wes McKinney] Account for null count and non-nullable removal from ARROW-76
648fd47 [Wes McKinney] Indicate that bitmaps should be a multiple of 8 bytes
4333d82 [Wes McKinney] Use 'null bitmap' similar to PostgreSQL documentation
dac77d4 [Wes McKinney] Revise format document language re: null bitmaps per feedback
f7a3898 [Wes McKinney] Revise format to indicate LSB bit numbering and 0/1 null/not-null distinction
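
For illustration only, the null bitmap convention this commit message describes (LSB bit numbering, 1 for not-null, 0 for null, a separate null count, and a bitmap length that is a multiple of 8 bytes) could be sketched in Python roughly as below. The function names `build_null_bitmap`, `is_valid`, and `null_count` are hypothetical and are not part of any Arrow API; using `None` to stand for a null slot is also an assumption of this sketch.

```python
def build_null_bitmap(values):
    """Build a null bitmap for `values`, where None marks a null slot.

    Hypothetical helper: bit i is 1 when slot i is not null, 0 when it is null.
    """
    nbytes = (len(values) + 7) // 8        # bytes needed to hold one bit per slot
    padded = ((nbytes + 7) // 8) * 8       # round up to a multiple of 8 bytes
    bitmap = bytearray(padded)             # all bits start at 0 (null)
    for i, value in enumerate(values):
        if value is not None:
            # LSB numbering: slot i lives in byte i // 8, at bit position i % 8.
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


def is_valid(bitmap, i):
    """Return True when slot i is not null (its bit is set)."""
    return bool((bitmap[i // 8] >> (i % 8)) & 1)


def null_count(bitmap, length):
    """Count the null slots (bits equal to 0) among the first `length` slots."""
    return sum(1 for i in range(length) if not is_valid(bitmap, i))


# Five slots: value, null, value, value, null -> bits 0, 2 and 3 are set.
values = [1, None, 3, 4, None]
bitmap = build_null_bitmap(values)
assert bitmap[0] == 0b00001101
assert null_count(bitmap, len(values)) == 2
```

In this sketch, an array whose null count is 0 carries the same information as one declared non-nullable, which matches the commit's reasoning for dropping the separate non-nullable memory layout.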