Joost Hoozemans authored and Antoine Pitrou committed bf18e6e4b5b
ARROW-9648: [C++] Added compression level parameter to LZ4_FRAME compression codec

This will only work for the Arrow IPC format, not for Parquet, because Parquet uses the raw LZ4 format instead of the framed format.

Still to figure out:
- [x] Do we need to add tests to check whether setting the compression level works as intended? (Or is this already covered by the tests for the other formats that support levels?)
- [x] Figure out the raw format for Parquet files

The raw format does not support the compression level, but it does support an acceleration value. This value works in the opposite direction: increasing it speeds up compression at the expense of the compression ratio. The value can be set from 1 to 65537, so its range is also completely different from that of the compression level in the framed format (1 to 12).

Update: as it turns out, the compression level is linked to the compressor variant, not to whether or not framing is used. Now, when using `lz4_raw` (as used by Parquet), the fast compressor is chosen when `compression_level < 3` and the high-compression variant is chosen when `compression_level >= 3`. The latter supports compression levels, so the value is passed on to it. This behavior is similar to what happens in the lz4 CLI program (see for example https://github.com/lz4/lz4/blob/4c9431e9af596af0556e5da0ae99305bafb2b10b/lib/lz4frame.c#L815).

Closes #11810 from joosthooz/arrow-9648

Lead-authored-by: Joost Hoozemans <joosthooz@msn.com>
Co-authored-by: Joost Hoozemans <joost@pop-os.localdomain>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>