Commits


Maarten A. Breddels authored and Antoine Pitrou committed d98b9c57806
ARROW-9131: [C++] Faster ascii_lower and ascii_upper.

Following up on #7418, I tried and benchmarked a different implementation for:

* ascii_lower
* ascii_upper

Before (lower is similar):

```
--------------------------------------------------
Benchmark           Time           CPU Iterations
--------------------------------------------------
AsciiUpper_median    4922843 ns      4918961 ns           10 bytes_per_second=3.1457G/s items_per_second=213.17M/s
```

After:

```
--------------------------------------------------
Benchmark           Time           CPU Iterations
--------------------------------------------------
AsciiUpper_median    1391272 ns      1390014 ns           10 bytes_per_second=11.132G/s items_per_second=754.363M/s
```

This is roughly a 3.5x speedup (on an AMD machine). Using http://quick-bench.com/JaDErmVCY23Z1tu6YZns_KBt0qU I found a 4.6x speedup for clang 9 and 6.4x for GCC 9.2.

The test is also expanded to include a non-ASCII codepoint, to make explicit that it is fine to upper- or lower-case a UTF-8 string with these kernels. The non-overlapping encoding of UTF-8 makes this safe (see section 2.5 of the Unicode Standard Core Specification v13.0).

Closes #7434 from maartenbreddels/ARROW-9131

Authored-by: Maarten A. Breddels <maartenbreddels@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
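
To illustrate why byte-wise ASCII case conversion leaves UTF-8 text intact, here is a minimal sketch. It is not the code from this patch: `AsciiToUpperSketch` and `AsciiUpperSketch` are hypothetical names chosen for the example, and the per-byte loop is only one plausible way to write such a kernel. It relies on the fact that every byte of a multi-byte UTF-8 sequence is 0x80 or above, so it can never fall in the `'a'..'z'` range.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not Arrow's implementation): upper-case one byte
// if it is an ASCII lowercase letter, otherwise return it unchanged.
inline uint8_t AsciiToUpperSketch(uint8_t c) {
  return (c >= 'a' && c <= 'z') ? static_cast<uint8_t>(c - 32) : c;
}

// Hypothetical helper: apply the byte-wise conversion to a whole string.
// Bytes >= 0x80 (all lead and continuation bytes of multi-byte UTF-8
// sequences) are outside 'a'..'z' and are therefore copied as-is.
inline std::string AsciiUpperSketch(const std::string& in) {
  std::string out(in.size(), '\0');
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = static_cast<char>(
        AsciiToUpperSketch(static_cast<uint8_t>(in[i])));
  }
  return out;
}
```

For example, calling `AsciiUpperSketch` on the UTF-8 bytes of `aéz` upper-cases the `a` and `z` and leaves the two bytes encoding `é` (0xC3 0xA9) untouched.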