Commits


Yuqi Gu authored and Wes McKinney committed d7dc0212a2f
ARROW-8633: [C++] Add ValidateAscii function

This patch implements the `ValidateAscii` function. Benchmarks and tests are also added.

### The benchmark on x86:

**Original:**
```
Run on (20 X 3000 MHz CPU s)
CPU Caches:
  L1 Data 32K (x10)
  L1 Instruction 32K (x10)
  L2 Unified 256K (x10)
  L3 Unified 25600K (x1)
Load Average: 4.79, 5.63, 2.68
-----------------------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------
ValidateTinyAscii              2.39 ns         2.39 ns    276349157 bytes_per_second=3.8964G/s
ValidateTinyNonAscii           8.21 ns         8.21 ns     85421905 bytes_per_second=1.24781G/s
ValidateSmallAscii             10.9 ns         10.9 ns     65497418 bytes_per_second=11.6721G/s
ValidateSmallAlmostAscii       46.1 ns         46.1 ns     15204522 bytes_per_second=2.99142G/s
ValidateSmallNonAscii          84.6 ns         84.6 ns      8303767 bytes_per_second=1.47429G/s
ValidateLargeAscii             4997 ns         4997 ns       136960 bytes_per_second=18.6385G/s
ValidateLargeAlmostAscii      30575 ns        30575 ns        22651 bytes_per_second=3.04752G/s
ValidateLargeNonAscii         73714 ns        73713 ns         9385 bytes_per_second=1.26467G/s
```

**With SIMD enabled:**
```
Run on (20 X 3000 MHz CPU s)
CPU Caches:
  L1 Data 32K (x10)
  L1 Instruction 32K (x10)
  L2 Unified 256K (x10)
  L3 Unified 25600K (x1)
Load Average: 4.79, 5.63, 2.68
-----------------------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------
ValidateTinyAscii              6.36 ns         6.36 ns    100259371 bytes_per_second=1.46438G/s
ValidateTinyNonAscii           11.3 ns         11.3 ns     61604638 bytes_per_second=926.575M/s
ValidateSmallAscii             9.74 ns         9.74 ns     71987411 bytes_per_second=13.0987G/s
ValidateSmallAlmostAscii       51.1 ns         51.1 ns     13677942 bytes_per_second=2.69774G/s
ValidateSmallNonAscii          84.5 ns         84.5 ns      8135065 bytes_per_second=1.47735G/s
ValidateLargeAscii             2363 ns         2363 ns       298863 bytes_per_second=39.4107G/s
ValidateLargeAlmostAscii      31006 ns        31006 ns        22642 bytes_per_second=3.00508G/s
ValidateLargeNonAscii         76222 ns        76222 ns         8703 bytes_per_second=1.22305G/s
```

### The benchmark on Arm64:

**Original:**
```
Run on (46 X 2600 MHz CPU s)
Load Average: 0.19, 0.66, 1.73
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-----------------------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------
ValidateTinyAscii              14.7 ns         14.7 ns     47461698 bytes_per_second=646.682M/s
ValidateTinyNonAscii           48.0 ns         48.0 ns     14576733 bytes_per_second=218.46M/s
ValidateSmallAscii              109 ns          109 ns      6404395 bytes_per_second=1.1667G/s
ValidateSmallAlmostAscii        275 ns          275 ns      2544737 bytes_per_second=513.185M/s
ValidateSmallNonAscii           555 ns          555 ns      1261830 bytes_per_second=230.408M/s
ValidateLargeAscii            78511 ns        78502 ns         8915 bytes_per_second=1.18649G/s
ValidateLargeAlmostAscii     179928 ns       179907 ns         3891 bytes_per_second=530.347M/s
ValidateLargeNonAscii        415107 ns       415058 ns         1686 bytes_per_second=229.994M/s
```

**With SIMD enabled:**
```
Run on (46 X 2600 MHz CPU s)
Load Average: 0.19, 0.66, 1.73
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-----------------------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------
ValidateTinyAscii              65.9 ns         65.9 ns     10597823 bytes_per_second=144.749M/s
ValidateTinyNonAscii           48.0 ns         48.0 ns     14584553 bytes_per_second=218.732M/s
ValidateSmallAscii             83.7 ns         83.7 ns      8367346 bytes_per_second=1.52478G/s
ValidateSmallAlmostAscii        275 ns          275 ns      2542186 bytes_per_second=512.498M/s
ValidateSmallNonAscii           555 ns          555 ns      1261774 bytes_per_second=230.301M/s
ValidateLargeAscii             3109 ns         3108 ns       225186 bytes_per_second=29.9637G/s
ValidateLargeAlmostAscii     179998 ns       179974 ns         3889 bytes_per_second=530.149M/s
ValidateLargeNonAscii        414228 ns       414181 ns         1691 bytes_per_second=230.481M/s
```

The `ValidateLargeAscii` case gets a performance boost when leveraging SIMD:

- x86: `18.6385G/s -> 39.4107G/s` (about 2.1x)
- Arm64: `1.18649G/s -> 29.9637G/s` (about 25x)

Closes #7121 from guyuqi/ARROW-8633

Lead-authored-by: Yuqi Gu <yuqi.gu@arm.com>
Co-authored-by: Wes McKinney <wesm+git@apache.org>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
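
For readers unfamiliar with the technique, ASCII validation reduces to checking that no byte has its high bit set. The following is a minimal portable sketch of that idea, not the code from this patch: the name `IsAsciiSketch` and the 8-byte word width are assumptions chosen for illustration. A SIMD variant would apply the same OR-and-mask pattern with wider registers.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper for illustration only; not Arrow's ValidateAscii.
// Returns true when every byte in [data, data + len) is below 0x80.
inline bool IsAsciiSketch(const uint8_t* data, std::size_t len) {
  // One high bit per byte lane of a 64-bit word.
  constexpr uint64_t kHighBits = 0x8080808080808080ULL;

  uint64_t acc = 0;
  std::size_t i = 0;

  // Main loop: OR eight bytes at a time into an accumulator.
  // memcpy is used so that unaligned input is handled safely.
  for (; i + 8 <= len; i += 8) {
    uint64_t word;
    std::memcpy(&word, data + i, sizeof(word));
    acc |= word;
  }

  // Tail loop: fold in the remaining 0 to 7 bytes one by one.
  uint8_t tail = 0;
  for (; i < len; ++i) {
    tail |= data[i];
  }

  // Any set high bit means at least one non-ASCII byte was seen.
  return (acc & kHighBits) == 0 && (tail & 0x80) == 0;
}
```

This sketch scans the whole buffer before answering, with no early exit, which keeps the loop branch-free. It could be called as `IsAsciiSketch(reinterpret_cast<const uint8_t*>(s.data()), s.size())` for a `std::string s`.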