Commits


Sebastien Binet authored and Uwe L. Korn committed 0d0ff7521b2
ARROW-3929: [Go] improve CSV reader memory usage

This CL enables the encoding/csv reader to reuse the memory used by records, from row to row, and thus reduce memory pressure on Go's GC.

```
$> benchstat old.txt new.txt
name                                    old time/op  new time/op  delta
Read/rows=10_cols=1_chunks=10-8         39.4µs ±18%  42.6µs ±24%  ~        (p=0.218 n=10+10)
Read/rows=10_cols=10_chunks=10-8         293µs ±23%   280µs ±24%  ~        (p=0.400 n=10+9)
Read/rows=10_cols=100_chunks=10-8       2.72ms ±24%  2.56ms ±20%  ~        (p=0.353 n=10+10)
Read/rows=10_cols=1000_chunks=10-8      24.3ms ± 2%  24.0ms ± 3%  ~        (p=0.059 n=8+9)
Read/rows=100_cols=1_chunks=10-8        74.9µs ±11%  62.1µs ±19%  -17.21%  (p=0.004 n=10+10)
Read/rows=100_cols=10_chunks=10-8        559µs ±21%   474µs ±21%  -15.12%  (p=0.009 n=10+10)
Read/rows=100_cols=100_chunks=10-8      5.53ms ±21%  4.36ms ±16%  -21.27%  (p=0.000 n=10+9)
Read/rows=100_cols=1000_chunks=10-8     41.9ms ± 3%  42.2ms ±13%  ~        (p=0.684 n=10+10)
Read/rows=1000_cols=1_chunks=10-8        421µs ±13%   320µs ±10%  -23.98%  (p=0.000 n=10+10)
Read/rows=1000_cols=10_chunks=10-8      3.24ms ±24%  2.63ms ±15%  -18.77%  (p=0.007 n=10+10)
Read/rows=1000_cols=100_chunks=10-8     33.0ms ±17%  27.0ms ±19%  -18.09%  (p=0.001 n=10+10)
Read/rows=1000_cols=1000_chunks=10-8     219ms ± 1%   211ms ± 2%   -3.81%  (p=0.000 n=9+10)
Read/rows=10000_cols=1_chunks=10-8      3.66ms ±11%  2.91ms ±10%  -20.27%  (p=0.000 n=10+10)
Read/rows=10000_cols=10_chunks=10-8     31.8ms ±16%  25.6ms ±15%  -19.66%  (p=0.000 n=10+10)
Read/rows=10000_cols=100_chunks=10-8     192ms ± 1%   182ms ± 1%   -5.19%  (p=0.000 n=10+10)
Read/rows=10000_cols=1000_chunks=10-8    1.99s ± 1%   1.93s ± 2%   -3.26%  (p=0.000 n=9+9)
Read/rows=100000_cols=1_chunks=10-8     32.9ms ± 4%  26.1ms ± 4%  -20.75%  (p=0.000 n=10+10)
Read/rows=100000_cols=10_chunks=10-8     203ms ± 1%   198ms ± 7%  ~        (p=0.123 n=10+10)
Read/rows=100000_cols=100_chunks=10-8    2.00s ± 1%   1.92s ± 1%   -4.24%  (p=0.000 n=10+8)
Read/rows=100000_cols=1000_chunks=10-8   22.7s ± 2%   22.0s ± 2%   -3.31%  (p=0.000 n=9+10)

name                                    old alloc/op  new alloc/op  delta
Read/rows=10_cols=1_chunks=10-8          32.7kB ± 0%   32.2kB ± 0%   -1.32%  (p=0.000 n=10+10)
Read/rows=10_cols=10_chunks=10-8          281kB ± 0%    277kB ± 0%   -1.54%  (p=0.000 n=10+10)
Read/rows=10_cols=100_chunks=10-8        2.77MB ± 0%   2.73MB ± 0%   -1.58%  (p=0.000 n=10+10)
Read/rows=10_cols=1000_chunks=10-8       27.8MB ± 0%   27.3MB ± 0%   -1.59%  (p=0.000 n=9+9)
Read/rows=100_cols=1_chunks=10-8         44.0kB ± 0%   39.3kB ± 0%  -10.80%  (p=0.000 n=10+10)
Read/rows=100_cols=10_chunks=10-8         381kB ± 0%    333kB ± 0%  -12.48%  (p=0.000 n=10+10)
Read/rows=100_cols=100_chunks=10-8       3.78MB ± 0%   3.29MB ± 0%  -12.75%  (p=0.000 n=10+10)
Read/rows=100_cols=1000_chunks=10-8      37.9MB ± 0%   33.1MB ± 0%  -12.83%  (p=0.000 n=10+9)
Read/rows=1000_cols=1_chunks=10-8         200kB ± 0%    152kB ± 0%  -23.99%  (p=0.000 n=10+10)
Read/rows=1000_cols=10_chunks=10-8       1.84MB ± 0%   1.36MB ± 0%  -26.08%  (p=0.000 n=10+9)
Read/rows=1000_cols=100_chunks=10-8      18.4MB ± 0%   13.5MB ± 0%  -26.44%  (p=0.000 n=9+10)
Read/rows=1000_cols=1000_chunks=10-8      184MB ± 0%    135MB ± 0%  -26.62%  (p=0.000 n=10+10)
Read/rows=10000_cols=1_chunks=10-8       1.65MB ± 0%   1.17MB ± 0%  -29.02%  (p=0.000 n=10+10)
Read/rows=10000_cols=10_chunks=10-8      15.7MB ± 0%   10.9MB ± 0%  -30.65%  (p=0.000 n=10+10)
Read/rows=10000_cols=100_chunks=10-8      156MB ± 0%    108MB ± 0%  -31.12%  (p=0.000 n=10+8)
Read/rows=10000_cols=1000_chunks=10-8    1.58GB ± 0%   1.09GB ± 0%  -31.06%  (p=0.000 n=10+10)
Read/rows=100000_cols=1_chunks=10-8      20.1MB ± 0%   15.3MB ± 0%  -23.93%  (p=0.000 n=10+9)
Read/rows=100000_cols=10_chunks=10-8      197MB ± 0%    149MB ± 0%  -24.39%  (p=0.000 n=10+8)
Read/rows=100000_cols=100_chunks=10-8    1.96GB ± 0%   1.47GB ± 0%  -24.86%  (p=0.000 n=10+10)
Read/rows=100000_cols=1000_chunks=10-8   19.7GB ± 0%   14.7GB ± 0%  -25.00%  (p=0.000 n=10+10)

name                                    old allocs/op  new allocs/op  delta
Read/rows=10_cols=1_chunks=10-8              319 ± 0%       310 ± 0%   -2.82%  (p=0.000 n=10+10)
Read/rows=10_cols=10_chunks=10-8           2.63k ± 0%     2.62k ± 0%   -0.34%  (p=0.000 n=10+10)
Read/rows=10_cols=100_chunks=10-8          25.7k ± 0%     25.7k ± 0%   -0.04%  (p=0.000 n=10+10)
Read/rows=10_cols=1000_chunks=10-8          256k ± 0%      256k ± 0%   -0.00%  (p=0.000 n=10+10)
Read/rows=100_cols=1_chunks=10-8             524 ± 0%       425 ± 0%  -18.89%  (p=0.000 n=10+10)
Read/rows=100_cols=10_chunks=10-8          3.02k ± 0%     2.92k ± 0%   -3.27%  (p=0.000 n=10+10)
Read/rows=100_cols=100_chunks=10-8         28.0k ± 0%     27.9k ± 0%   -0.35%  (p=0.000 n=10+10)
Read/rows=100_cols=1000_chunks=10-8         277k ± 0%      277k ± 0%   -0.04%  (p=0.000 n=10+10)
Read/rows=1000_cols=1_chunks=10-8          2.43k ± 0%     1.44k ± 0%  -41.04%  (p=0.000 n=10+10)
Read/rows=1000_cols=10_chunks=10-8         5.92k ± 0%     4.92k ± 0%  -16.87%  (p=0.000 n=10+10)
Read/rows=1000_cols=100_chunks=10-8        40.8k ± 0%     39.8k ± 0%   -2.45%  (p=0.000 n=10+10)
Read/rows=1000_cols=1000_chunks=10-8        389k ± 0%      388k ± 0%   -0.26%  (p=0.000 n=10+10)
Read/rows=10000_cols=1_chunks=10-8         20.6k ± 0%     10.6k ± 0%  -48.58%  (p=0.000 n=10+10)
Read/rows=10000_cols=10_chunks=10-8        25.4k ± 0%     15.4k ± 0%  -39.33%  (p=0.000 n=10+10)
Read/rows=10000_cols=100_chunks=10-8       73.8k ± 0%     63.8k ± 0%  -13.56%  (p=0.000 n=10+10)
Read/rows=10000_cols=1000_chunks=10-8       557k ± 0%      547k ± 0%   -1.79%  (p=0.000 n=10+10)
Read/rows=100000_cols=1_chunks=10-8         201k ± 0%      101k ± 0%  -49.78%  (p=0.000 n=10+10)
Read/rows=100000_cols=10_chunks=10-8        208k ± 0%      108k ± 0%  -48.02%  (p=0.000 n=10+10)
Read/rows=100000_cols=100_chunks=10-8       282k ± 0%      182k ± 0%  -35.49%  (p=0.000 n=10+10)
Read/rows=100000_cols=1000_chunks=10-8     1.02M ± 0%     0.92M ± 0%   -9.83%  (p=0.000 n=10+10)
```

Author: Sebastien Binet <binet@cern.ch>

Closes #3073 from sbinet/issue-3929 and squashes the following commits:

67a3272c <Sebastien Binet> ARROW-3929: improve CSV reader memory usage
8eb60c52 <Sebastien Binet> ARROW-3681: Add benchmarks for CSV reader
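
The record reuse described in the commit message presumably corresponds to the `ReuseRecord` field of the standard library's `csv.Reader`. The following is a minimal sketch of that technique in isolation, not the actual Arrow change: `countFields` is a hypothetical helper introduced here for illustration, and the input strings are made up.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// countFields is a hypothetical helper (not part of Arrow). It reads CSV
// data row by row and returns the number of rows and fields seen.
// When reuse is true, the reader may return the same backing slice from
// every Read call, so each row is fully consumed before the next Read.
func countFields(src string, reuse bool) (int, int, error) {
	r := csv.NewReader(strings.NewReader(src))
	r.ReuseRecord = reuse // standard library field, available since Go 1.9

	rows, fields := 0, 0
	for {
		rec, err := r.Read()
		if err == io.EOF {
			return rows, fields, nil
		}
		if err != nil {
			return rows, fields, err
		}
		rows++
		fields += len(rec)
	}
}

func main() {
	rows, fields, err := countFields("1,2,3\n4,5,6\n", true)
	if err != nil {
		panic(err)
	}
	fmt.Println(rows, fields) // prints "2 6"
}
```

With `ReuseRecord` enabled, the slice returned by `Read` is only valid until the next call to `Read`; code that needs to keep a row around has to copy it first. That trade-off is what removes the per-row slice allocation, which is consistent with the allocs/op reductions being largest for the benchmarks with many rows and few columns.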