Commits

Weston Pace authored 8837f64d4e5
ARROW-13611: [C++] Scanning datasets does not enforce back pressure

This PR adds backpressure back into the asynchronous scanner. It creates an AsyncToggle which can be shared between the push-based sink and the pull-based scanner. The sink will close the toggle when its buffer fills up and the scanner will pause delivering items while the toggle is closed.

This PR adds the feature in a way that bypasses the exec plan's backpressure mechanisms, as those have not been fully fleshed out and I am still not sure what direction we are planning to go with that. Instead, the backpressure is almost completely handled outside of the compute space.

I've got the same mechanism working for dataset writes, but I don't want to hold up this PR while I wait for the write node to merge, so I have created ARROW-14191 to track that work.

Currently backpressure is broken for ordered scans. It turns out this has always been the case for the asynchronous scanner, even before it moved to the exec plan. The root cause is that the merge generator will keep reading from files 2-N if the read on file 1 is slow. I have created a test case which demonstrates this but will defer fixing it to ARROW-14192.

Closes #11285 from westonpace/feature/ARROW-13611--scanning-datasets-backpressure

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
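
The toggle handshake described in the commit message can be illustrated with a small standalone sketch. This is not Arrow's AsyncToggle: the names `Toggle`, `BoundedSink`, `DemoResult`, and `RunDemo` are hypothetical, and the sketch blocks a producer thread on a condition variable instead of pausing an asynchronous generator. It only shows the idea that the sink closes the toggle when its buffer is full and the producer stops delivering until the toggle reopens.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical blocking stand-in for a shared toggle (not Arrow's class).
class Toggle {
 public:
  void Close() {
    std::lock_guard<std::mutex> lock(mutex_);
    open_ = false;
  }
  void Open() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      open_ = true;
    }
    cv_.notify_all();
  }
  // The producer calls this before delivering each item.
  void WaitUntilOpen() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return open_; });
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool open_ = true;
};

// Hypothetical push-based sink: Push never blocks, it only flips the toggle.
class BoundedSink {
 public:
  BoundedSink(std::size_t capacity, Toggle* toggle)
      : capacity_(capacity), toggle_(toggle) {
    assert(capacity_ > 0);  // a zero capacity would never reopen the toggle
  }
  void Push(int value) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      items_.push_back(value);
      max_buffered_ = std::max(max_buffered_, items_.size());
      if (items_.size() >= capacity_) toggle_->Close();  // buffer is full
    }
    not_empty_.notify_one();
  }
  int Pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    not_empty_.wait(lock, [this] { return !items_.empty(); });
    int value = items_.front();
    items_.pop_front();
    if (items_.size() < capacity_) toggle_->Open();  // room again
    return value;
  }
  std::size_t max_buffered() {
    std::lock_guard<std::mutex> lock(mutex_);
    return max_buffered_;
  }

 private:
  const std::size_t capacity_;
  Toggle* toggle_;
  std::mutex mutex_;
  std::condition_variable not_empty_;
  std::deque<int> items_;
  std::size_t max_buffered_ = 0;
};

struct DemoResult {
  std::size_t max_buffered;  // largest buffer size observed
  long long sum;             // sum of all consumed items
};

// Runs one producer (the "scanner") and one consumer draining the sink.
DemoResult RunDemo(int num_items, std::size_t capacity) {
  Toggle toggle;
  BoundedSink sink(capacity, &toggle);
  long long sum = 0;
  std::thread producer([&] {
    for (int i = 0; i < num_items; ++i) {
      toggle.WaitUntilOpen();  // pause while the sink reports a full buffer
      sink.Push(i);
    }
  });
  std::thread consumer([&] {
    for (int i = 0; i < num_items; ++i) sum += sink.Pop();
  });
  producer.join();
  consumer.join();
  return DemoResult{sink.max_buffered(), sum};
}
```

Calling `RunDemo(1000, 4)` pushes 1000 integers through a sink of capacity 4. With a single producer, the buffer should never hold more than `capacity` items, because the producer checks the toggle before every push. The ordered-scan problem mentioned in the commit message is outside what this sketch models.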