Ben Harkins authored and GitHub committed 143c6917062
GH-36311: [C++] Fix integer overflows in `utf8_slice_codeunits` (#36575)

### Rationale for this change

The default value for `SliceOptions::stop` is `INT64_MAX`, which several internal calculations did not account for. This resulted in integer overflows and unexpected behavior whenever `stop` wasn't provided.

Note also that running the included tests without the fixes should produce UBSan errors (it did for me, at least).

### What changes are included in this PR?

- Adds logic to `SliceCodeunitsTransform` that handles potential overflows
- Adds tests for cases where the `start` param is positive or negative and `stop` is the maximum value

**Update**

Discovered that `utf8_slice_codeunits` deviates from Python slicing behavior when `stop=None` and `step < 0`, so further changes were made:

- Handles `INT64_MIN` for `SliceOptions::stop` on the C++ side and adds more tests
- Updates the Python bindings for `SliceOptions` so that the default value used when `stop=None` (`sys.maxsize`) is negated when `step < 0`
- Adds `None` as a possible `stop` value in the Python tests

### Are these changes tested?

Yes (tests are included).

### Are there any user-facing changes?

In theory, altering the behavior of `utf8_slice_codeunits` when `stop=None` and `step < 0` could be considered a breaking change. That said, the current implementation produces incorrect results whenever `None` is used at all, so in practice it probably isn't one.

* Closes: #36311

Authored-by: benibus <bpharks@gmx.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
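The overflow hazard described in this commit can be illustrated with a minimal C++ sketch. It assumes Python-style slice normalization: clamp both bounds into the string's range first, and only then do arithmetic. This is not Arrow's `SliceCodeunitsTransform` code. `NormalizedSlice`, `ClampIndex`, `NormalizeSlice`, and `SliceBytes` are hypothetical names, and the sketch treats each byte of a `std::string` as one element rather than walking UTF-8. For example, an expression such as `stop - start` evaluated on the raw inputs overflows when `stop` is `INT64_MAX` and `start` is negative. Clamping first keeps every later difference within the string length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical result type for illustration; not part of Arrow's API.
struct NormalizedSlice {
  int64_t start;   // first index visited
  int64_t stop;    // exclusive bound after clamping (may be -1 when step < 0)
  int64_t length;  // number of elements selected
};

// Clamp one index into range using Python's slice rules. Adding the
// non-negative length n to a negative index cannot overflow, so INT64_MIN
// is safe here; INT64_MAX never enters any arithmetic, it is just clamped.
int64_t ClampIndex(int64_t index, int64_t n, int64_t step) {
  if (index < 0) {
    index += n;
    if (index < 0) index = (step < 0) ? -1 : 0;
  } else if (index >= n) {
    index = (step < 0) ? n - 1 : n;
  }
  return index;
}

NormalizedSlice NormalizeSlice(int64_t start, int64_t stop, int64_t step,
                               int64_t n) {
  assert(step != 0 && "slice step must be nonzero");
  start = ClampIndex(start, n, step);
  stop = ClampIndex(stop, n, step);
  // Both bounds now lie in [-1, n], so these differences stay small.
  int64_t length = 0;
  if (step > 0 && start < stop) {
    length = (stop - start - 1) / step + 1;
  } else if (step < 0 && stop < start) {
    // Divide by the negative step directly; computing -step would
    // overflow when step == INT64_MIN.
    length = (stop - start + 1) / step + 1;
  }
  return {start, stop, length};
}

// Apply the slice to a std::string, one element per byte (a simplification).
std::string SliceBytes(const std::string& s, int64_t start, int64_t stop,
                       int64_t step) {
  const NormalizedSlice ns =
      NormalizeSlice(start, stop, step, static_cast<int64_t>(s.size()));
  std::string out;
  out.reserve(static_cast<std::size_t>(ns.length));
  for (int64_t i = 0; i < ns.length; ++i) {
    // i * step stays within the string length for every i < length,
    // unlike a running `pos += step`, which could overflow after the
    // last element when the step is huge.
    const int64_t pos = ns.start + i * step;
    out.push_back(s[static_cast<std::size_t>(pos)]);
  }
  return out;
}
```

The sketch also shows why a negative step needs a very negative default `stop`. With `SliceBytes("hello", 4, INT64_MAX, -1)`, `stop` clamps to the last index and the result is empty, just as Python's `"hello"[4:sys.maxsize:-1]` is. Passing `INT64_MIN` or `-INT64_MAX` instead yields the full reversal.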