
Benjamin Kietzman authored 5b9deef40a2
ARROW-6965: [C++][Dataset] Optionally expose partition keys as columns

- PartitionScheme now acts on individual segments of a path rather than the whole path. This allows easier extraction of partition information relative to a given parent
- FileSystemDataSource applies modified scan options to each file fragment it emits, including simplified filters and projectors containing default column values derived from partition keys
- PathTree is removed in favor of PathForest, which provides a lighter weight implementation and supports accessing nodes' parents
- PathPartitions are removed in favor of SegmentDictionaryPartitionScheme (should resolve ARROW-7069)
- RecordBatchProjector now provides a mutator for setting default column values, rather than requiring these be specified on construction
- RecordBatchProjector and ExpressionEvaluator use the MemoryPool provided by a scan context
- specialize IterationTraits<util::optional<T>> to support Iterator<T> when a reserved "end" value for T is difficult
- add path utilities IsAncestorOf and RemoveAncestor

Closes #5950 from bkietz/6965-Dataset-Optionally-expose and squashes the following commits:

849883ca8 <Benjamin Kietzman> remove ;
e53092a9e <Benjamin Kietzman> explicit namespace for MSVC
b19b62461 <Benjamin Kietzman> rename stl.h to vector.h
a99f85143 <Benjamin Kietzman> avoid unnecessary lifetime wrapper in scanner_internal.h
49260af91 <Benjamin Kietzman> extract PartitionScheme application back to discovery
e52609777 <Benjamin Kietzman> add explicit AssociatedObjects support to PathForest
215a5aa5d <Benjamin Kietzman> avoid walking a file's entire ancestry in GetFragmentsImpl
8bc0895db <Benjamin Kietzman> review comments
3a8f0f61a <Benjamin Kietzman> cast Scalars unlazily in InsertImplicitCasts
8a915f79f <Benjamin Kietzman> move MakeScalar and Scalar::Parse to Result return
ce22609c8 <Benjamin Kietzman> fix string repr of CastExpression
ee6b9823d <Benjamin Kietzman> add a test for schema Inspection from partition schemes
fa4ffbdc7 <Benjamin Kietzman> ensure discovered partition fields are ordered
ab65ced02 <Benjamin Kietzman> remove base_dir from PartitionSchemeDiscovery
4ef57f074 <François Saint-Jacques> Add comprehensive mixed physical/virtual column Scanner tests
d94854917 <Benjamin Kietzman> preserve metadata from schemas inspected during discovery
786256ac4 <Benjamin Kietzman> add ScannerTest which materializes a missing column
8f66a0d62 <Benjamin Kietzman> try exporting PathForest::Ref explicitly for MSVC
fcbdc2ea1 <Neal Richardson> Fix dataset partition tests in R, including TODOs for when autocasting works
12dcc61ee <Benjamin Kietzman> refactor RecordBatchProjector usage
4b8832c29 <Benjamin Kietzman> lint fixes
c878b2389 <Benjamin Kietzman> remove part column from test-dataset.R to test materialization of the partition
0a154e6a5 <Benjamin Kietzman> delete partition columns from e2e test's datafiles
79058b9fd <Benjamin Kietzman> ScanTaskIteratorFromRecordBatch accepts options, context
12ca8bdcb <Benjamin Kietzman> use ScanTask::options when filtering and projecting
3674d55bc <Benjamin Kietzman> expose ScanTask::context,options as properties
d1aa3ea48 <Benjamin Kietzman> refactor ExpressionEvaluator not to assume a MemoryPool in closure
6202c5c59 <Benjamin Kietzman> ensure trailing slash for directories in PathForest::ToString()
7bec3a805 <Benjamin Kietzman> use partition_base_dir in FileSystemDataSource::Make
3419c3cce <Benjamin Kietzman> add #includes
36e865032 <Benjamin Kietzman> add a projected partition column to E2E test
63cfb7ec4 <Benjamin Kietzman> specialize IterationTraits for util::optional
70fec9187 <Benjamin Kietzman> refactor projector to admit piecemeal addition of default values
f3819d4fa <Benjamin Kietzman> add PartitionSchemeDiscovery for inferring partition schemas
ce8ee8485 <Benjamin Kietzman> add parent buffer to PathForest
faa0193f6 <Benjamin Kietzman> extract IsAncestrOf to path_util.h
90b1b23df <Benjamin Kietzman> remove PathTree in favor of PathForest
3bd91be7c <Benjamin Kietzman> refactor FileSystemDataSource to use a partition scheme on construction
7a587a0d9 <Benjamin Kietzman> add Result<T>::ValueOr() for access to a value or an alternative
99f7f090e <Benjamin Kietzman> refactor FileSource: variant<type dependent members>
eeb4f8ee3 <Benjamin Kietzman> add SegmentDictionaryPartitionScheme
bcbf08ab6 <Benjamin Kietzman> refactor PartitionScheme to act on individual segments
df5d2a341 <Benjamin Kietzman> consolidate PathTree::Visit overloads
1e1885418 <Benjamin Kietzman> refactor PathTree to use fewer small allocations
580679cd0 <Benjamin Kietzman> ARROW-6965: Expose partition keys as columns

Lead-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Co-authored-by: François Saint-Jacques <fsaintjacques@gmail.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
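The idea behind IsAncestorOf, RemoveAncestor and segment-wise partition schemes (strip a known parent directory, then hand the remaining path segments to the partition scheme one at a time) can be illustrated with a minimal C++17 sketch. This is not Arrow's implementation. The signatures, the helper names StripTrailingSlash and SplitSegments, and the semantics (separator '/', a path counts as its own ancestor, an empty ancestor contains every path) are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: drop a single trailing '/' if present.
std::string_view StripTrailingSlash(std::string_view p) {
  if (!p.empty() && p.back() == '/') p.remove_suffix(1);
  return p;
}

// Sketch (assumed semantics): true when `descendant` equals `ancestor`
// or lies beneath it. "data" is an ancestor of "data/x" but not of "database".
bool IsAncestorOf(std::string_view ancestor, std::string_view descendant) {
  ancestor = StripTrailingSlash(ancestor);
  if (ancestor.empty()) return true;  // empty base: every path is beneath it
  descendant = StripTrailingSlash(descendant);
  if (descendant.substr(0, ancestor.size()) != ancestor) return false;
  descendant.remove_prefix(ancestor.size());
  // Either an exact match or the next character must be a separator.
  return descendant.empty() || descendant.front() == '/';
}

// Sketch: `descendant` expressed relative to `ancestor`,
// or std::nullopt when the two paths are unrelated.
std::optional<std::string_view> RemoveAncestor(std::string_view ancestor,
                                               std::string_view descendant) {
  if (!IsAncestorOf(ancestor, descendant)) return std::nullopt;
  ancestor = StripTrailingSlash(ancestor);
  descendant.remove_prefix(ancestor.size());
  if (!descendant.empty() && descendant.front() == '/') descendant.remove_prefix(1);
  return descendant;
}

// Hypothetical helper: split a relative path into its non-empty segments,
// which a per-segment partition scheme could then inspect individually.
std::vector<std::string> SplitSegments(std::string_view path) {
  std::vector<std::string> segments;
  std::size_t start = 0;
  while (start <= path.size()) {
    std::size_t end = path.find('/', start);
    if (end == std::string_view::npos) end = path.size();
    if (end > start) segments.emplace_back(path.substr(start, end - start));
    start = end + 1;
  }
  return segments;
}
```

With these assumptions, RemoveAncestor("data/", "data/x/y/f.parquet") yields "x/y/f.parquet", and SplitSegments on that result gives the segments "x", "y" and "f.parquet". Working relative to a parent keeps the base directory's own segments from being mistaken for partition keys.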