File skipping based on log metadata statistics #1486

Answered by iiLaurens
iiLaurens asked this question in Q&A
Never mind, I now see that the partition expression for each file is enriched with the statistics that delta-rs knows about. It happens here:

fragments = [
    format.make_fragment(
        file,
        filesystem=filesystem,
        partition_expression=part_expression,
    )
    for file, part_expression in self._table.dataset_partitions(
        self.schema().to_pyarrow(), partitions
    )
]

I have a simple test table, partitioned on column A and Z-ordered on columns B and C. This is what self._table.dataset_partitions returns for it:

[('A=2/part-00001-1ebb693e-4e30-4840-8896-daed52a3f0ff-c000.zstd.parquet',
  <pya…

Answer selected by iiLaurens