
Add test for UpdateStatistics chaining set and remove #3557

Open

nmshr wants to merge 3 commits into apache:main from nmshr:test_update_statistics_chains_set_and_remove




nmshr commented Jun 26, 2026


Context

The intention of this PR is to shed light on a potential issue.
It is meant as an inquiry to understand how the maintainers view this scenario.
This is perhaps an edge case, so feel free to prioritise it accordingly.

Observation

When calling update_statistics on a table, the returned UpdateStatistics object exposes two methods that are designed to be chained: set_statistics and remove_statistics.

set_statistics appends to _updates using += (line 56).
remove_statistics replaces _updates using = (line 65).
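To illustrate the asymmetry, here is a minimal, self-contained model of the two methods. UpdateStatisticsModel, SetStats and RemoveStats are hypothetical stand-ins, not pyiceberg classes, and the model passes a plain snapshot id instead of the real method arguments. Only the _updates bookkeeping from lines 56 and 65 is reproduced.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical stand-ins for the real update objects; a snapshot id is
# enough to tell the queued updates apart in this model.
@dataclass(frozen=True)
class SetStats:
    snapshot_id: int


@dataclass(frozen=True)
class RemoveStats:
    snapshot_id: int


Update = Union[SetStats, RemoveStats]


class UpdateStatisticsModel:
    """Simplified model of UpdateStatistics: only the _updates bookkeeping."""

    def __init__(self) -> None:
        self._updates: Tuple[Update, ...] = ()

    def set_statistics(self, snapshot_id: int) -> "UpdateStatisticsModel":
        # Mirrors line 56: append to the pending updates with +=.
        self._updates += (SetStats(snapshot_id),)
        return self

    def remove_statistics(self, snapshot_id: int) -> "UpdateStatisticsModel":
        # Mirrors line 65: replace the pending updates with =.
        self._updates = (RemoveStats(snapshot_id),)
        return self


pending = (
    UpdateStatisticsModel()
    .set_statistics(1)
    .set_statistics(2)
    .remove_statistics(3)
    ._updates
)
print(pending)  # (RemoveStats(snapshot_id=3),)
```

Chaining two set_statistics calls followed by remove_statistics leaves a single pending update. Both SetStats entries are gone.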

When the two are chained, remove_statistics discards every update queued by preceding set_statistics calls, so those statistics files are not included in the commit. Without them, a query engine may end up scanning more data than it otherwise would.
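For comparison, here is a sketch of what chaining would produce if both methods accumulated updates, assuming that is the intended behaviour. AccumulatingUpdateStatistics, SetStats and RemoveStats are hypothetical stand-ins, not pyiceberg code, and the test below is an illustration rather than the test added in this PR. The only difference from the current pattern is that remove_statistics uses += instead of =.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical stand-ins for the real update objects.
@dataclass(frozen=True)
class SetStats:
    snapshot_id: int


@dataclass(frozen=True)
class RemoveStats:
    snapshot_id: int


class AccumulatingUpdateStatistics:
    """Hypothetical variant in which both methods append to _updates."""

    def __init__(self) -> None:
        self._updates: Tuple[Union[SetStats, RemoveStats], ...] = ()

    def set_statistics(self, snapshot_id: int) -> "AccumulatingUpdateStatistics":
        self._updates += (SetStats(snapshot_id),)
        return self

    def remove_statistics(self, snapshot_id: int) -> "AccumulatingUpdateStatistics":
        # += instead of =, so earlier queued updates are kept.
        self._updates += (RemoveStats(snapshot_id),)
        return self


def test_chained_set_and_remove_keeps_all_updates() -> None:
    updates = (
        AccumulatingUpdateStatistics()
        .set_statistics(1)
        .remove_statistics(2)
        ._updates
    )
    # Both queued updates survive, in call order.
    assert updates == (SetStats(1), RemoveStats(2))


test_chained_set_and_remove_keeps_all_updates()
```

The test function is written in pytest style but also runs as plain Python.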

See pyiceberg/table/update/statistics.py, lines 56 and 65.

AI Use Disclaimer

I used Claude Code to understand test coverage and discover improvement opportunities. I used Claude Code to verify that the test adds value and will not waste the reviewers' time. I did not use Claude Code to write any code, commit messages or descriptions; all writing, including this, is my own.
