
fix: wrap corrupt/truncated/encrypted zip read errors in PackageNotFoundError#1562

Open
gaoflow wants to merge 1 commit into python-openxml:master from gaoflow:fix/1561-wrap-zip-read-exceptions

@gaoflow gaoflow commented Jun 24, 2026

Summary

When docx.Document(stream) parses a corrupt, truncated, or unexpectedly encrypted .docx file, low-level exceptions from Python's zipfile and zlib modules escape the library boundary instead of surfacing as a consistent python-docx exception. Callers are left to catch zlib.error, EOFError, RuntimeError, or BadZipFile, all of them internal types they should never need to know about.
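To make the caller's burden concrete, here is a minimal sketch that uses only the standard library. It does not call python-docx; `read_all_members` and `try_read` are hypothetical helpers that stand in for "open the package and read every part", and the tuple lists the four exception types named above.

```python
import io
import zipfile
import zlib

# The four low-level types a caller has to anticipate before this change.
LOW_LEVEL_ERRORS = (zipfile.BadZipFile, zlib.error, EOFError, RuntimeError)


def read_all_members(data):
    """Open `data` as a zip archive and read every member into a dict."""
    with zipfile.ZipFile(io.BytesIO(data), "r") as zf:
        return {name: zf.read(name) for name in zf.namelist()}


def try_read(data):
    """Return the name of the low-level exception raised, or None on success."""
    try:
        read_all_members(data)
    except LOW_LEVEL_ERRORS as exc:
        return type(exc).__name__
    return None
```

A byte string that is not a zip archive fails in the `ZipFile` constructor with `BadZipFile`. The other three types can only appear later, when an individual member is read.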

This PR fixes the two call sites in _ZipPkgReader:

  • __init__: ZipFile(pkg_file, "r") can raise BadZipFile when the stream is not a valid zip. This is now wrapped in PackageNotFoundError.
  • blob_for: ZipFile.read() can raise BadZipFile (CRC mismatch), zlib.error (corrupt compressed data), EOFError (truncated stream), or RuntimeError (encrypted entry without a password). All four are now wrapped in PackageNotFoundError.
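The two wrapping points above could look roughly like the sketch below. This is not the PR's diff: `ZipPkgReaderSketch` is a hypothetical stand-in for `_ZipPkgReader`, `PackageNotFoundError` is redefined locally so the example runs without python-docx installed, the error messages are invented, and `blob_for` takes a plain member name rather than a pack URI.

```python
import io
import zipfile
import zlib


class PackageNotFoundError(Exception):
    """Local stand-in for python-docx's exception of the same name (assumption)."""


class ZipPkgReaderSketch:
    """Hypothetical reader showing both wrapping points described above."""

    # Exceptions ZipFile.read() can raise on a damaged or encrypted member.
    _READ_ERRORS = (zipfile.BadZipFile, zlib.error, EOFError, RuntimeError)

    def __init__(self, pkg_file):
        try:
            self._zipf = zipfile.ZipFile(pkg_file, "r")
        except zipfile.BadZipFile as exc:
            # The stream is not a valid zip archive at all.
            raise PackageNotFoundError(f"not a valid zip package: {exc}") from exc

    def blob_for(self, member_name):
        try:
            return self._zipf.read(member_name)
        except self._READ_ERRORS as exc:
            # Bad CRC, corrupt deflate data, truncation, or an encrypted entry.
            raise PackageNotFoundError(
                f"cannot read {member_name!r}: {exc}"
            ) from exc

    def close(self):
        self._zipf.close()
```

Chaining with `from exc` keeps the original zipfile or zlib error available as `__cause__`, so the low-level detail is still there for debugging while callers only handle one type.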

Callers that already catch PackageNotFoundError need no changes on their side; the public API is unchanged.

Closes #1561.

Test plan

  • tests/opc/test_phys_pkg.py::DescribeZipPkgReader::it_raises_PackageNotFoundError_when_stream_is_not_a_zip — new
  • tests/opc/test_phys_pkg.py::DescribeZipPkgReader::it_raises_PackageNotFoundError_when_blob_has_bad_crc — new (BadZipFile CRC path)
  • tests/opc/test_phys_pkg.py::DescribeZipPkgReader::it_raises_PackageNotFoundError_when_zlib_data_is_corrupt — new (zlib.error path)
  • Full test suite: pytest tests/ — 1612 passed (1609 existing + 3 new)
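For readers who want to reproduce the bad-CRC and zlib.error paths, one possible way to build such fixtures in memory is sketched below. It is an illustration rather than the fixtures used in the PR's tests; the helper names and the member name are hypothetical. The deflate case relies on block type 0b11 being reserved in the DEFLATE format, so a first data byte of 0x07 is rejected by zlib.

```python
import io
import struct
import zipfile
import zlib

MEMBER = "word/document.xml"  # hypothetical member name for the fixtures


def _build_zip(payload, compression):
    """Return a one-member zip archive as a mutable bytearray."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression) as zf:
        zf.writestr(MEMBER, payload)
    return bytearray(buf.getvalue())


def _data_offset(raw):
    """Offset of the member's data: 30-byte local header + name + extra field."""
    with zipfile.ZipFile(io.BytesIO(bytes(raw))) as zf:
        header_offset = zf.getinfo(MEMBER).header_offset
    name_len, extra_len = struct.unpack_from("<HH", raw, header_offset + 26)
    return header_offset + 30 + name_len + extra_len


def zip_with_bad_crc():
    """Stored member whose first data byte is flipped, so the CRC-32 check fails."""
    raw = _build_zip(b"<w:document/>", zipfile.ZIP_STORED)
    raw[_data_offset(raw)] ^= 0xFF
    return bytes(raw)


def zip_with_corrupt_deflate():
    """Deflated member whose first byte declares the reserved block type."""
    raw = _build_zip(b"<w:document/>" * 100, zipfile.ZIP_DEFLATED)
    raw[_data_offset(raw)] = 0x07  # BFINAL=1, BTYPE=0b11 (reserved)
    return bytes(raw)
```

Both archives keep an intact central directory, so `ZipFile(...)` opens them successfully and the failure only appears in `ZipFile.read()`, which is the `blob_for` path the PR wraps.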

This PR was developed with AI assistance (Claude Sonnet 4.5, Anthropic) under the contributor's direction and review.

When parsing a corrupt, truncated, or unexpectedly encrypted .docx file,
ZipFile.read() can raise zlib.error, EOFError, RuntimeError, or
BadZipFile. These native exceptions previously escaped from
Document(stream), requiring callers to catch multiple unrelated types.

_ZipPkgReader.__init__ now wraps BadZipFile on construction, and
blob_for() wraps BadZipFile, zlib.error, EOFError, and RuntimeError,
re-raising them all as PackageNotFoundError so callers have a single
consistent docx-level exception to handle.

Fixes python-openxml#1561.


Development

Successfully merging this pull request may close these issues.

Unhandled exceptions (zlib.error, EOFError, RuntimeError) during malicious/corrupt DOCX parsing