pdf cleanup #541

Open
andiwand wants to merge 1 commit into main from pdf-cleanup

@andiwand (Member)
  • move stateless pdf parsing functions from DocumentParser to FileParser
  • improve Dictionary interface
  • stream version for read_xref_stream_table
  • upstream decrypt_strings to Decryptor
  • make some classes final

@andiwand marked this pull request as ready for review June 17, 2026 20:51
@andiwand enabled auto-merge (squash) June 17, 2026 20:51

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: af7d826733

Comment on lines +947 to +948
std::istringstream in(std::move(data));
Xref xref = FileParser(in).read_xref_stream_table(field_widths, subsections);

P1: Parse xref streams from the decoded buffer

For PDFs whose cross-reference stream is filtered (for example, the common /Filter /FlateDecode case), decode(...) puts the usable entry table in decoded.data. The new stream, however, is built from data, which was already moved into decode(...) and so holds the encoded payload at best (a moved-from std::string is left in a valid but unspecified state). As a result, xref-stream PDFs either parse garbage or fall straight into recovery. Recovery loses trailer-only data such as /Encrypt from xref-stream dictionaries, so encrypted modern PDFs can no longer be opened correctly.
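
The point above can be illustrated with a minimal, self-contained sketch. It is not the project's code: `Decoded`, `decode`, `Entry`, `read_field`, `read_entries`, and `parse_xref_stream` are hypothetical stand-ins, the "filter" is a toy XOR rather than real /FlateDecode, the field widths are fixed at {1, 2, 1}, and subsections are ignored. It only shows the stream being built from the decoded buffer instead of the moved-from input.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the result of decode(...): the payload after
// all filter stages have been applied.
struct Decoded {
  std::string data;
};

// Toy "filter" standing in for /FlateDecode: XORs every byte with 0x5A.
// A real implementation would inflate the payload instead.
Decoded decode(std::string encoded) {
  for (char &c : encoded) {
    c = static_cast<char>(c ^ 0x5A);
  }
  return Decoded{std::move(encoded)};
}

// One cross-reference stream entry made of three integer fields.
struct Entry {
  std::uint64_t type;
  std::uint64_t field2;
  std::uint64_t field3;
};

// Reads `width` bytes as a big-endian unsigned integer.
std::uint64_t read_field(std::istream &in, int width) {
  std::uint64_t value = 0;
  for (int i = 0; i < width; ++i) {
    value = (value << 8) | static_cast<unsigned char>(in.get());
  }
  return value;
}

// Reads fixed-width entries until the stream is exhausted.
std::vector<Entry> read_entries(std::istream &in, const int (&widths)[3]) {
  std::vector<Entry> entries;
  while (in.peek() != std::istream::traits_type::eof()) {
    Entry entry{};
    entry.type = read_field(in, widths[0]);
    entry.field2 = read_field(in, widths[1]);
    entry.field3 = read_field(in, widths[2]);
    entries.push_back(entry);
  }
  return entries;
}

std::vector<Entry> parse_xref_stream(std::string data) {
  Decoded decoded = decode(std::move(data));
  // `data` has been moved from; the entry table lives in decoded.data,
  // so the stream is built from the decoded buffer.
  std::istringstream in(std::move(decoded.data));
  const int widths[3] = {1, 2, 1};  // assumed widths for this sketch
  return read_entries(in, widths);
}
```

Feeding `parse_xref_stream` an XOR-encoded table yields the original entries only because the stream wraps `decoded.data`; wrapping `data` after the move would hand the reader an unspecified (typically empty) string.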
