pdf cleanup #541
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: af7d826733
std::istringstream in(std::move(data));
Xref xref = FileParser(in).read_xref_stream_table(field_widths, subsections);
Parse xref streams from the decoded buffer
For PDFs whose cross-reference stream is filtered (for example, the common /Filter /FlateDecode case), decode(...) puts the usable entry table in decoded.data. This new stream, however, is built from data, which was already moved into decode and, at best, still holds the encoded payload. As a result, xref-stream PDFs either parse garbage or fall straight into recovery. Recovery in turn loses trailer-only data such as /Encrypt from xref-stream dictionaries, so encrypted modern PDFs can no longer open correctly.
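A minimal sketch of the suggested direction, not the PR's actual code: build the input stream from the decoded buffer rather than from the moved-from data. The Decoded struct, the toy decode function, read_entries, and parse_xref_payload are all hypothetical stand-ins. The real decode(...) signature and the real read_xref_stream_table(field_widths, subsections) are not shown in this review, so a one-byte marker strip stands in for FlateDecode and a single fixed-width field stands in for the xref entry layout.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for whatever the PR's decode(...) returns.
struct Decoded {
    std::string data;  // bytes after the stream filters have been applied
};

// Toy decoder (assumption): takes ownership of the encoded payload and
// strips a leading '#' marker so encoded and decoded bytes differ.
// A real implementation would inflate /FlateDecode data here.
Decoded decode(std::string encoded) {
    Decoded out;
    if (!encoded.empty() && encoded.front() == '#') {
        encoded.erase(0, 1);
    }
    out.data = std::move(encoded);
    return out;
}

// Simplified stand-in for read_xref_stream_table: reads fixed-width
// big-endian integers until the stream runs out of complete fields.
std::vector<unsigned long> read_entries(std::istream& in, std::size_t width) {
    assert(width > 0 && width <= sizeof(unsigned long));
    std::vector<unsigned long> entries;
    std::string field(width, '\0');
    while (in.read(&field[0], static_cast<std::streamsize>(width))) {
        unsigned long value = 0;
        for (unsigned char byte : field) {
            value = (value << 8) | byte;
        }
        entries.push_back(value);
    }
    return entries;
}

std::vector<unsigned long> parse_xref_payload(std::string data, std::size_t width) {
    Decoded decoded = decode(std::move(data));
    // data is moved-from at this point and must not be read again.
    // The entry table lives in decoded.data, so the stream is built from that.
    std::istringstream in(std::move(decoded.data));
    return read_entries(in, width);
}
```

In this toy setup, feeding the parser the original data after the move would hand it an unspecified (typically empty) string, and feeding it a copy of the encoded bytes would misalign every field by the marker byte. Both cases mirror the garbage-or-recovery behavior described above.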
DocumentParser to FileParser, Dictionary interface, read_xref_stream_table, decrypt_strings to Decryptor, final