[perf] Make ``PrettyPrinter`` format lazily so output can be budget-capped by Pierre-Sassoulas · Pull Request #14588 · pytest-dev/pytest

Pierre-Sassoulas · 2026-06-13T16:24:01Z

Refactor required prior to #14523.

_format and the per-type helpers now yield their output as a stream of string chunks instead of writing to a file-like object, and pformat joins them. On top of that, pformat_lines pulls from the formatter only until a budget is reached:

pformat_lines(obj, max_lines=None, max_chars=None)

It stops on the first chunk that reaches either budget, so a huge collection costs O(budget) rather than O(N). Either dimension may be None (unbounded); with both None the whole object is formatted.
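The "pull until the budget is reached" idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the PR's implementation: `format_chunks` and `take_lines` are hypothetical stand-ins for the lazy formatter and for `pformat_lines`, and the exact budget semantics (for example, whether the chunk that crosses the budget is kept) are guesses.

```python
from collections.abc import Iterable, Iterator
from itertools import count


def format_chunks(items: Iterable[object]) -> Iterator[str]:
    """Hypothetical lazy formatter: yields small string chunks, one item per line."""
    yield "["
    for item in items:
        yield "\n    "
        yield repr(item)
        yield ","
    yield "\n]"


def take_lines(chunks: Iterable[str], max_lines=None, max_chars=None) -> list[str]:
    """Pull chunks only until either budget is reached (None means unbounded)."""
    pulled = []
    newlines = 0
    chars = 0
    for chunk in chunks:
        pulled.append(chunk)
        newlines += chunk.count("\n")
        chars += len(chunk)
        if max_lines is not None and newlines >= max_lines:
            break  # line budget reached: stop pulling from the formatter
        if max_chars is not None and chars >= max_chars:
            break  # char budget reached
    return "".join(pulled).splitlines()[:max_lines]


# The source is infinite, so this only terminates because formatting is lazy.
first = take_lines(format_chunks(count()), max_lines=3)
```

Because the consumer decides when to stop, the cost depends on the budget rather than on the size of the collection, and the same loop bounds both lines and characters.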

Benchmark (PrettyPrinter alone, width 80)::

list(range(500_000)):
    pformat().splitlines()        ~805 ms
    pformat_lines(max_lines=11)   ~0.027 ms      (~30000x)

[8 small ints] (common small diff):
    pformat().splitlines()        ~0.0133 ms
    pformat_lines(max_lines=11)   ~0.0163 ms (+3µs)

["x"*100_000] * 3 (flat, few huge elements):
    pformat_lines(max_chars=640)  stops after ~100_000 chars
                                  (one element) instead of 300_000

…apped

``_format`` and the per-type helpers now ``yield`` their output as a stream of string chunks instead of writing to a file-like object, and ``pformat`` joins them. On top of that, ``pformat_lines`` pulls from the formatter only until a budget is reached:

    pformat_lines(obj, max_lines=None, max_chars=None)

It stops on the first chunk that reaches *either* budget, so a huge collection costs O(budget) rather than O(N). Either dimension may be ``None`` (unbounded); with both ``None`` the whole object is formatted.

Motivation
----------

Assertion diffs are truncated to a handful of lines/chars before being shown. Formatting the whole of a large ``==`` comparison and then throwing almost all of it away is pure waste. With a lazy formatter the truncating caller simply stops pulling once it has enough.

Benchmark (``PrettyPrinter`` alone, width 80)::

    list(range(500_000)):
        pformat().splitlines()        ~805 ms
        pformat_lines(max_lines=11)   ~0.027 ms      (~30000x)

    [8 small ints] (common small diff):
        pformat().splitlines()        ~0.0133 ms
        pformat_lines(max_lines=11)   ~0.0185 ms (+~5 us)

    ["x"*100_000] * 3 (flat, few huge elements):
        pformat_lines(max_chars=640)  stops after ~100_000 chars
                                      (one element) instead of 300_000

Why a lazy generator rather than a fast path + budget stream
------------------------------------------------------------

An earlier approach kept a cheap ``pformat().splitlines()`` fast path guarded by ``len(obj) <= max_lines`` plus a flatness check, falling back to a write-intercepting budget-stream class for the rest. Two problems:

* ``len(obj)`` is only a *lower* bound on the line count: one nested element (``[{...50 keys...}]``) expands to many lines, so the guard needed the flatness scan to stay correct. Even then it bounded only *lines*, never *chars*: a flat container of a few enormous strings has almost no lines but blows the char budget.
* it was two code paths plus a stream class plus an exception used for control flow.
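Both failure modes of a ``len(obj)`` guard are easy to reproduce. The sketch below uses the standard-library ``pprint`` as a stand-in (an assumption: pytest's vendored printer may lay things out differently), and the sample objects are made up for illustration.

```python
import pprint

# One element, but the nested dict does not fit in 80 columns,
# so it expands to many output lines.
nested = [{f"key_{i}": i for i in range(50)}]
nested_lines = pprint.pformat(nested, width=80).splitlines()

# Three elements and only a few lines, but a very large character count.
flat = ["x" * 1_000] * 3
flat_text = pprint.pformat(flat, width=80)
flat_lines = flat_text.splitlines()

print(len(nested), "element ->", len(nested_lines), "lines")
print(len(flat), "elements ->", len(flat_lines), "lines,", len(flat_text), "chars")
```

In the first case ``len(obj)`` underestimates the line count; in the second the line count says nothing about the character budget.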
Because the formatter is lazy, "stop pulling at the budget" is the whole optimisation: correct regardless of how lines/chars are distributed across elements, bounding both dimensions, with no ``len()`` proxy to get wrong and no fast/slow branch. The common small-diff case costs only ~5 us more than the unbounded path (it is never the bottleneck, since a failing assertion isn't hot), while large comparisons drop by orders of magnitude.

``_pprint_set``/``_pprint_dict`` also try a plain ``sorted`` first and fall back to the ``_safe_key`` wrapper only for unorderable mixes.

This diverges structurally from the upstream CPython ``pprint`` it was vendored from; the module header notes it is no longer kept in sync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
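The "plain ``sorted`` first, wrapper only on failure" pattern might look roughly like this. It is a sketch only: ``SafeKey`` and ``sorted_items`` are hypothetical names modelled on the idea behind ``_safe_key``, not the PR's code.

```python
class SafeKey:
    """Hypothetical wrapper that makes unorderable mixes sortable."""

    __slots__ = ("obj",)

    def __init__(self, obj):
        self.obj = obj

    def __lt__(self, other):
        try:
            return self.obj < other.obj
        except TypeError:
            # Fall back to ordering by type name, then identity.
            return (str(type(self.obj)), id(self.obj)) < (
                str(type(other.obj)),
                id(other.obj),
            )


def sorted_items(items):
    """Try the cheap path first; pay for the wrapper only when needed."""
    items = list(items)
    try:
        return sorted(items)
    except TypeError:
        return sorted(items, key=SafeKey)


print(sorted_items({3, 1, 2}))
print(sorted_items([2, "a", 1]))
```

The homogeneous case, which is by far the most common, then avoids allocating one wrapper object per element.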

In ``pformat_lines``'s budget loop, ``chunk.count("\n")`` ran on every chunk, but most chunks (brackets, indentation, item reprs) contain no newline. Guarding the call with ``"\n" in chunk`` skips it on those and recovers part of the per-chunk budget-tracking overhead: formatting an 8-element list under a budget drops from ~0.0185 ms to ~0.0163 ms (versus ~0.0132 ms for an uncapped ``pformat().splitlines()``, so the budget overhead roughly halves, from ~+5 us to ~+3 us).

The win is small and only matters on the ``-v`` truncating path of a failing assertion (the default path doesn't format the diff at all), so this is kept as a separate commit that is easy to drop if the extra branch isn't judged worth it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
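For reference, the guarded and unguarded counts are equivalent, and the difference can be measured locally. The functions and the sample chunk list below are hypothetical, and no timing result is claimed here; numbers will vary by machine and Python version.

```python
import timeit


def count_newlines_plain(chunks):
    """Call str.count on every chunk."""
    total = 0
    for chunk in chunks:
        total += chunk.count("\n")
    return total


def count_newlines_guarded(chunks):
    """Call str.count only on chunks that contain a newline."""
    total = 0
    for chunk in chunks:
        if "\n" in chunk:
            total += chunk.count("\n")
    return total


# Chunk stream resembling an 8-element list formatted one item per line.
CHUNKS = ["["] + [c for i in range(8) for c in ("\n    ", repr(i), ",")] + ["\n]"]

if __name__ == "__main__":
    for fn in (count_newlines_plain, count_newlines_guarded):
        seconds = timeit.timeit(lambda: fn(CHUNKS), number=10_000)
        print(f"{fn.__name__}: {seconds:.4f} s for 10_000 runs")
```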

Pierre-Sassoulas added the `skip news` label (used on PRs to opt out of the changelog requirement) Jun 13, 2026

Pierre-Sassoulas force-pushed the pprint-lazy-budget branch from 35ed5b2 to d4b901c June 13, 2026 16:30

Pierre-Sassoulas marked this pull request as draft June 13, 2026 16:30

Pierre-Sassoulas force-pushed the pprint-lazy-budget branch from d4b901c to 133da41 June 13, 2026 16:37

Pierre-Sassoulas force-pushed the pprint-lazy-budget branch from 133da41 to f4bd109 June 13, 2026 17:12

Merge branch 'main' into pprint-lazy-budget

be30eff

Pierre-Sassoulas wants to merge 3 commits into pytest-dev:main from Pierre-Sassoulas:pprint-lazy-budget
