Skip to content

fix(eval): thread-safe progress callbacks for --concurrency#1671

Open
macekond wants to merge 1 commit into
gooddata:masterfrom
macekond:fix/eval-concurrency-deadlock
Open

fix(eval): thread-safe progress callbacks for --concurrency#1671
macekond wants to merge 1 commit into
gooddata:masterfrom
macekond:fix/eval-concurrency-deadlock

Conversation

@macekond

Copy link
Copy Markdown

Summary

  • Wrap Rich Console.print() calls in _make_progress_callbacks with threading.Lock
  • Fixes deadlock when using --concurrency 2+ with piped stdout (e.g. background processes)
  • Add test_progress_callbacks_thread_safe test

Root Cause

_make_progress_callbacks creates closures that call console.print(). When run_items dispatches to ThreadPoolExecutor, multiple threads invoke these callbacks simultaneously. Rich Console is not safe for concurrent writes to piped stdout/stderr, causing a deadlock where the process hangs at 0% CPU with no output.

The lock serializes only the console.print() calls — actual evaluation work (AI chat, scoring) remains fully concurrent.

Reproduction

# This hangs with concurrency on large datasets:
gd-eval run --concurrency 4 --dataset large_dataset/ --model gpt-5.2 &

# Workaround before this fix:
gd-eval run --concurrency 4 --quiet --dataset large_dataset/ --model gpt-5.2 &

Test plan

  • test_progress_callbacks_thread_safe — 8 concurrent threads calling all 3 callbacks 50 times
  • Verified --concurrency 4 works on 10-item dataset with progress output
  • Verified --quiet mode still works (callbacks are None, no lock needed)
  • Verified sequential mode (concurrency=1) unaffected

🤖 Generated with Claude Code

@macekond macekond requested review from hkad98, lupko and pcerny as code owners June 23, 2026 12:17
@codecov

codecov Bot commented Jun 23, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.21%. Comparing base (04bfbc5) to head (b526366).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1671   +/-   ##
=======================================
  Coverage   79.21%   79.21%           
=======================================
  Files         232      232           
  Lines       15809    15809           
=======================================
  Hits        12523    12523           
  Misses       3286     3286           

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@macekond macekond force-pushed the fix/eval-concurrency-deadlock branch from ad3d548 to 3f3f238 Compare June 23, 2026 12:35
Wrap Rich Console.print() calls in _make_progress_callbacks with
a threading.Lock. Without this, concurrent ThreadPoolExecutor
workers deadlock when stdout is piped (e.g. background process).

The lock serializes only the print calls — actual evaluation work
remains fully concurrent.

Add test_progress_callbacks_thread_safe to verify callbacks can be
called from 8 concurrent threads without errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@macekond macekond force-pushed the fix/eval-concurrency-deadlock branch from 3f3f238 to b526366 Compare June 23, 2026 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant