feat(eval): conversation tracking and --preserve-failed flag #1672

Open
macekond wants to merge 1 commit into gooddata:master from macekond:feat/eval-conversation-tracking

@macekond

Summary

  • Add conversation_id and response_id fields to ChatResult for tracing eval conversations back to the GoodData server
  • Capture responseId from SSE event data during stream parsing
  • Attach conversation_id to exceptions so error reports include it for debugging
  • Add --preserve-failed CLI flag to keep failed conversations on the server for post-mortem inspection
  • Add preserve_failed to RunConfig and pass through to ChatClient
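
For readers without the diff open, here is a minimal sketch of how the responseId capture described above could work. It is not the code from this PR: the `ChatResult` shape beyond the two new fields, the `item` and `text` keys in the event payload, and the helper names `extract_response_id` and `parse_stream` are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ChatResult:
    """Hypothetical stand-in; only conversation_id and response_id come from the PR summary."""
    text: str = ""
    conversation_id: Optional[str] = None
    response_id: Optional[str] = None


def extract_response_id(event: dict) -> Optional[str]:
    """Return responseId from the top level, else from an assumed nested 'item' object."""
    top_level = event.get("responseId")
    if top_level:
        return top_level
    item = event.get("item")  # assumed key name for the item-level case
    if isinstance(item, dict) and item.get("responseId"):
        return item["responseId"]
    return None


def parse_stream(lines: Iterable[str], conversation_id: str) -> ChatResult:
    """Parse SSE 'data:' lines, keeping the first responseId that appears."""
    result = ChatResult(conversation_id=conversation_id)
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip comments, event names, and blank separators
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON payloads in this sketch
        if not isinstance(event, dict):
            continue
        if result.response_id is None:  # first wins: later ids are ignored
            result.response_id = extract_response_id(event)
        text = event.get("text")  # assumed key carrying streamed text
        if isinstance(text, str):
            result.text += text
    return result
```

With this shape, a stream that carries a top-level responseId followed by a different item-level one resolves to the first value, which matches the "first-wins" case named in the test plan.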

Test plan

  • Unit tests for response_id capture from SSE events (3 tests: top-level, item-level, first-wins)
  • Unit tests for conversation_id attachment to ChatResult on success
  • Unit tests for --preserve-failed behavior (keeps failed, deletes on success, deletes without flag)
  • Unit test for conversation_id on exceptions
  • Unit tests for error report including/omitting conversation_id
  • Unit test for --preserve-failed CLI flag parsing and passthrough
  • No new test failures: the existing suite still passes apart from 3 pre-existing failures unrelated to this PR
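
The --preserve-failed cases listed above (keeps failed, deletes on success, deletes without flag) can be illustrated with a small sketch. This is not the PR's implementation: `FakeChatClient`, its `run` method, the `deleted` list standing in for server-side deletion, and the `parse_args` helper are hypothetical; only the flag name, `RunConfig.preserve_failed`, and the idea of attaching conversation_id to exceptions come from the description.

```python
import argparse
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RunConfig:
    """Hypothetical subset of RunConfig; only preserve_failed is named in the PR."""
    preserve_failed: bool = False


def parse_args(argv: Optional[List[str]] = None) -> RunConfig:
    """Parse the CLI flag and pass it through to the config (assumed wiring)."""
    parser = argparse.ArgumentParser(prog="eval")
    parser.add_argument(
        "--preserve-failed",
        action="store_true",
        help="keep failed conversations on the server for inspection",
    )
    args = parser.parse_args(argv)
    return RunConfig(preserve_failed=args.preserve_failed)


class FakeChatClient:
    """Illustrative client that records deletions instead of calling a server."""

    def __init__(self, preserve_failed: bool = False) -> None:
        self.preserve_failed = preserve_failed
        self.deleted: List[str] = []

    def run(self, conversation_id: str, ask: Callable[[], str]) -> str:
        failed = False
        try:
            return ask()
        except Exception as exc:
            failed = True
            # Attach the id so error reports can point at the conversation.
            setattr(exc, "conversation_id", conversation_id)
            raise
        finally:
            # Delete unless the run failed and the flag asks to keep it.
            if not (failed and self.preserve_failed):
                self.deleted.append(conversation_id)
```

In this sketch the cleanup sits in a `finally` block so that the success path and the failure path share one deletion rule, and the exception is re-raised unchanged apart from the added attribute.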

🤖 Generated with Claude Code

Add conversation_id and response_id fields to ChatResult for tracing
eval conversations back to the server. Failed conversations can be
preserved for post-mortem inspection via --preserve-failed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@macekond macekond requested review from hkad98, lupko and pcerny as code owners June 23, 2026 12:43
@codecov

codecov Bot commented Jun 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.21%. Comparing base (04bfbc5) to head (386cef3).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1672   +/-   ##
=======================================
  Coverage   79.21%   79.21%           
=======================================
  Files         232      232           
  Lines       15809    15809           
=======================================
  Hits        12523    12523           
  Misses       3286     3286           

