Add AssemblyAI universal-3-5-pro with conversation-context carryover #155
dlange-aai wants to merge 4 commits into
Upgrade pipecat-ai to >=1.4.0, which adds the universal-3-5-pro streaming model (Universal-3 Pro family) and AssemblyAI conversation-context carryover (`agent_context` Settings seed + `AssemblyAISTTService.update_agent_context()`).

In a standard pipecat bot, carryover is automatic: the assistant context aggregator emits `LLMContextAssistantTurnFrame`, the upstream STT picks it up via `_process_assistant_turn()`, and the AssemblyAI override forwards it to `update_agent_context()`. EVA's cascade pipeline drives the agent turn through a custom `BenchmarkAgentProcessor` that pushes `TTSSpeakFrame` directly and never emits the standard LLM response frames, so the aggregation is empty and that frame is not produced. We therefore trigger carryover explicitly.

- services.py: add `update_stt_agent_context()` helper, which forwards the agent's reply to STT when it exposes `update_agent_context` (AssemblyAI U3 Pro) and is a no-op otherwise. The existing Settings-forwarding already passes `model`, `agent_context`, and `previous_context_n_turns` (opt-out) through from config.
- pipecat_server.py: call the helper from the cascade `on_assistant_response` hook so each agent reply seeds STT before the user's next turn. Calling it alongside the auto-path is idempotent (`update_agent_context` replaces).
- .env.example: document the AssemblyAI universal-3-5-pro config + carryover.
- tests: AssemblyAI tests use universal-3-5-pro, assert carryover Settings forwarding, and cover the helper (forward / no-op / empty / None).

Verified: full unit suite passes on pipecat 1.4.0 (1765 passed); universal-3-5-pro is recognized as a U3 Pro model (`U3_PRO_MODEL_PREFIXES`) so carryover applies.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
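The helper described in this commit can be pictured with a minimal sketch. The real code lives in services.py and is not shown on this page, so everything below is an assumption apart from the names `update_stt_agent_context` and `update_agent_context`: the boolean return value, the handling of a sync or async `update_agent_context`, and the two fake STT classes are hypothetical and exist only for illustration.

```python
import asyncio
import inspect
from typing import Any, Optional


async def update_stt_agent_context(stt: Any, text: Optional[str]) -> bool:
    """Forward the agent's reply to the STT service if it supports carryover.

    Returns True when the reply was forwarded and False when the call was a
    no-op (the return value is an assumption made for this sketch).
    """
    if not text or not text.strip():
        return False  # empty or None reply: nothing to seed
    update = getattr(stt, "update_agent_context", None)
    if not callable(update):
        return False  # STT without carryover support: no-op
    result = update(text)
    if inspect.isawaitable(result):  # assumption: the method may be sync or async
        await result
    return True


class FakeCarryoverSTT:
    """Hypothetical stand-in for an STT that exposes update_agent_context()."""

    def __init__(self) -> None:
        self.agent_context: Optional[str] = None

    async def update_agent_context(self, text: str) -> None:
        # Replaces the stored context, so repeating a call is idempotent.
        self.agent_context = text


class FakePlainSTT:
    """Hypothetical stand-in for an STT without carryover support."""


async def demo() -> tuple:
    stt = FakeCarryoverSTT()
    forwarded = await update_stt_agent_context(stt, "Your code is A7K2.")
    skipped_empty = await update_stt_agent_context(stt, "")
    skipped_none = await update_stt_agent_context(stt, None)
    skipped_plain = await update_stt_agent_context(FakePlainSTT(), "hello")
    return stt.agent_context, forwarded, skipped_empty, skipped_none, skipped_plain
```

Checking for the method with `getattr` rather than checking the service type keeps the helper safe to call for any STT provider, which matches the forward / no-op / empty / None cases listed in the tests.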
`vad_force_turn_endpoint` is an `AssemblyAISTTService` constructor arg, not a Settings field, so the dataclass-field forwarding in `create_stt_service()` does not carry it and it was stuck at the pipecat default. Thread it explicitly from `EVA_MODEL__STT_PARAMS` (default True = Pipecat-mode: force the endpoint on Silero VAD stop; False lets AssemblyAI's server-side min/max_turn_silence decide). The Settings-level tuning fields (`vad_threshold`, `min_turn_silence`, `max_turn_silence`) already forward via the existing dataclass introspection.

- .env.example: document the tuned AssemblyAI example (vad_threshold=0.1, min_turn_silence=100, max_turn_silence=100, vad_force_turn_endpoint=true).
- tests: assert `vad_force_turn_endpoint` defaults True and is overridable, and that vad_threshold/min_turn_silence/max_turn_silence forward.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
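To illustrate why dataclass-field forwarding misses a constructor arg, here is a rough sketch of the split. It is not the actual `create_stt_service()` code: `FakeSettings`, `CONSTRUCTOR_ARGS`, and `split_stt_params` are hypothetical names, and the `None` defaults are placeholders rather than pipecat's real defaults.

```python
import dataclasses
import json
from typing import Any, Dict, Optional, Tuple


@dataclasses.dataclass
class FakeSettings:
    """Hypothetical stand-in for AssemblyAISTTService.Settings."""

    model: Optional[str] = None
    vad_threshold: Optional[float] = None
    min_turn_silence: Optional[int] = None
    max_turn_silence: Optional[int] = None


# Constructor-only arguments and their defaults (True = Pipecat-mode, per the commit).
CONSTRUCTOR_ARGS: Dict[str, Any] = {"vad_force_turn_endpoint": True}


def split_stt_params(raw: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a JSON params string into Settings kwargs and constructor kwargs."""
    params = json.loads(raw)
    # Introspection only sees dataclass fields, so constructor args never match here.
    field_names = {f.name for f in dataclasses.fields(FakeSettings)}
    settings_kwargs = {k: v for k, v in params.items() if k in field_names}
    # Constructor args are threaded explicitly, falling back to their defaults.
    ctor_kwargs = {
        name: params.get(name, default) for name, default in CONSTRUCTOR_ARGS.items()
    }
    return settings_kwargs, ctor_kwargs
```

Keys that are neither Settings fields nor listed constructor args (for example `api_key`) are simply dropped by this sketch; the real service handles them elsewhere.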
JosephMarinier left a comment
Very nice! I ran a few test evals with context carryover and it worked well. Thank you very much!
# so the agent's last reply improves transcription of the user's next turn; set "previous_context_n_turns": 0 to disable).
# Tuning fields (vad_threshold, min_turn_silence, max_turn_silence) forward to AssemblyAISTTService.Settings;
# vad_force_turn_endpoint (default true = Pipecat forces the endpoint on VAD stop) is a constructor arg:
# EVA_MODEL__STT_PARAMS='{"api_key": "your_assemblyai_api_key", "model": "universal-3-5-pro", "vad_threshold": 0.1, "min_turn_silence": 100, "max_turn_silence": 100, "vad_force_turn_endpoint": true}'
With "vad_force_turn_endpoint": true (the default), if we set max_turn_silence like this, we get this warning:
Your max_turn_silence value (100ms) will be OVERRIDDEN in Pipecat mode (vad_force_turn_endpoint=True). It will be set to 100ms (matching min_turn_silence) and SENT to AssemblyAI to avoid double turn detection. To use your max_turn_silence as-is, switch to AssemblyAI turn detection mode (vad_force_turn_endpoint=False).
So, should we clean that up?
- # EVA_MODEL__STT_PARAMS='{"api_key": "your_assemblyai_api_key", "model": "universal-3-5-pro", "vad_threshold": 0.1, "min_turn_silence": 100, "max_turn_silence": 100, "vad_force_turn_endpoint": true}'
+ # EVA_MODEL__STT_PARAMS='{"api_key": "your_assemblyai_api_key", "model": "universal-3-5-pro", "vad_threshold": 0.1, "min_turn_silence": 100, "vad_force_turn_endpoint": true}'
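A redundant setting like this could be caught before a run with a small pre-flight check. The sketch below is based only on the warning text quoted above (in Pipecat mode, `max_turn_silence` is overridden to match `min_turn_silence`); `find_overridden_params` is a hypothetical helper, not part of EVA or pipecat.

```python
import json
from typing import List


def find_overridden_params(raw: str) -> List[str]:
    """Return the STT params that would be overridden in Pipecat mode."""
    params = json.loads(raw)
    # vad_force_turn_endpoint defaults to True (Pipecat-mode) when omitted.
    force_endpoint = params.get("vad_force_turn_endpoint", True)
    if force_endpoint and "max_turn_silence" in params:
        return ["max_turn_silence"]
    return []
```

With the original example line the check flags `max_turn_silence`; with the suggested line, or with `"vad_force_turn_endpoint": false`, it returns an empty list.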
  "vad_threshold": 0.1,
- "min_turn_silence": 120,
+ "min_turn_silence": 100,
+ "max_turn_silence": 100,
See my other comment first. Then, to be consistent, should we clean that up too?
- "max_turn_silence": 100,
We can leave the assert svc._settings.max_turn_silence == 100; it works. That said, I'm not sure if that's something EVA needs to test (as opposed to a unit test in Pipecat). I'll leave it up to you.
What & why

Upgrades `pipecat-ai` to `>=1.4.0` and adds first-class support for AssemblyAI's `universal-3-5-pro` streaming STT model (the Universal-3 Pro family) plus pipecat 1.4.0's new conversation-context carryover, so the agent's most recent reply seeds the STT before the user's next turn, improving transcription of short answers, spelled-out entities (codes/emails/IDs), and disambiguation.

Changes
- `pyproject.toml` / `uv.lock`: `pipecat-ai` `>=1.0.0` → `>=1.4.0` (one new transitive dep, `pyyaml-include`; no other churn).
- `services.py`:
  - `update_stt_agent_context(stt, text)` helper: forwards the agent's reply to `AssemblyAISTTService.update_agent_context()` when the STT exposes it (AssemblyAI U3 Pro), no-op otherwise.
  - Threads `vad_force_turn_endpoint` through `EVA_MODEL__STT_PARAMS`: it's a constructor arg (not a `Settings` field), so the existing `dataclasses.fields(...)` forwarding didn't carry it. Default `True` (Pipecat-mode).
  - The carryover `Settings` fields (`agent_context`, `previous_context_n_turns`) already flow through the existing dataclass forwarding, so no extra code is needed.
- `pipecat_server.py`: call the helper from the cascade `on_assistant_response` hook so each agent reply seeds STT context.
- `.env.example`: documented AssemblyAI `universal-3-5-pro` example with carryover + tuning fields.
- Tests: AssemblyAI tests use `universal-3-5-pro`; cover carryover-Settings forwarding, the `update_stt_agent_context` helper (forward / no-op-absent / empty / None), and `vad_force_turn_endpoint` default + override.

Why explicit carryover (not pipecat's automatic path)
In a standard pipecat bot, carryover fires automatically: the assistant context aggregator emits `LLMContextAssistantTurnFrame`, the upstream STT picks it up via `_process_assistant_turn()`, and the AssemblyAI override calls `update_agent_context()`. EVA's cascade pipeline drives the agent turn through a custom `BenchmarkAgentProcessor` that pushes `TTSSpeakFrame` directly and does not emit the standard LLM response frames, so that aggregation is empty and the frame is never produced. We therefore trigger the update explicitly from the existing assistant-response hook. The call is idempotent with the auto-path (`update_agent_context` replaces rather than accumulates), and `universal-3-5-pro` is recognized by pipecat's `U3_PRO_MODEL_PREFIXES`, so it gets the full feature set.

Verification
`universal-3-5-pro` + Cartesia `sonic-3`: conversations complete, and carryover is confirmed firing on the live run (AssemblyAI's `_clip_agent_context` logs the agent reply being sent each turn).

Notes