Add llama.cpp / GGUF runtime for Fun-ASR-Nano (CPU / edge) by LauraGPT · Pull Request #115 · FunAudioLLM/Fun-ASR

LauraGPT · 2026-06-18T16:18:50Z

Fun-ASR-Nano on llama.cpp / GGUF

Run Fun-ASR-Nano (SenseVoice SAN-M encoder + adaptor + Qwen3-0.6B) entirely on the
llama.cpp / ggml stack — CPU / edge, a single binary, no Python at runtime. Like
whisper.cpp, but for Fun-ASR.

What's added (`runtime/llama.cpp/`)

funasr-cli — integrated WAV → transcription binary
funasr-encoder / funasr-embd — the ggml encoder+adaptor and the LLM-from-embeds tools (validation)
export_encoder_gguf.py — export the audio encoder + adaptor to GGUF (f32 / f16)
README with architecture, quickstart, accuracy and gotchas

How it works

fbank (C++) → SAN-M encoder + adaptor (ggml) → low-frame-rate truncation →
[prefix tokens | audio embeds | suffix tokens] → Qwen3-0.6B (llama.cpp). The audio
embeddings are injected via llama_decode's embedding-input path (the llava/mtmd mechanism).
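
As a rough illustration of the `[prefix tokens | audio embeds | suffix tokens]` layout described above, the sketch below assembles one embedding sequence from toy data. It is not the PR's C++ code: `build_input_sequence`, the `embed_table` lookup and the tiny vectors are hypothetical, and the low-frame-rate truncation step is assumed to have already been applied to `audio_embeds`.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def build_input_sequence(
    prefix_ids: Sequence[int],
    audio_embeds: Sequence[Sequence[float]],
    suffix_ids: Sequence[int],
    embed_table: Dict[int, Sequence[float]],
) -> List[Vector]:
    """Lay out [prefix | audio | suffix] as one list of embedding rows.

    Hypothetical helper: text tokens are looked up in a toy embedding
    table, while audio rows are passed through unchanged.
    """
    if not embed_table:
        raise ValueError("embedding table is empty")
    dim = len(next(iter(embed_table.values())))

    # Audio rows must match the LLM embedding width, otherwise the
    # concatenated sequence would not be a valid input.
    for i, row in enumerate(audio_embeds):
        if len(row) != dim:
            raise ValueError(f"audio row {i} has width {len(row)}, expected {dim}")

    prefix = [list(embed_table[t]) for t in prefix_ids]
    audio = [list(row) for row in audio_embeds]
    suffix = [list(embed_table[t]) for t in suffix_ids]
    return prefix + audio + suffix
```

The point of the layout is that the LLM sees a single sequence of vectors, so the audio rows occupy ordinary positions between the text prompt halves.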

Validation (184-file benchmark)

Encoder + adaptor (ggml) vs PyTorch: cosine 1.000000, max_abs_diff 5e-3 (f32)
kaldi fbank (C++) vs torchaudio: cosine 1.000000
End-to-end CER under identical conditions (f32 LLM, 15 s chunking): C++ micro-CER 11.68% vs PyTorch 11.70% (Δ 0.02 percentage points)
Best practical (Q8 LLM + 15s chunk): micro-CER 9.51%
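
The metrics quoted above can be sketched in a few lines. This is a minimal illustration, assuming "micro" CER means total character edits divided by total reference characters across all files, and that cosine and max_abs_diff are computed over flattened outputs; the function names are hypothetical and this is not the benchmark script used in the PR.

```python
import math
from typing import Iterable, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over characters (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def micro_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled CER: sum of edits over sum of reference lengths."""
    pairs = list(pairs)
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    chars = sum(len(ref) for ref, _ in pairs)
    if chars == 0:
        raise ValueError("no reference characters")
    return errors / chars
```

Pooling errors before dividing means long files weigh more than short ones, which is why a micro-averaged figure can differ from the mean of per-file CERs.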

Notes

Use --chunk 15 for long audio. WAV input assumes 16 kHz mono PCM16 for now.
Does not modify any existing code; everything is under runtime/llama.cpp/.
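
To make the `--chunk 15` note concrete: at 16 kHz, a 15 s chunk is 240,000 samples. The sketch below splits a sample buffer into fixed, non-overlapping windows. That windowing strategy is an assumption for illustration (the CLI's actual chunking logic is not shown in this PR description), and `split_into_chunks` is a hypothetical name.

```python
from typing import List, Sequence


def split_into_chunks(
    samples: Sequence[int],
    chunk_seconds: int = 15,
    sample_rate: int = 16000,
) -> List[Sequence[int]]:
    """Split PCM samples into consecutive windows of chunk_seconds each.

    The last window may be shorter. Non-overlapping windows are an
    assumption of this sketch, not a statement about funasr-cli.
    """
    if chunk_seconds < 1:
        raise ValueError("chunk size must be >= 1 second")
    if sample_rate < 1:
        raise ValueError("sample rate must be positive")

    chunk_len = chunk_seconds * sample_rate  # 15 * 16000 = 240000 samples
    return [samples[start:start + chunk_len]
            for start in range(0, len(samples), chunk_len)]
```

For example, a 500,000-sample recording (about 31 s) yields two full 240,000-sample chunks and one 20,000-sample remainder.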

Run Fun-ASR-Nano (SenseVoice SAN-M encoder + adaptor + Qwen3-0.6B) entirely on the llama.cpp / ggml stack: CPU / edge, single binary, no Python at runtime. Audio embeddings are injected into the LLM via llama_decode embedding input (the llava/mtmd mechanism). Includes the ggml encoder forward, a GGUF export script, and an integrated WAV->text CLI. Validated against PyTorch (encoder cosine 1.0; aggregate CER matches within 0.02% under identical conditions).


…eak checks) Fixes gemini-code-assist findings: validate channels/bits in the WAV reader (reject non-16-bit/zero-channel instead of dividing by zero), guard chunk size >=1, null-check fopen/llama_context, and fclose/ggml_free on all paths.
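
The WAV-reader checks mentioned in this commit message can be illustrated as follows. The PR's reader is C++; this Python sketch only mirrors the idea of rejecting zero-channel and non-16-bit files before any arithmetic that divides by those fields. `read_wav_format` is a hypothetical helper, and it assumes a standard RIFF/WAVE layout with a PCM `fmt ` chunk.

```python
import struct
from typing import Tuple


def read_wav_format(data: bytes) -> Tuple[int, int, int]:
    """Return (channels, sample_rate, bits_per_sample) from a WAV header.

    Raises ValueError for malformed input, non-PCM data, zero channels,
    or any bit depth other than 16.
    """
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")

    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        body = pos + 8
        if chunk_id == b"fmt ":
            if size < 16 or body + 16 > len(data):
                raise ValueError("truncated fmt chunk")
            fmt_tag, channels, rate, _byte_rate, _align, bits = \
                struct.unpack_from("<HHIIHH", data, body)
            if fmt_tag != 1:
                raise ValueError("only integer PCM is supported")
            if channels == 0:
                raise ValueError("channel count is zero")
            if bits != 16:
                raise ValueError("only 16-bit samples are supported")
            return channels, rate, bits
        # RIFF chunks are padded to an even number of bytes.
        pos = body + size + (size & 1)

    raise ValueError("no fmt chunk found")
```

Validating these fields up front means later code can compute frame counts as bytes divided by (channels × 2) without risking a division by zero.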

LauraGPT added 5 commits June 18, 2026 15:39

docs: expand Fun-ASR-Nano llama.cpp README (why, architecture, quickstart, accuracy, gotchas)

917b163

docs: add comprehensive design document (architecture, GGUF format, validation, gotchas)

146ff2f

docs: link README to DESIGN.md

9eeb192

LauraGPT merged commit 84a1ecc into main Jun 19, 2026

LauraGPT deleted the feat/llama-cpp-gguf branch June 19, 2026 04:07

