
Add llama.cpp / GGUF runtime for Fun-ASR-Nano (CPU / edge)#115

Merged
LauraGPT merged 5 commits into main from feat/llama-cpp-gguf
Jun 19, 2026

Conversation

@LauraGPT

Member

Fun-ASR-Nano on llama.cpp / GGUF

Run Fun-ASR-Nano (SenseVoice SAN-M encoder + adaptor + Qwen3-0.6B) entirely on the
llama.cpp / ggml stack — CPU / edge, a single binary, no Python at runtime. Like
whisper.cpp, but for Fun-ASR.

What's added (runtime/llama.cpp/)

  • funasr-cli: integrated WAV → transcription binary
  • funasr-encoder / funasr-embd: standalone tools for the ggml encoder + adaptor and for running the LLM from precomputed embeddings (used for validation)
  • export_encoder_gguf.py: exports the audio encoder + adaptor to GGUF (f32 / f16)
  • README with architecture, quickstart, accuracy and gotchas

How it works

fbank (C++) → SAN-M encoder + adaptor (ggml) → low-frame-rate truncation →
[prefix tokens | audio embeds | suffix tokens] → Qwen3-0.6B (llama.cpp). The audio
embeddings are injected via llama_decode's embedding-input path (the llava/mtmd mechanism).
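
The sequence layout above can be sketched as follows. This is a Python illustration of the position bookkeeping only, not the C++ code from this PR: `Segment`, `plan_segments`, and the example lengths are hypothetical. It assumes each segment is submitted as its own decode call (a llama.cpp batch carries either token IDs or embedding vectors, not both) and that positions run contiguously across the three segments.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    kind: str    # "tokens" (token-ID input) or "embeds" (embedding input)
    start: int   # first position of the segment in the LLM context
    length: int  # number of positions the segment occupies


def plan_segments(n_prefix: int, n_audio: int, n_suffix: int) -> List[Segment]:
    """Lay out [prefix tokens | audio embeds | suffix tokens] contiguously."""
    if min(n_prefix, n_audio, n_suffix) < 0:
        raise ValueError("segment lengths must be non-negative")
    segments = []
    pos = 0
    for kind, length in (("tokens", n_prefix), ("embeds", n_audio), ("tokens", n_suffix)):
        if length > 0:  # skip empty segments, but keep positions contiguous
            segments.append(Segment(kind, pos, length))
        pos += length
    return segments


# Hypothetical sizes: 12 prompt tokens, 50 audio frames after truncation, 4 suffix tokens.
layout = plan_segments(n_prefix=12, n_audio=50, n_suffix=4)
for seg in layout:
    print(seg)
```

Only the middle segment uses the embedding-input path; the prefix and suffix are ordinary token batches.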

Validation (184-file benchmark)

  • Encoder + adaptor (ggml) vs PyTorch: cosine 1.000000, max_abs_diff 5e-3 (f32)
  • kaldi fbank (C++) vs torchaudio: cosine 1.000000
  • End-to-end CER under identical conditions (f32 LLM, 15 s chunking): C++ micro-CER 11.68% vs PyTorch 11.70% (Δ 0.02 percentage points)
  • Best practical configuration (Q8 LLM + 15 s chunking): micro-CER 9.51%
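
For readers reproducing these numbers, here is a minimal Python sketch of the three metrics named above. It is not the benchmark script from this PR. It assumes "micro" means errors pooled over all files (total edit distance divided by total reference characters) and applies no text normalization; all function names are illustrative.

```python
import math
from typing import Iterable, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two flattened activation vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitute / insert / delete)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def micro_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled CER: total character errors / total reference characters."""
    pairs = list(pairs)
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    chars = sum(len(ref) for ref, _ in pairs)
    return errors / chars


# Toy (reference, hypothesis) pairs, not benchmark data.
toy = [("abcd", "abxd"), ("ab", "ab")]
print(micro_cer(toy))
```

Pooling weights long utterances more heavily than averaging per-file CER would: on the toy pairs the pooled value is 1/6, whereas the per-file average would be 0.125.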

Notes

  • Use --chunk 15 for long audio. WAV input assumes 16 kHz mono PCM16 for now.
  • Does not modify any existing code; everything is under runtime/llama.cpp/.
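
To make the `--chunk 15` note concrete, the sketch below computes chunk boundaries in samples for 16 kHz audio. It assumes fixed-length, non-overlapping windows; the PR text does not say how the CLI actually splits audio, so treat `chunk_bounds` as a hypothetical illustration rather than the implemented behaviour.

```python
from typing import List, Tuple

SAMPLE_RATE = 16_000  # the CLI currently assumes 16 kHz mono PCM16 input


def chunk_bounds(n_samples: int, chunk_seconds: int = 15,
                 sample_rate: int = SAMPLE_RATE) -> List[Tuple[int, int]]:
    """Return [start, end) sample ranges covering the audio in fixed-size chunks."""
    if chunk_seconds < 1:
        raise ValueError("chunk size must be >= 1 second")
    step = chunk_seconds * sample_rate
    return [(start, min(start + step, n_samples))
            for start in range(0, n_samples, step)]


# A 40 s recording yields two full 15 s chunks and one 10 s remainder.
bounds = chunk_bounds(40 * SAMPLE_RATE)
print(bounds)
```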

LauraGPT added 5 commits June 18, 2026 15:39
Run Fun-ASR-Nano (SenseVoice SAN-M encoder + adaptor + Qwen3-0.6B) entirely on
the llama.cpp / ggml stack: CPU / edge, single binary, no Python at runtime.
Audio embeddings are injected into the LLM via llama_decode embedding input
(the llava/mtmd mechanism). Includes the ggml encoder forward, a GGUF export
script, and an integrated WAV->text CLI. Validated against PyTorch (encoder
cosine 1.0; aggregate CER matches within 0.02% under identical conditions).
…eak checks)

Fixes gemini-code-assist findings: validate channels/bits in the WAV reader
(reject non-16-bit/zero-channel instead of dividing by zero), guard chunk size
>=1, null-check fopen/llama_context, and fclose/ggml_free on all paths.
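
The WAV-reader fix described above can be illustrated with a short Python sketch of the same checks. This is not the C++ reader from the PR; `frame_count` is a hypothetical helper that assumes a standard 16-byte little-endian PCM `fmt ` chunk body. It shows why the validation has to happen before the frame count is computed: the divisor is `channels * bytes_per_sample`.

```python
import struct


def frame_count(fmt_chunk: bytes, data_bytes: int) -> int:
    """Return the number of sample frames, rejecting headers the reader cannot handle."""
    if len(fmt_chunk) < 16:
        raise ValueError("fmt chunk too short")
    # Little-endian PCM fmt layout: format tag, channels, sample rate,
    # byte rate, block align, bits per sample.
    _tag, channels, _rate, _byte_rate, _align, bits = struct.unpack(
        "<HHIIHH", fmt_chunk[:16])
    if channels == 0:
        raise ValueError("zero-channel WAV")
    if bits != 16:
        raise ValueError(f"expected 16-bit PCM, got {bits}-bit")
    bytes_per_frame = channels * (bits // 8)  # non-zero after the checks above
    return data_bytes // bytes_per_frame


# fmt chunk for 16 kHz mono PCM16, followed by two seconds of audio data.
mono16 = struct.pack("<HHIIHH", 1, 1, 16000, 32000, 2, 16)
print(frame_count(mono16, 64000))
```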
LauraGPT merged commit 84a1ecc into main Jun 19, 2026
LauraGPT deleted the feat/llama-cpp-gguf branch June 19, 2026 04:07