fix: encode() and decode() read vocab arrays in wrong direction #7

Merged: CodeWithKyrian merged 1 commit into main from fix/fallback-vocab-direction on Jun 17, 2026

@CodeWithKyrian

Owner

This PR fixes a bug where FallbackModel::encode() and FallbackModel::decode() read from the wrong internal arrays, causing every lookup to miss and fall back to the unknown-token placeholder.

Motivation and context

The FallbackModel stores two vocabularies: a forward mapping from tokens to IDs and a reverse mapping from IDs back to tokens. The variable names holding these two arrays suggested the opposite direction, so encode() was reading from the reverse map and decode() was reading from the forward map. For CTC tokenizers like Wav2Vec2, where token strings are purely alphabetic characters (e.g. 'e' → 5) and IDs are integers, the keys never matched: both lookups silently missed and fell back to <pad> or an empty string. As a result, every Wav2Vec2 pipeline produced empty ASR output.
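
The failure mode described above can be reproduced in a few lines. This is a minimal Python sketch, not the library's actual code: the names `token_to_id`, `id_to_token`, `buggy_encode`, `buggy_decode`, and `PAD_ID`, along with the three-entry vocabulary, are hypothetical and chosen only for illustration.

```python
# Hypothetical miniature vocabulary; the real model's contents are not shown in this PR.
token_to_id = {"<pad>": 0, "e": 5, "t": 6}                # forward map: token -> ID
id_to_token = {i: t for t, i in token_to_id.items()}     # reverse map: ID -> token

PAD_ID = 0  # assumed fallback ID for a missed lookup


def buggy_encode(tokens):
    # Bug: tokens (strings) are looked up in the ID -> token map,
    # whose keys are integers, so every lookup misses.
    return [id_to_token.get(tok, PAD_ID) for tok in tokens]


def buggy_decode(ids):
    # Bug: IDs (integers) are looked up in the token -> ID map,
    # whose keys are strings, so every lookup misses.
    return [token_to_id.get(i, "") for i in ids]


encoded = buggy_encode(["t", "e"])   # every token collapses to PAD_ID
decoded = buggy_decode([6, 5])       # every ID collapses to ""
```

Because string keys and integer keys never collide in this vocabulary, neither function raises an error. Each one returns only fallback values, which is why the bug stayed silent.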

What's changed

  • Fixed encode() to read from the token→ID mapping instead of the reverse map
  • Fixed decode() to read from the ID→token mapping instead of the forward map

Breaking changes

None.

@CodeWithKyrian force-pushed the fix/fallback-vocab-direction branch 2 times, most recently from 5fa04e2 to d807539 (June 17, 2026 20:54)
@CodeWithKyrian force-pushed the fix/fallback-vocab-direction branch from d807539 to 82a3ce4 (June 17, 2026 20:58)
@CodeWithKyrian merged commit fb2b30d into main (Jun 17, 2026)
5 checks passed
@CodeWithKyrian deleted the fix/fallback-vocab-direction branch (June 17, 2026 21:01)