6 changes: 6 additions & 0 deletions ftfy/badness.py
@@ -364,6 +364,12 @@
|
^[ÃÂ][ ]
|
# Upper-accented letter followed by a currency symbol at the very
# start of the string (otherwise usually requires a preceding space).
# Require a word character after the pair so the pattern does not match
# the isolated 2-character substring inside decode_inconsistent_utf8.
^[{upper_accented}][{currency}]\w
|

# Cases where Â precedes a character as an encoding of exactly the same
# character, and the character is common enough
7 changes: 7 additions & 0 deletions tests/test-cases/synthetic.json
@@ -204,5 +204,12 @@
"original": "OÙ ET QUAND?",
"fixed": "OÙ ET QUAND?",
"expect": "pass"
},
{
"label": "Synthetic: mojibake at the beginning of a string (Ã¥ for å)",
"comment": "issue #222: å mojibake not detected at the very start of a string. The badness heuristic missed Ã¥ when a currency symbol followed an upper-accented letter without a preceding space.",
"original": "Ã¥klagarmyndighets",
"fixed": "åklagarmyndighets",
"expect": "pass"
}
]
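
The behaviour of the new alternative can be checked in isolation with a small sketch. This is not ftfy's actual code: `UPPER_ACCENTED` and `CURRENCY` below are hypothetical, heavily reduced stand-ins for the `{upper_accented}` and `{currency}` character classes in `ftfy/badness.py`, and `starts_with_suspicious_pair` is an illustrative helper name, not a function in the library.

```python
import re

# ASSUMPTION: small illustrative subsets, not ftfy's real character classes.
UPPER_ACCENTED = "ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜ"
CURRENCY = "¢£¤¥"

# Same shape as the alternative added in this diff:
#   ^[{upper_accented}][{currency}]\w
START_PAIR = re.compile(rf"^[{UPPER_ACCENTED}][{CURRENCY}]\w")


def starts_with_suspicious_pair(text: str) -> bool:
    """Return True if text begins with accented-capital + currency + word char."""
    return START_PAIR.search(text) is not None


if __name__ == "__main__":
    for sample in ("Ã¥klagarmyndighets", "Ã¥", "OÙ ET QUAND?"):
        print(repr(sample), starts_with_suspicious_pair(sample))
```

Under these assumptions, the string from the new test case matches, while the bare two-character string `Ã¥` does not, because the trailing `\w` requires a word character after the pair. The existing `OÙ ET QUAND?` case is also left alone, since it does not begin with an accented capital followed by a currency symbol.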