gh-150771: Fix email serialization for shift_jis and euc-jp #151120

Open
bhuvi27 wants to merge 6 commits into python:main from bhuvi27:gh-150771-fix-email-shift-jis-euc-jp

Conversation

@bhuvi27 bhuvi27 commented Jun 9, 2026

Fixes #150771

Creating a message with set_content(..., charset='shift_jis') or set_content(..., charset='euc-jp') raised UnicodeEncodeError on str(m), because the payload was encoded with the input charset while the Content-Type header declares the output charset (iso-2022-jp).

Use Charset.output_charset in set_text_content so the payload and Content-Type agree from the start.
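
To make the mismatch described above concrete, here is a minimal sketch. It only uses the public email.charset.Charset class and the standard codecs, so it does not depend on whether this fix is present. The helper name output_charset_for and the sample text are illustrative assumptions, not part of the patch.

```python
from email.charset import Charset


def output_charset_for(name):
    """Return the charset the email package declares for a given input charset.

    Hypothetical helper for illustration; it only reads Charset.output_charset.
    """
    return Charset(name).output_charset


# Sample Japanese text (an assumption chosen for the demonstration).
text = "こんにちは\n"

# Both input charsets map to the same output charset.
mapping = {name: output_charset_for(name) for name in ("shift_jis", "euc-jp")}

# The same text yields different bytes under the input and output charsets,
# so a payload encoded one way cannot be labelled as the other.
sjis_bytes = text.encode("shift_jis")
jis_bytes = text.encode("iso-2022-jp")

# iso-2022-jp is a 7-bit encoding; shift_jis uses bytes with the high bit set.
jis_is_7bit = all(b < 0x80 for b in jis_bytes)
sjis_is_7bit = all(b < 0x80 for b in sjis_bytes)

if __name__ == "__main__":
    print(mapping)
    print(sjis_bytes != jis_bytes, jis_is_7bit, sjis_is_7bit)
```

Encoding the payload with the output charset up front, as this PR proposes, keeps the stored bytes consistent with the charset named in Content-Type.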

bedevere-app Bot commented Jun 9, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@bhuvi27 bhuvi27 force-pushed the gh-150771-fix-email-shift-jis-euc-jp branch from 5928e5b to f071b32 Compare June 9, 2026 02:48
Convert surrogate-escaped payloads through the input charset before
encoding to iso-2022-jp, fixing UnicodeEncodeError when printing
messages created with set_content().
@bhuvi27 bhuvi27 force-pushed the gh-150771-fix-email-shift-jis-euc-jp branch from f071b32 to 6795f58 Compare June 9, 2026 02:49
bhuvi27 added 5 commits June 10, 2026 08:44
…/euc-jp

Encode the payload with the charset output mapping (iso-2022-jp) when
set_content is called with shift_jis or euc-jp, instead of patching
serialization in body_encode and set_payload. Reverts those changes.
Use plain backticks for set_content() instead of a broken :func: target.

@serhiy-storchaka serhiy-storchaka left a comment

Member

LGTM. 👍

Do you think an assertion for bytes(m) would be useful?

self.assertEqual(m['Content-Type'], 'text/plain; charset="iso-2022-jp"')
self.assertEqual(m.get_payload(decode=True), content.encode('iso-2022-jp'))
self.assertEqual(m.get_content(), content)
self.assertEqual(str(m), textwrap.dedent("""\

Member

Maybe also add assertions for bytes(m), similar to test_set_text_charset_cp949.

Development

Successfully merging this pull request may close these issues.

Issues with Shift_JIS and EUC-JP in email

2 participants