
fix(fetch): detect encoding for non-UTF-8 pages using charset-normalizer #3880

Open
olegsa wants to merge 4 commits into modelcontextprotocol:main from olegsa:fix/fetch-encoding-detection

Conversation


@olegsa olegsa commented Apr 9, 2026

Summary

  • Add automatic character encoding detection to mcp-server-fetch using charset-normalizer for pages that don't declare a charset in the HTTP Content-Type header
  • Introduces a get_response_text() helper that checks response.charset_encoding first and falls back to statistical byte analysis via charset-normalizer
  • Fixes garbled text when fetching pages served in non-UTF-8 encodings (e.g. windows-1251, windows-1255, windows-1256, euc-kr) that lack a charset declaration in the HTTP header

Motivation

Many websites (especially non-English ones) serve content in legacy encodings like windows-1255 (Hebrew), windows-1251 (Cyrillic), windows-1256 (Arabic), or euc-kr (Korean) without declaring the charset in the HTTP Content-Type header. The current code uses response.text, which defaults to UTF-8 and produces garbled mojibake output for these pages.

Changes

  • server.py: Added get_response_text(), which uses charset-normalizer for encoding detection when HTTP headers lack charset info; replaced response.text with get_response_text(response) in fetch_url(); also promoted httpx from a function-local import to a top-level import.
  • pyproject.toml: Added charset-normalizer>=3.0.0 as an explicit dependency (already a transitive dep via requests).
  • tests/test_server.py: Added TestGetResponseText class with 5 tests covering UTF-8 passthrough, Ukrainian (windows-1251), Hebrew (windows-1255), Arabic (windows-1256), and Korean (euc-kr) encoding detection.
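
The helper described above can be sketched roughly as follows. The name get_response_text() and the header-first/detect-second logic come from this PR's description, but the exact body below is illustrative, not the merged code. It assumes an httpx.Response-like object exposing `.content` (raw bytes) and `.charset_encoding` (the charset parsed from the Content-Type header, or None), which matches httpx's actual API:

```python
from charset_normalizer import from_bytes


def get_response_text(response) -> str:
    """Decode an httpx response, detecting the charset when the header omits it."""
    # 1. Trust an explicit charset from the Content-Type header when present.
    if response.charset_encoding:
        return response.content.decode(response.charset_encoding, errors="replace")
    # 2. No declared charset: run charset-normalizer's statistical detection
    #    over the raw bytes and take the best-ranked candidate.
    best = from_bytes(response.content).best()
    if best is not None:
        return str(best)  # CharsetMatch.__str__ returns the decoded text
    # 3. Nothing detected (e.g. an empty body): fall back to UTF-8 with
    #    replacement characters rather than raising.
    return response.content.decode("utf-8", errors="replace")
```

Decoding with errors="replace" keeps the helper from raising on a mislabeled header, at the cost of replacement characters instead of an exception.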

Test plan

  • All existing tests pass
  • New encoding detection tests pass for Ukrainian, Hebrew, Arabic, Korean
  • UTF-8 pages with charset in HTTP header still use the standard path
  • Non-HTML content (JSON, etc.) is unaffected

Made with Cursor

olegsa added 4 commits April 9, 2026 10:32
…tests

Add charset-normalizer as an explicit dependency in pyproject.toml and
add tests verifying correct decoding of non-UTF-8 pages (Ukrainian
windows-1251, Hebrew windows-1255, Arabic windows-1256, Korean euc-kr).

Made-with: Cursor
