Add method to detect if a string contains surrogates #69456

@bitdancer

Description

@bitdancer
Member
BPO 25269
Nosy @vstinner, @ezio-melotti, @bitdancer

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2015-09-29.12:48:03.328>
labels = ['interpreter-core', 'type-feature', 'expert-unicode']
title = 'Add method to detect if a string contains surrogates'
updated_at = <Date 2015-09-29.13:05:02.979>
user = 'https://github.com/bitdancer'

bugs.python.org fields:

activity = <Date 2015-09-29.13:05:02.979>
actor = 'vstinner'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core', 'Unicode']
creation = <Date 2015-09-29.12:48:03.328>
creator = 'r.david.murray'
dependencies = []
files = []
hgrepos = []
issue_num = 25269
keywords = []
message_count = 1.0
messages = ['251853']
nosy_count = 3.0
nosy_names = ['vstinner', 'ezio.melotti', 'r.david.murray']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue25269'
versions = ['Python 3.6']

Activity

bitdancer commented on Sep 29, 2015

Because surrogates are used in several contexts to "smuggle" bytes through string APIs via the surrogateescape error handler, it is very useful to be able to determine whether a given string contains surrogates. The email package, for example, uses different logic when serializing a Message object depending on whether a string contains smuggled bytes. Currently it calls x.encode() and checks for an exception (we determined that for CPython this was the most efficient way to check). It would be better, I think, to have a dedicated method on str for this, among other reasons so that different Python implementations could optimize it appropriately.

(Note that another aspect of dealing with surrogateescaped strings is discussed in bpo-18814.)
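
For illustration, the encode-and-catch check described above might look roughly like the sketch below. The function names `contains_surrogates` and `contains_surrogates_scan` are hypothetical; they are not existing str methods and not necessarily what the email package uses internally. The second function is an assumed alternative that scans code points directly, shown only for comparison.

```python
def contains_surrogates(s: str) -> bool:
    """Hypothetical helper: True if s holds any code point in U+D800..U+DFFF.

    Relies on the strict UTF-8 codec refusing to encode lone surrogates.
    """
    try:
        s.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


def contains_surrogates_scan(s: str) -> bool:
    """Hypothetical pure-Python alternative: test each code point explicitly."""
    return any("\ud800" <= ch <= "\udfff" for ch in s)


# A byte that is not valid UTF-8 gets smuggled into the str as a lone surrogate.
smuggled = b"caf\xe9".decode("utf-8", errors="surrogateescape")
plain = "caf\u00e9"

print(contains_surrogates(smuggled), contains_surrogates(plain))

# Encoding with the same error handler recovers the original bytes.
original = smuggled.encode("utf-8", errors="surrogateescape")
```

Both helpers should agree on any input; the encode-based one does its work inside the codec rather than in a Python-level loop, which is presumably why it was measured as faster on CPython.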

transferred this issue fromon Apr 10, 2022

StanFromIreland commented on Sep 3, 2025

Hello, out of curiosity, any progress on the write up?
