Bug Report
While processing pybind modules, stubgen inspects the docstring in order to determine the possible function signatures (there may be many if the function is overloaded). During inspection, tokenize.tokenize is invoked and TokenError is suppressed (mypy/stubdoc.py, lines 349 to 358 at commit 55d4c17).
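The suppression pattern can be paraphrased as a small, self-contained sketch. This is a paraphrase of the behavior described above, not the exact mypy source; `collect_tokens` is a hypothetical stand-in that gathers token strings instead of feeding a parser state:

```python
import contextlib
import io
import tokenize

def collect_tokens(docstr: str):
    """Paraphrase of stubgen's loop: tokenize the whole docstring once,
    swallowing TokenError and bailing out entirely on IndentationError."""
    tokens: list[str] = []
    with contextlib.suppress(tokenize.TokenError):
        try:
            for tok in tokenize.tokenize(io.BytesIO(docstr.encode("utf-8")).readline):
                tokens.append(tok.string)
        except IndentationError:
            return None
    return tokens

print(collect_tokens("thing(x: int) -> None\n"))
```

Because the suppression wraps a single pass over the stream, any tokens after the point of a TokenError are silently dropped.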
However, some tokenization errors prevent detection of the function signature. For example, having an unterminated string literal in the docstring is valid (in the context of a Python docstring), but will cause tokenization to stop.
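A minimal illustration of the failure mode (a triple-quoted literal is used here so the error is raised consistently across Python versions; on 3.12 a plain unterminated quote fails the same way, and the exception catch is broadened because the exact exception type varies by version):

```python
import io
import tokenize

def visible_tokens(docstr: str) -> list[str]:
    """Return the token strings tokenize manages to produce before giving up."""
    out: list[str] = []
    try:
        for tok in tokenize.tokenize(io.BytesIO(docstr.encode("utf-8")).readline):
            out.append(tok.string)
    except (tokenize.TokenError, SyntaxError):
        # The exact exception type varies across Python versions; swallow both.
        pass
    return out

# An unterminated string literal: everything after it is never tokenized,
# so a second signature on a later line is invisible.
doc = 'thing(x: int) -> None\n"""unterminated\nthing(x: int, y: int) -> str\n'
toks = visible_tokens(doc)
print("None" in toks)  # True: the first signature was tokenized
print("str" in toks)   # False: the second signature was lost
```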
To Reproduce
The following docstring should trigger this behavior:
```python
def thing():
    """ thing(*args, **kwargs)
    Overloaded function.

    1. thing(x: int) -> None

    This is a valid docstring. "We do not need to terminate
    this string literal on this line.

    2. thing(x: int, y: int) -> str

    This signature will never get parsed due to TokenError.
    """
```
The example above stops with an unterminated string literal error before overload 2 is reached, so infer_sig_from_docstring misses the second signature and produces only the first.
Alternatively, a math RST block in the docstring will also cause this behavior.
```python
def thing():
    """ thing(*args, **kwargs)
    Overloaded function.

    1. thing(x: int) -> None

    .. math::
        \mathbf{x} = 3 \cdot \mathbf{y}

    2. thing(x: int, y: int) -> str

    This signature will never get parsed due to TokenError.
    """
```
The second signature is never parsed due to an `unexpected character after line continuation character` error: the backslashes in the math block look like line continuations to the tokenizer.
Expected Behavior
Ideally all signatures would be detected. It is understandable that it fails, since the scope of things that can appear in a docstring is fairly arbitrary.
Actual Behavior
The first signature is extracted, but subsequent signatures are not detected. My guess is that a tokenization error raised while scanning the text of the first overload aborts tokenization for the rest of the docstring.
Your Environment
Mypy version used: 1.14
Mypy command-line flags: --package --output
Mypy configuration options from mypy.ini (and other config files): None
Python version used: 3.12
Possible Fix
The following might be a viable fix. I tried changing the logic to resume tokenization after errors (provided there is data remaining):
```python
# Keep tokenizing after an error. If `TokenError` is encountered, tokenize() will
# stop. We check the remaining bytes in bytes_io and resume tokenizing on the next
# loop iteration.
encoded_docstr = docstr.encode("utf-8")
bytes_io = io.BytesIO(encoded_docstr)
while bytes_io.tell() < len(encoded_docstr):
    # Return all found signatures, even if there is a parse error after some are found.
    with contextlib.suppress(tokenize.TokenError):
        try:
            tokens = tokenize.tokenize(bytes_io.readline)
            for token in tokens:
                state.add_token(token)
        except IndentationError:
            return None
```
On both of my examples above, this produces the correct number of signatures from infer_sig_from_docstring. If you are amenable to this solution, I can open a PR.
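The proposed loop can be expanded into a runnable sketch. Here `tokens_with_resume` is a hypothetical stand-in for stubgen's state.add_token collection; the broadened exception catch and the no-progress guard are defensive additions of mine, not part of the proposed patch. Per the report, on Python 3.12 this recovers signatures that appear after the tokenization error:

```python
import io
import tokenize

def tokens_with_resume(docstr: str) -> list[str]:
    """Tokenize a docstring, resuming after errors while unread bytes remain."""
    encoded = docstr.encode("utf-8")
    bytes_io = io.BytesIO(encoded)
    out: list[str] = []
    while bytes_io.tell() < len(encoded):
        pos = bytes_io.tell()
        try:
            for tok in tokenize.tokenize(bytes_io.readline):
                out.append(tok.string)
        except (tokenize.TokenError, SyntaxError):
            # Swallow the error and retry from the next unread line. The broad
            # catch and this progress check guard against version differences
            # and a stream that makes no forward progress.
            if bytes_io.tell() == pos:
                break
    return out

doc = 'thing(x: int) -> None\n"oops\nthing(x: int, y: int) -> str\n'
toks = tokens_with_resume(doc)
print("None" in toks)  # True: the first signature survives on every version
```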