Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework function signatures in docstrings of overloaded functions #2621

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

AWhetter
Copy link
Contributor

@AWhetter AWhetter commented Oct 26, 2020

Description

This pull request changes the docstring generation so that all function signatures are prepended to the docstring of an overloaded function, instead of the generic signature that is prepended currently.
It also includes an additional change that allows the section headings in the docstring to be disabled independently of whether or not function signatures are shown. To keep backwards compatibility, calling disable_function_signatures() will still turn off both the prepended signatures and the section headings. Therefore section headings will only be shown if both function signatures and section headings are enabled. To rephrase, section headings cannot be enabled independently of disabling function signatures.

Closes #2619

Suggested changelog entry:

* All function signatures of an overloaded function are now prepended to the function's docstring.
  To support this change, a new ``disable_section_headings()`` function has been added to the ``options`` class to allow
  the section headings in the docstring of an overloaded function to be disabled.
  `#2621 <https://github.com/pybind/pybind11/pull/2621>

@github-actions github-actions bot added the docs Docs or GitHub info label Oct 26, 2020
@YannickJadoul YannickJadoul changed the title Fix 2619 Rework function signatures in docstrings of overloaded functions Oct 26, 2020
@henryiii henryiii added this to the v2.7 milestone Nov 5, 2020
@YannickJadoul
Copy link
Collaborator

YannickJadoul commented Jan 5, 2021

Messing around with this myself, it seems Sphinx is expecting a backslash at the end of the line, if the following is still supposed to be part of the overloaded signature (which is quite ugly, in the raw docstring, but...). Apart from that, things seem to work.

@AWhetter
Copy link
Contributor Author

Nice find! I'm not sure how I missed that before. But a small end-to-end test is working as it should for me now.

which is quite ugly, in the raw docstring, but...

You're allowed to have a space before the backslash. I haven't done that here but if you think it looks better with the space there then let me know and I can change it.

@YannickJadoul
Copy link
Collaborator

@AWhetter, ah, it does allow that space? Didn't know, but that might be slightly less ugly, yes. I think it still ugly, though (and I don't fully understand why it's necessary; Sphinx could just try parsing the next line and check if it's a signature, just like it does for the first line?), but yes, I've also made a similar ad-hoc change to the pybind11 subtree in my project, now, so thanks for pointing me to this! :-)

Is this ready for review? Guess this is only scheduled for 2.7, anyway, but I could go over this after having played around with it myself :-)

@AWhetter
Copy link
Contributor Author

I've just squashed the commits back up, so this is now ready for review.

@YannickJadoul
Copy link
Collaborator

YannickJadoul commented Jan 19, 2021

To rephrase, section headings cannot be enabled independently of disabling function signatures.

Just a thought: as you say, this PR is making two changes: 1) it is adding the overloaded signatures to be recognized by e.g. Sphinx, and 2) it is removing these "section headings".

What about splitting this into two? I would expect that the different sections are still useful if people have different docstrings for different overloads, but that having the overloaded signatures at the start is almost always better (and so we could turn that to on-by-default, maybe)?

It's just an idea. I'm not sure I like the option-class to become too fine-grained either, actually, but...

We might want @wjakob to weight in on this as well, actually.

EDIT: Sorry, scratch all that. This is the case, already, if I look more closely at the implementation. Except that the multiple signatures aren't optional :-)

Copy link
Collaborator

@YannickJadoul YannickJadoul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I'm concerned, this is an improvement :-) I'd really like to use it, so, yeah, let's add it to the PRs to be merged for 2.7? (I really need to sleep it seems; it's already part of 2.7)

docs/advanced/misc.rst Outdated Show resolved Hide resolved
include/pybind11/pybind11.h Show resolved Hide resolved
else signatures += "\n";
if (!show_signature_headings && first_user_def) {
first_user_def = false;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The braces really blow things up here. What about just this?

if (!show_signature_headings && first_user_def)
    first_user_def = false;
else
    signatures += "\n"

signatures += it->doc;
if (show_signature_headings)
    signatures += "\n";

I know braces bring a bit of safety, but in this case, they really make things heavier to read? Just a suggestion; if you insist on braces, that's also fine. At least the if is not on one line anymore ;-)

tests/test_factory_constructors.py Outdated Show resolved Hide resolved
signatures += it->doc;
if (options::show_function_signatures()) signatures += "\n";
if (show_signature_headings) {
signatures += "\n";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're really feeling adventurous: it seems there's always a newline attached to the end of the docstring if show_signature_headings? Not worth making the code more complicate for, but maybe this helps restructuring the logic?

If you want, I can have a look, as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original code always added a new line. But if we wanted to get rid of it then I think it would make the code more complicated. I'm thinking we would have to do something like

        for (auto it = chain_start; it != nullptr; it = it->next) {
            bool appending_docstring = (it->doc && strlen(it->doc) > 0 && options::show_user_defined_docstrings());
            bool last_section = (it->next == nullptr);
            if (show_signature_headings) {
                if (index > 0) signatures += "\n";
                if (chain)
                    signatures += std::to_string(++index) + ". ";
                signatures += rec->name;
                signatures += it->signature;
                if (appending_docstring || !last_section)) signatures += "\n";
            }
            if (appending_docstring) {
                // If we're appending another docstring, and aren't printing function signatures, we
                // need to append a newline first:
                if (!show_signature_headings && first_user_def)
                     first_user_def = false;
                else
                    signatures += "\n";
                signatures += it->doc;
                if (show_signature_headings && !last_section) signatures += "\n";
            }
        }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your call on whether it's worth it. I know the original code did it, but it seems rather silly. I had hoped it would make the code easier/more intuitive. :-(

@YannickJadoul
Copy link
Collaborator

Related to your comment in #2619: it's a bit of a pity that CPython's docstrings don't have the trailing \ :-/
Did you check that stubgen handles this new version with \ correctly as well?

@AWhetter
Copy link
Contributor Author

AWhetter commented Feb 4, 2021

By chance, stubgen works, but only when the signature headings are included. stubgen isn't capable of parsing the backslash separated signatures at the start of the docstring, so it ignores them, but it can parse the section headings. So the original issue I described of stubgen outputting myfunc(*args, **kwargs) ->Any no longer happens.

I still think there's value in making the section headings in a docstring optional, but I'm more worried about resolving the issue of stubgen outputting the *args, **kwargs signature.
If we do decide that making the section headings optional is worth it, I've submitted a pull request to stubgen to support that: python/mypy#10019

@YannickJadoul
Copy link
Collaborator

YannickJadoul commented Feb 4, 2021

By chance, stubgen works, but only when the signature headings are included. stubgen isn't capable of parsing the backslash separated signatures at the start of the docstring, so it ignores them, but it can parse the section headings. So the original issue I described of stubgen outputting myfunc(*args, **kwargs) ->Any no longer happens.

Ugh. I had just thrown out the headings :-(

I still think there's value in making the section headings in a docstring optional, but I'm more worried about resolving the issue of stubgen outputting the *args, **kwargs signature.

Agreed. Does it make sense to submit an issue to stubgen or Sphinx (or both, such that they communicate!) to make sure that they act the same? :-/ (i.e., with \ or without \; I don't really care either way, though without \ does look nicer to me)

@YannickJadoul
Copy link
Collaborator

Does it make sense to submit an issue to stubgen or Sphinx (or both, such that they communicate!) to make sure that they act the same? :-/ (i.e., with \ or without \; I don't really care either way, though without \ does look nicer to me)

Oh. python/mypy#10019. I should learn how to read. Sorry! (Though asking Sphinx to not require the \ also still could make sense to me, though. But it's going to be harder to implement?)

@sizmailov
Copy link
Contributor

@AWhetter Would it be less intrusive to make sphinx and stubgen support pybind11 overloading syntax?
FYI: A set of changes to stubgen is on the way: python/mypy#9903

@AWhetter
Copy link
Contributor Author

AWhetter commented Feb 6, 2021

It would be less intrusive for pybind, yes. However I'm trying to make everything agree on a convention that is also used outside of pybind, rather than have everything support pybind's way of doing it.

You're right that we should be looking for a solution that minimises intrusion though, and I think that @YannickJadoul has identified that solution; get Sphinx to support finding signatures without a backslash, stubgen would require no changes, and get pybind to put function signatures at the beginning of the docstring.

@sizmailov
Copy link
Contributor

sizmailov commented Feb 6, 2021

Ok, that seems reasonable.

What about signatures that span multiple lines? Presumably pybind11 would always produce exactly one signature per line, but we are talking about 3rd party tools too. How should it be handled? One can argue for a like-python approach: keep parsing new lines until signature is syntactically complete.

Example:

foo(x: int) -> Callable[
              Optional[Tuple[int, int]]
         ]
foo(x: str) -> Callable \ 
         [
              Optional[Tuple[str, str]]
         ]
foo() -> None

Alternatively, multiline signatures can be disallowed/unsupported, but this seems to be unnecessary restriction. What do you think?

@AWhetter
Copy link
Contributor Author

AWhetter commented Feb 6, 2021

For that to work I think we would have to change the signature parsing of Sphinx so that it matches what stubgen does where it tokenises that string and properly parses the signatures. At the moment Sphinx is doing a basic regex search on each line of the docstring. It's definitely possible.
The change I'm about to make to Sphinx is quite small, but you could submit an issue to Sphinx to see if they would consider a bigger change to the signature parsing?

@sizmailov
Copy link
Contributor

The sphinx choice of trailing backslash as indication for multiple signatures become fairly unfortunate one. It might result into a cumbersome signature parsing implementation which supports both "legacy" and "new" versions... but is doable for sure. I may have a too few clock cycles for that in next months, can't promise, sorry 😞

Probably it's not the issue that requires immediate solution, but still it's worth to think about ahead of time.

@AWhetter
Copy link
Contributor Author

AWhetter commented Apr 4, 2021

Good news! Sphinx v4 will support the function signatures separated without backslashes.
I've updated this pull request to not include separating backslashes. I've done another small end to end test using an install of Sphinx from its "master" branch and using stubgen from the latest mypy, and both work with the docstrings that pybind outputs from this pull request.
If disable_section_headings() is not set, stubgen outputs duplicate type signatures but that's a separate issue.

@henryiii henryiii modified the milestones: v2.7, v2.8 Jul 16, 2021
@henryiii henryiii modified the milestones: v2.8, v2.9 Oct 4, 2021
@JeanChristopheMorinPerso

Hello, just dropping a little message to see if there is still some interest from maintainers into merging this? I'd personally be interested in getting that to use it on a project I contribute to (https://github.com/PixarAnimationStudios/OpenTimelineIO).

Thanks!

@rwgk
Copy link
Collaborator

rwgk commented Jan 19, 2022

The force push was for a fresh git rebase master. Trying to get the attention of at least one other maintainer.

@rwgk
Copy link
Collaborator

rwgk commented Jan 19, 2022

The one CI failure is an infrastructure issue (job not even starting up).

@henryiii
Copy link
Collaborator

Function signatures are processed by mypy; does it handle these (yet)? I'm pretty sure it handles the list at the top, but not so sure about the ones with mixed signatures and descriptions.

@henryiii
Copy link
Collaborator

The signature stuff has mostly been dormant, since we want to try to keep the size down and avoid a massive list of options or making arbitrary “it looks better” changes. Ideally, in my opinion, there’s a clear forward direction - signatures should match what mypy expects. Type checking is well defined by now, and very useful. And mypy now tests pybind11, we should probably do the same, and run mypy’s stubgen on our development branch.

@rwgk
Copy link
Collaborator

rwgk commented Jan 21, 2022

Henry pointed this out elsewhere: https://github.com/python/mypy/tree/master/test-data/stubgen/pybind11_mypy_demo

Pasting here thinking it may be useful for us exploring how to evolve the docstring generation.

@AWhetter
Copy link
Contributor Author

AWhetter commented Feb 2, 2022

Function signatures are processed by mypy; does it handle these

mypy itself does not parse the docstrings. They have to be turned into a stub file first by stubgen. These stub files then get read by mypy.

The current behaviour of pybind and stubgen is as follows.

py::class_<Pet>(m, "Pet")
   .def(py::init<const std::string &, int>())
   .def("set", (void (Pet::*)(int)) &Pet::set, "Set the pet's age")
   .def("set", (void (Pet::*)(const std::string &)) &Pet::set, "Set the pet's name");

The above pybind class would have the following docstring:

>>> help(example.Pet)

class Pet(__builtin__.object)
 |  Methods defined here:
 |
 |  __init__(...)
 |      __init__(self: Pet, arg0: str, arg1: int) -> None
 |
 |  set(...)
 |      set(*args, **kwargs) -> Any
 |      Overloaded function.
 |
 |      1. set(self: Pet, arg0: int) -> None
 |
 |      Set the pet's age
 |
 |      2. set(self: Pet, arg0: str) -> None
 |
 |      Set the pet's name

Stubgen would produce a stub file that looks like the following:

from typing import overload

class Pet:
    def __init__(self: Pet, arg0: str, arg1: int) -> None: ...
    @overload
    def set(self: Pet, arg0: int) -> None: ...
    @overload
    def set(self: Pet, arg0: str) -> None: ...
    @overload
    def set(*args, **kwargs) -> Any: ...

As the examples show, the signature at the top of the docstring and the sginatures that are section headings are both picked up.

With this change, pybind will instead produce the following docstring (when configured to do so):

>>> help(example.Pet)

class Pet(__builtin__.object)
 |  Methods defined here:
 |
 |  __init__(...)
 |      __init__(self: Pet, arg0: str, arg1: int) -> None
 |
 |  set(...)
 |      set(self: Pet, arg0: int) -> None
 |      set(self: Pet, arg0: str) -> None
 |      Overloaded function.
 |
 |      1. set(self: Pet, arg0: int) -> None
 |
 |      Set the pet's age
 |
 |      2. set(self: Pet, arg0: str) -> None
 |
 |      Set the pet's name

Which stubgen would turn into:

from typing import overload

class Pet:
    def __init__(self: Pet, arg0: str, arg1: int) -> None: ...
    @overload
    def set(self: Pet, arg0: int) -> None: ...
    @overload
    def set(self: Pet, arg0: str) -> None: ...
    @overload
    def set(self: Pet, arg0: int) -> None: ...
    @overload
    def set(self: Pet, arg0: str) -> None: ...

So unfortunately it produces duplicate overload signatures and a change would need to be made to stubgen to either have it remove duplicates or have a way for it to read the docstrings from the head of the docstring only. Having an option to stubgen to have it read only header docstrings would probably be preferred because that's then inline with Sphinx's behaviour as well, but we will need to discuss with the mypy maintainers to reach a conclusion.

@AWhetter
Copy link
Contributor Author

I've resolved all conflicts and removed additional options.
To cater for the removal of the options, I've made it such that the overload signature are always prepended to the docstring, even if disable_function_signatures() has been called. This is closer to the behaviour I'm looking for personally where I don't want the function signatures, partly because they still produce duplicate stubs with stubgen and partly because I want to be able to format the docstring myself, but I still want Sphinx and stubgen to have access to the overload signatures.
The downside of doing it this way is that users no longer have the option to write the signatures themselves.

@AWhetter
Copy link
Contributor Author

Is there anything that I can do to help progress this change?

@AWhetter
Copy link
Contributor Author

Should I address the conflicts? Is there anything else that I should do with this change to help move it forwards?

@rwgk
Copy link
Collaborator

rwgk commented Aug 11, 2024

Should I address the conflicts?

Yes please.

Is there anything else that I should do with this change to help move it forwards?

I'll look again later tonight (assuming the conflicts are resolved).

@AWhetter
Copy link
Contributor Author

Thanks for continuing to spend time on this. I've resolved the conflicts. Given that it's been so long since this was submitted I've rebased the commit as well.

@rwgk
Copy link
Collaborator

rwgk commented Aug 16, 2024

@henryiii is there a chance you could take a look?

I believe we should merge this. mypy is only one consumer, and there are issues they haven't responded to for a long time (#4888). I don't think this PR makes anything worse in that respect.

The other consumer are humans. Clearly it's useful to see the signature for all overloads.

@rwgk
Copy link
Collaborator

rwgk commented Aug 21, 2024

@AWhetter @henryiii

I ran this PR through Google's global testing. It passes after rerunning stubgen for ~20 .pyi files (the generated files are checked in, unfortunately).

A typical .pyi diff is:

     @overload
     def __setitem__(self, arg0: int, arg1: Program) -> None: ...
     @overload
     def __setitem__(self, arg0: slice, arg1: ProgramList) -> None: ...
+    @overload
+    def __setitem__(self, arg0: int, arg1: Program) -> None: ...
+    @overload
+    def __setitem__(self, arg0: slice, arg1: ProgramList) -> None: ...

I.e. stubgen generates the same signature twice; not surprisingly.

I'll have to rerun the global testing with the updated .pyi files. Might take a day to get the results.

@rwgk
Copy link
Collaborator

rwgk commented Aug 23, 2024

I.e. stubgen generates the same signature twice; not surprisingly.

I'll have to rerun the global testing with the updated .pyi files. Might take a day to get the results.

Good news, nothing breaks, but seeing that duplication in the stubgen-generated .pyi files got me to change my mind. I no longer think the duplication is useful.

The "generic" signature that is replaced here is far more confusing than helpful.

So I did this, based on this PR: #5329

The core diff there is simply:

diff --git a/include/pybind11/pybind11.h b/include/pybind11/pybind11.h
index 9f356828..27581348 100644
--- a/include/pybind11/pybind11.h
+++ b/include/pybind11/pybind11.h
@@ -613,10 +613,7 @@ protected:
         /* Create a nice pydoc rec including all signatures and
            docstrings of the functions in the overload chain */
         if (chain && options::show_function_signatures()) {
-            // First a generic signature
-            signatures += rec->name;
-            signatures += "(*args, **kwargs)\n";
-            signatures += "Overloaded function.\n\n";
+            signatures += "Overloaded function:\n\n";
         }
         // Then specific overload signatures
         bool first_user_def = true;

If have to put that alternative through global testing again.

@rwgk
Copy link
Collaborator

rwgk commented Aug 23, 2024

Global testing for #5329 passed, without having to make any changes to existing stubgen-generated .pyi files.

@AWhetter What do you think?

#5329 still is mostly your work. Do you want to update this PR with my changes on top?

@AWhetter
Copy link
Contributor Author

#5329 does not address the original issue on its own. The lack of prepended signatures means that Sphinx is unable to parse and document the signatures. The readability is improved though, and the situation for stubgen is also improved.

I no longer think the duplication is useful.

Do you mean the duplication in the docstring or the duplication in the stubs generated by stubgen?
If it's the duplication in the docstring then, to be honest, I'm less invested in the "Overloaded function" type of docstring. I'm more interested in getting this type of docstring out:

    add(...)
     |      add(arg0: int, arg1: int) -> int
     |      add(arg0: float, arg1: float) -> float
     |      add(arg0: None, arg1: None) -> None
     |      A function which adds two numbers.
     |
     |      Internally, a simple addition is performed.
     |      Both numbers can be None, and None will be returned.

So if we need to find a compromise, I'd be happy with the following being the default:

    add(...)
     |      Overloaded function:
     |
     |      1. add(arg0: int, arg1: int) -> int
     |
     |      Add two integers together.
     |
     |      2. add(arg0: float, arg1: float) -> float
     |
     |      Add two floating point numbers together.

and then there being a pybind option to get the first docstring that I mentioned.
Or you previously mentioned some kind of slow transition to the new docstring format (#2621 (comment)).

If the duplication in the stubs is the issue, then I can look at addressing this in stubgen.

@rwgk
Copy link
Collaborator

rwgk commented Aug 23, 2024

Regarding duplication, what I had in mind is this (docs/advanced/misc.rst in this PR):

    add(...)
     |      add(arg0: int, arg1: int) -> int
     |      add(arg0: float, arg1: float) -> float
     |      Overloaded function.
     |
     |      1. add(arg0: int, arg1: int) -> int
     |
     |      Add two integers together.
     |
     |      2. add(arg0: float, arg1: float) -> float
     |
     |      Add two floating point numbers together.

The signatures appear twice.

For the human readability aspect:

Sometimes, quite commonly, signatures are long and unwieldy. I often give up trying to parse them visually. If they appear twice, I give up even quicker, because it just looks even more like a wall of incomprehensible stuff.

For stubgen: I'm worried tooling may behave weirdly if the signatures are duplicated.

For sphinx: I don't know.

Can we find a way to make sphinx happy without duplicating the signatures?

If not, improving sphinx would seem the way to go.

If the duplication in the stubs is the issue, then I can look at addressing this in stubgen.

Getting into sphinx seems better in that case.

But, tangentially, if you could help making progress with the ideas under #4888, that would be great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Docs or GitHub info needs-review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEAT] Generate docstrings that are compatible with Sphinx and stubgen
7 participants