Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[smart_holder] Bug fix: std::unique_ptr deleter needs to be copied. #4850

Merged
merged 16 commits into from
Nov 2, 2023

Conversation

rwgk
Copy link
Collaborator

@rwgk rwgk commented Sep 19, 2023

Description

IMPORTANT: Quite possibly follow-on work is needed. See comments for details, in particular #4850 (comment) below.

Root cause for this bug: Firmly held wrong belief that std::unique_ptr does not store a deleter.

But here it is:

Bug detected via a few failing tests with PyCLIF-pybind11 (out of many millions).

Suggested changelog entry:

Ralf W. Grosse-Kunstleve added 7 commits September 18, 2023 19:35
Using a `union` is complicated: https://en.cppreference.com/w/cpp/language/union

> If members of a union are classes with user-defined constructors and destructors, to switch the active member, explicit destructor and placement new are generally needed:

Using `std::variant` increases compile-time overhead.

It is currently unclear how much these effects matter in practice: optimization left for later.
```
-- The CXX compiler identification is Clang 15.0.7

/usr/bin/cmake -E __run_co_compile --tidy="/usr/bin/clang-tidy;--use-color;--warnings-as-errors=*;--extra-arg-before=--driver-mode=g++" --source=/__w/pybind11/pybind11/tests/test_class_sh_inheritance.cpp -- /usr/bin/c++ -DPYBIND11_ENABLE_TYPE_CASTER_ODR_GUARD_IF_AVAILABLE -DPYBIND11_TEST_EIGEN -Dpybind11_tests_EXPORTS -I/__w/pybind11/pybind11/include -isystem /usr/include/python3.9 -isystem /__w/pybind11/pybind11/build/_deps/eigen-src -Os -DNDEBUG -fPIC -fvisibility=hidden -Wall -Wextra -Wconversion -Wcast-qual -Wdeprecated -Wundef -Wnon-virtual-dtor -flto=thin -std=c++17 -o CMakeFiles/pybind11_tests.dir/test_class_sh_inheritance.cpp.o -c /__w/pybind11/pybind11/tests/test_class_sh_inheritance.cpp
/__w/pybind11/pybind11/tests/pure_cpp/smart_holder_poc_test.cpp:264:30: error: pointer parameter 'raw_ptr' can be pointer to const [readability-non-const-parameter,-warnings-as-errors]
        new int(19), [](int *raw_ptr) { delete raw_ptr; });
                             ^
                        const
```
…esired (the lambda function only captures a reference, which can become dangling).
```
/usr/bin/cmake -E __run_co_compile --tidy="/usr/bin/clang-tidy;--use-color;--warnings-as-errors=*;--extra-arg-before=--driver-mode=g++" --source=/__w/pybind11/pybind11/tests/test_class.cpp -- /usr/bin/c++ -DPYBIND11_ENABLE_TYPE_CASTER_ODR_GUARD_IF_AVAILABLE -DPYBIND11_TEST_EIGEN -Dpybind11_tests_EXPORTS -I/__w/pybind11/pybind11/include -isystem /usr/include/python3.9 -isystem /__w/pybind11/pybind11/build/_deps/eigen-src -Os -DNDEBUG -fPIC -fvisibility=hidden -Wall -Wextra -Wconversion -Wcast-qual -Wdeprecated -Wundef -Wnon-virtual-dtor -flto=thin -std=c++17 -o CMakeFiles/pybind11_tests.dir/test_class.cpp.o -c /__w/pybind11/pybind11/tests/test_class.cpp
/__w/pybind11/pybind11/include/pybind11/detail/smart_holder_poc.h:114:5: error: single-argument constructors must be marked explicit to avoid unintentional implicit conversions [google-explicit-constructor,-warnings-as-errors]
    custom_deleter(D &&deleter) : deleter{std::move(deleter)} {}
    ^
    explicit
/__w/pybind11/pybind11/include/pybind11/detail/smart_holder_poc.h:120:76: error: forwarding reference passed to std::move(), which may unexpectedly cause lvalues to be moved; use std::forward() instead [bugprone-move-forwarding-reference,-warnings-as-errors]
    return guarded_delete(std::function<void(void *)>(custom_deleter<T, D>(std::move(uqp_del))),
                                                                           ^~~~~~~~~
                                                                           std::forward<D>
```
Ralf W. Grosse-Kunstleve added 2 commits October 31, 2023 23:28
Reduced from a PyCLIF use case in the wild by @wangxf123456 (internal change cl/565476030).
@rwgk
Copy link
Collaborator Author

rwgk commented Nov 1, 2023

Hi @iwanders,

I'm replying here to your message under rwgk#26.

When I saw your message I decided it's best to first transfer a simple existing internal-only test to a new test_class_sh_unique_ptr_custom_deleter here, so that you see everything that exists already:

c684f6c

I was thinking that we probably need more tests in tests/pure_cpp/smart_holder_poc_test.cpp, and expand test_class_sh_unique_ptr_custom_deleter. But I'm very uncertain myself what & how exactly.

Things that come to mind as maybe relevant:

It's not obvious, but some code in smart_holder_poc.h turned out to not be usable like that from smart_holder_type_casters.h (when I worked on it originally a couple years ago). It's only used by tests/pure_cpp/smart_holder_poc_test.cpp. One of the things I always wanted to do but never got to: move all code that is not used from smart_holder_type_casters.h to the tests/pure_cpp/ directory, so that only the production code is left in smart_holder_poc.h. The test-only code in smart_holder_poc.h is probably more distracting than helpful for your purposes.

I was thinking of the smart_holder_poc.h code under this PR as ~complete: it works for Google-internal global testing and solved in-the-wild test failures (17 tests, probably 3-5 unique use cases). Therefore I'm surprised that you're adding code in smart_holder_poc.h.

What do you think about this approach to collaborating:

  • Could you help by reviewing this PR? — With the idea that we want to ensure this PR by itself is healthy, but we don't want to add much.

  • Then you could expand the test coverage more systematically (I'd have suggestions) in a follow-on PR, also here (main pybind11 repo, not my private one, which only has the small free GitHub Actions quota).

@iwanders
Copy link
Contributor

iwanders commented Nov 1, 2023

I was thinking of the smart_holder_poc.h code under this PR as ~complete: it works for Google-internal global testing and solved in-the-wild test failures (17 tests, probably 3-5 unique use cases). Therefore I'm surprised that you're adding code in smart_holder_poc.h.

Hmm, I may be missing something, but I was assuming a custom deleter should survive a roundtrip from unique_ptr -> smart_holder -> unique_ptr.

Currently it doesn't:

  • The from_unique_ptr method moves the deleter from the unique pointer to the smart holder.
  • The counterpart; as_unique_ptr method does not copy the deleter from the smart holder back into the unique pointer?

Github won't let me comment on the appropriate line, but I feel this current PR only fixes the bug in one direction (going into smart holder), not in the direction of going out of the smart holder. This test shows the problem, it will fail with this branch and unmodified code, which is why I was modifying the smart holder file.

Edit;

Could you help by reviewing this PR? — With the idea that we want to ensure this PR by itself is healthy, but we don't want to add much.

Went through it, overall it looks fine to me, my main concern is the above statement that the deleter is only handled when going into the smart holder, not when being converted back to a unique pointer.

What do you think about this approach to collaborating: [...] Then you could expand the test coverage more systematically (I'd have suggestions) in a follow-on PR

Sure, I can't commit a lot of time, but happy to help over the next week or so.

}
struct custom_deleter {
D deleter;
explicit custom_deleter(D &&deleter) : deleter{std::move(deleter)} {}
Copy link
Contributor

@iwanders iwanders Nov 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is always tricky, but shouldn't this std::move be a std::forward instead? THe D&& could be a universal reference to an lvalue I think, in which case moving it may not have the intended effect at the call site, while forward will move if it is an rvalue but copy if it is an lvalue?

Edit; so make this like line 122.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed, thanks for catching this!

This is always tricky,

Yeah, I wasn't paying enough attention here before.

} // namespace class_sh_unique_ptr_custom_deleter
} // namespace pybind11_tests

PYBIND11_SMART_HOLDER_TYPE_CASTERS(pybind11_tests::class_sh_unique_ptr_custom_deleter::Pet)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Feel free to ignore) This line missing a semicolon trips up the syntax highlighting on github, but it seems to be in line with the other tests. Could add a do {} while(0) at the end of the macro to enforce the ; at the end of the line. But I think that requires a lot of changes in other unit tests as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Too late", unfortunately: I just looked, I'd have to change 35 files around the Google codebase, outside the pybind11 source directory. It'll set me back a whole day or two worth of effort. And there are so many other far more substantial things wishing for those days.


def test_create():
pet = m.create("abc")
assert pet.name == "abc"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unit test doesn't actually test the deleter is ran, unless ran inside valgrind / lsan, is that intentional?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unit test doesn't actually test the deleter is ran, unless ran inside valgrind / lsan, is that intentional?

@wangxf123456 was completely focused on reducing a test from the wild, just enough to reproduce the error message we saw.

Copy-pasting what @iwanders wrote in another comment:

my main concern is the above statement that the deleter is only handled when going into the smart holder, not when being converted back to a unique pointer.

I agree. I was also completely focused on the issue in the wild. — Reporting a thought/uncertainty in the back of my mind: Do we actually have to solve this (is that feature actually needed in the wild)? Or would it be sufficient to clearly report an error if someone tried to use the non-existing feature? And then work on it as needed.

I'll add a note about it to the PR description, then merge this PR when I see that the GHA are happy.

Sure, I can't commit a lot of time, but happy to help over the next week or so.

I really appreciate any help! And I'm a big believer in small steps. I want to encourage you to pick out one thing at a time. Solve it. Merge. Next one. (My goal, in general, is to review PRs within hours; highest priority.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or would it be sufficient to clearly report an error if someone tried to use the non-existing feature? And then work on it as needed.

Yeah, tricky, it's a balance. I'd at least throw in case there's a non default deleter detected in as_unique_ptr, that way it explicitly fails, instead of implicitly losing the deleter and leaking memory, and that should still be doable while avoiding most of the extra code I had to add to account for it.

Speaking of as_unique_ptr, another random thought; that method could be marked &&, so std::unique_ptr<T, D> as_unique_ptr() &&, that ensures it can only be called on an r-value smart_holder, which makes it clear that the smart_holder object after the method call is in an indeterminate state. In that case it would be illegal to call as_unique_ptr on an lvalue though, so it makes it a bit more cumbersome at the callsite, but it does draw more attention to what it is doing. Not saying this should be done, but just stating it is an option to make it clear we are moving out of the smart holder.

I'll add a note about it to the PR description, then merge this PR when I see that the GHA are happy.

Sounds good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see that as_unique_ptr was also wrapped with PYBIND11_TESTS_PURE_CPP_SMART_HOLDER_POC_TEST_CPP; 100% agree it's probably not worth worrying about the deleter there then 👍

@rwgk
Copy link
Collaborator Author

rwgk commented Nov 1, 2023

@iwanders I found a simple trick to make it obvious what is and isn't used in production: 4d9d722

I need to do something else right now, but will come back here to similarly ifdef-out all smart_holder_poc.h code that is not actually used in production code.

@rwgk
Copy link
Collaborator Author

rwgk commented Nov 1, 2023

Side remark:

Something weird is happening with this test:

..\..\tests\test_type_caster_odr_guard_1.py ..FF                         [ 95%]

It seems to affect only windows-2022 builds. Un/Related? I need to find out.

(There is one test_iostream flake mixed in.)

…` to mark code that is not used from production code. Add comment to explain.
@rwgk
Copy link
Collaborator Author

rwgk commented Nov 1, 2023

Speaking of as_unique_ptr, another random thought; that method could be marked &&, so std::unique_ptr<T, D> as_unique_ptr() &&, that ensures it can only be called on an r-value smart_holder, which makes it clear that the smart_holder object after the method call is in an indeterminate state.

Hm ... IIUC that's not in line with my thinking behind the code. I wanted to leave the smart_holder in an well-defined state at all times. I believe it has to, because we have to assume it lives in a Python object until that is destroyed.

Your very good round-trip question seems much more important in the context of smart_holder_type_casters.h: is that actually working as intended?

I'm thinking we need something like these:

struct sddm : std::default_delete<atyp > {};
struct sddc : std::default_delete<atyp const> {};
std::unique_ptr<atyp, sddm> rtrn_udmp() { return std::unique_ptr<atyp, sddm>(new atyp{"rtrn_udmp"}); }
std::unique_ptr<atyp const, sddc> rtrn_udcp() { return std::unique_ptr<atyp const, sddc>(new atyp{"rtrn_udcp"}); }
std::string pass_udmp(std::unique_ptr<atyp, sddm> obj) { return "pass_udmp:" + obj->mtxt; }
std::string pass_udcp(std::unique_ptr<atyp const, sddc> obj) { return "pass_udcp:" + obj->mtxt; }

Only more sophisticated to establish firmly that the deleter is copied properly.

@rwgk
Copy link
Collaborator Author

rwgk commented Nov 1, 2023

I think this PR is in good shape for merging now, but I'm mystified by the persistent type_caster_odr test failures with windows-2022. I need to get to the bottom of those first.

I'll use that time to rerun Google global testing for this PR, although the only change is the std::move / std::forward replacement.

@iwanders
Copy link
Contributor

iwanders commented Nov 1, 2023

Hm ... IIUC that's not in line with my thinking behind the code. I wanted to leave the smart_holder in an well-defined state at all times. I believe it has to, because we have to assume it lives in a Python object until that is destroyed.

Ah, sorry, I worded this ambiguous. You're right it will be in a well defined state, I was mostly concerned about the semantics, to me the as_unique_ptr consumes the smart holder and it would make sense to me to require an rvalue to allow that. On second look I see is_populated is never set to false, except through the default constructor, it's not set to false by as_unique_ptr, but again, because it's only used by the unit tests lets leave it ;)

Only more sophisticated to establish firmly that the deleter is copied properly.

Hmm, still trying to wrap my head around that file as an outsider xD. Easiest way I'd test it is by making a destructor that has a side effect (sets a boolean on call), then perform the round trip and ensure the deleter is only called after the final value goes out of scope.

Does the `PYBIND11_TESTS_PURE_CPP_SMART_HOLDER_POC_TEST_CPP` define have anything to do with it?
@rwgk
Copy link
Collaborator Author

rwgk commented Nov 1, 2023

Easiest way I'd test it is by making a destructor that has a side effect (sets a boolean on call), then perform the round trip and ensure the deleter is only called after the final value goes out of scope.

That sounds good!

I recommend waiting until this PR is merged, so that we can collaborate more easily. I'll try to get to the bottom of the type_caster_odr_guard failures with highest priority.

@rwgk
Copy link
Collaborator Author

rwgk commented Nov 1, 2023

Certainly, the windows-2022 failures are due to changes in the environment, unrelated to this PR:

That ran with smart_holder head, without any code changes.

6 windows-2022 failures, 5 are type_caster_odr_guard failures, one test_iostream flake (which is very common).

I'll revert my latest commit here, then merge. I'll work on dealing with the type_caster_odr_guard failures separately.

@rwgk
Copy link
Collaborator Author

rwgk commented Nov 2, 2023

For easy future reference, global testing internal ID: OCL:566559299:BASE:578853944:1698937167027:61af92f8 (passed)

@rwgk rwgk marked this pull request as ready for review November 2, 2023 15:21
@rwgk rwgk requested a review from henryiii as a code owner November 2, 2023 15:21
@rwgk rwgk merged commit 55105fb into pybind:smart_holder Nov 2, 2023
147 of 152 checks passed
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Nov 2, 2023
@rwgk rwgk removed the needs changelog Possibly needs a changelog entry label Nov 2, 2023
@rwgk
Copy link
Collaborator Author

rwgk commented Nov 2, 2023

@iwanders FYI I'll merge from master into smart_holder, too. Should be done in an hour or so. Doesn't change much I guess, just so you're not surprised.

@rwgk
Copy link
Collaborator Author

rwgk commented Nov 2, 2023

FYI I'll merge from master into smart_holder

Done.

@iwanders
Copy link
Contributor

iwanders commented Nov 2, 2023

@rwgk , thanks, I haven't had time to look at it yet, but wanted to let you know I'll try to make that deleter roundtrip unit test over the weekend. I have to take some time to read up on how the smart holder actually works and how it interacts with unique pointers or deleters first :)

@rwgk
Copy link
Collaborator Author

rwgk commented Nov 2, 2023

read up on how the smart holder actually works

I'd be happy to meet and go through the code together. If you're interested, please get in touch via email (in git log). (I looked but couldn't find your direct email.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants