This single header provides two alternative implementations of a std::tuple replacement that is both faster and easier to use.
When I first started using the tuple + structured binding pattern for functions with multiple return values, I assumed its code generation was identical to that of an unnamed structure and that I was paying basically nothing for it. When I looked more closely, this turned out not to be true.
In practice, std::tuple is not implemented as an anonymous struct with one member variable per template parameter. C++20 does not allow expanding a variadic template parameter pack into member declarations, and even if it did, there would be no way to give the members unique names. Instead, std::tuple is implemented as a nested structure, with each nesting level defining one member variable.
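To make the nesting concrete, here is a minimal sketch of a recursively defined tuple. The names NestedTuple, get_nested and third_of_demo are hypothetical and exist only for illustration; real standard libraries differ in detail (many use recursive inheritance rather than a nested member), so treat this as a model of the idea, not as any library's actual code.

```cpp
#include <cstddef>

// Hypothetical illustration only: one member variable per nesting level.
template <typename... Ts>
struct NestedTuple {};  // the empty tuple ends the recursion

template <typename Head, typename... Tail>
struct NestedTuple<Head, Tail...> {
    Head head;                  // the single element defined at this level
    NestedTuple<Tail...> tail;  // every remaining element lives one level deeper
};

// Access recurses once per index step, so fast access depends on the
// optimizer inlining the whole chain of calls.
template <std::size_t I, typename Head, typename... Tail>
constexpr const auto& get_nested(const NestedTuple<Head, Tail...>& t) {
    if constexpr (I == 0) {
        return t.head;
    } else {
        return get_nested<I - 1>(t.tail);
    }
}

// Usage: reading the third element walks through two nested `tail` members.
constexpr char third_of_demo() {
    NestedTuple<int, double, char> t{1, {2.0, {'c', {}}}};
    return get_nested<2>(t);
}
```

In a debugger, the variable t above would show up as head plus a tail that has to be expanded twice to reach the char.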
The nested implementation has several implications for optimized code generation, for debug builds, and for the API:
- Arguments that would have been passed or returned in registers are instead passed on the stack in multiple scenarios. This applies to both GCC and MSVC.
- Nesting past a certain level will cause MSVC to never keep the struct in registers.
- Fast field access, fast initialization, and so on all rely on inlining by the optimizer. This makes debug builds slow, and inlining may also break down past some nesting level.
- When viewed in the debugger, instead of seeing a simple struct, you will see a nested struct (unless you are using a debug visualizer to neaten up the display).
- You must access fields using a non-member get function (std::get).
- The member variables are stored in memory in the opposite order from what you would expect. This is confusing, and it makes the struct layout incompatible with a named struct that declares the same members.
- The more items in a std::tuple, the deeper the template nesting level, which exposes you to compiler limits or asymptotically worsening compile times on large tuples.
The only other way to implement a std::tuple class is to define one template for each number of arguments. So that's what I did, using a simple program to generate the output .h file.
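As a rough sketch of the one-template-per-arity idea, the block below hand-writes the first three arities. FlatTuple, div_mod and div_mod_demo are hypothetical names chosen for this example, and the generated header is certainly more complete than this; only the _0, _1, ... member naming is taken from the description further down.

```cpp
// Hypothetical illustration only, not the generated header.
template <typename... Ts>
struct FlatTuple;  // primary template: only the listed arities are defined

template <typename T0>
struct FlatTuple<T0> { T0 _0; };

template <typename T0, typename T1>
struct FlatTuple<T0, T1> { T0 _0; T1 _1; };

template <typename T0, typename T1, typename T2>
struct FlatTuple<T0, T1, T2> { T0 _0; T1 _1; T2 _2; };

// A function with two return values. The result is a flat aggregate,
// so it is built with plain brace initialization and no constructor call.
constexpr FlatTuple<int, int> div_mod(int a, int b) {
    return {a / b, a % b};
}

// Structured bindings work directly on the flat aggregate.
constexpr int div_mod_demo() {
    auto [q, r] = div_mod(17, 5);
    return q * 10 + r;  // quotient 3, remainder 2
}
```

Each specialization is an ordinary struct with no nesting, which is what keeps both the debugger view and the unoptimized code simple.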
This .h file implements two classes, CTuple and CCompatibleTuple. The difference between them is that CTuple defines the member variables in the same order as the type template arguments, while CCompatibleTuple defines them in the opposite order, giving it a memory layout compatible with std::tuple, albeit at a slight cost in debug code. Because CTuple lays out its members in forward order, it can initialize the struct with plain brace initialization, without defining constructors. This is much better for debug code generation and static initialization. CCompatibleTuple, on the other hand, has enhanced compatibility with std::tuple because it has the same memory layout. This allows, for instance, passing a CCompatibleTuple to a function expecting a reference to a std::tuple.
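The trade-off between the two member orders can be sketched as follows. ForwardPair, ReversedPair and order_demo are hypothetical stand-ins, not the header's CTuple and CCompatibleTuple, and whether a reversed layout actually matches a given std::tuple depends on the standard library in use; that part is an assumption carried over from the text above, not something this sketch verifies.

```cpp
// Hypothetical illustration only.
template <typename T0, typename T1>
struct ForwardPair {
    T0 _0;  // members declared in template-argument order
    T1 _1;
};

template <typename T0, typename T1>
struct ReversedPair {
    T1 _1;  // members declared last-to-first
    T0 _0;

    // Brace initialization would fill _1 first, so a constructor is needed
    // to keep the caller-facing argument order (T0, T1).
    constexpr ReversedPair(T0 a, T1 b) : _1(b), _0(a) {}
};

constexpr bool order_demo() {
    ForwardPair<int, char> f{1, 'a'};   // aggregate initialization, no constructor
    ReversedPair<int, char> r(1, 'a');  // goes through the constructor
    return f._0 == r._0 && f._1 == r._1;
}
```

The constructor in the reversed variant is the "slight cost in debug code": without inlining it is a real function call, whereas the forward variant is initialized in place.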
These classes follow the basic API of std::tuple, including deduction guides. You can access fields using std::get, construct them the same way, and so on. In addition, there are several convenience APIs:
- You can use a member Get<index>() or Get<type>() to access the fields.
- You can also conveniently access the fields as _0, _1, ... This has the advantage of generating good code in debug builds without relying on inlining.
- Compiler-dependent methods of forcing inlining, even in debug builds, are used for member access. Note that you will need to set /Ob1 in your debug build for MSVC to take advantage of this.
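The sketch below shows how a member Get<index>() and forced inlining might fit together. The macro name SKETCH_FORCEINLINE and the types PairSketch and get_demo are assumptions made up for this example; __forceinline and __attribute__((always_inline)) are real compiler extensions, but the header's own macros and class definitions may well look different. Only index-based access is shown, not Get<type>().

```cpp
#include <cstddef>

// Assumed macro name; the attributes themselves are real compiler extensions.
#if defined(_MSC_VER)
#  define SKETCH_FORCEINLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define SKETCH_FORCEINLINE inline __attribute__((always_inline))
#else
#  define SKETCH_FORCEINLINE inline
#endif

// Hypothetical two-element tuple, not the header's actual class.
template <typename T0, typename T1>
struct PairSketch {
    T0 _0;
    T1 _1;

    // Member accessor selected at compile time by index.
    template <std::size_t I>
    SKETCH_FORCEINLINE constexpr auto& Get() {
        static_assert(I < 2, "index out of range");
        if constexpr (I == 0) {
            return _0;
        } else {
            return _1;
        }
    }
};

constexpr int get_demo() {
    PairSketch<int, char> p{7, 'x'};
    p.Get<0>() += 1;  // Get returns a reference, so the field can be modified
    return p.Get<0>() + (p._1 == 'x' ? 100 : 0);  // direct member access needs no call at all
}
```

Direct access through _0 and _1 never involves a function call, which is why it produces good code in debug builds regardless of inlining settings.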
Fasttuple.H was generated by this simple, portable C++ program. The program has multiple options that can be used to customize the header file.