Consider a compact PSBT based encoding for wallet descriptors #135
Comments
For everyone interested in joining the discussion, we'll be at the next Gordian Developer community meeting on Nov. 1 to present this proposal. |
cc @bigspider |
@seedhammer Quick note: we deprecate the use of the |
@seedhammer @ChristopherA See below for my initial response to some of the points raised above.
Why not use the Blockchain Commons Envelope format for descriptors?
Why not use CBOR for encoding?
I think a major sticking point here is in thinking that representing output descriptors in dCBOR/Envelope would require highly general purpose parser/codecs. This is not true. Minimal specifications for output descriptors represented as Gordian Envelope will yield minimal, deterministic binary serializations that could be documented without any reference to CBOR or Envelope, and would be highly suitable for embedded environments. Unlike bespoke formats however, using Envelope provides a forward-looking platform that makes it easy to add support for future enhancements. |
@seedhammer @ChristopherA I'd also like to add that our Rust implementation of Gordian Envelope also provides a lot of "feature gates" for many optional extensions like encryption, which makes it more suitable for use in resource-constrained environments. Users can turn off all the feature gates they don't use, and the Rust compiler will aggressively code strip all unneeded dependencies. |
@seedhammer @ChristopherA In Envelope notation, the subject is just the text of the output descriptor, and it has one assertion declaring its type:
Now let's look at the envelope as a tagged CBOR structure:
The first two bytes are CBOR tag 200, indicating this is a Gordian Envelope. This tag is now standardized, as it has been registered with IANA. This value is a constant, and can be expected by a minimal parser. The next byte is The next two bytes introduce the first element of the CBOR array, which is a CBOR tagged element tagged with (24). In this minimal encoding this value will be a constant, and can be expected by a minimal parser. The next two bytes introduce a CBOR text string, and specify its length. The bytes that follow it are the textual representation of the output descriptor. No null terminator is needed. The remaining bytes are the second element of the array, which is a CBOR map with one entry. The key is the integer So to a minimal parser, the "magic bytes" are at the beginning (envelope) and the end (output descriptor) and everything amounts to simple-to-parse, often invariant values. Now to be fair: if support for |
So if output descriptors are so simple, why use Envelope to represent them? Two reasons:
|
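To make the byte-level walkthrough above a little more concrete, here is a minimal Go sketch of the only generic piece a hand-written parser would need: decoding a CBOR "head" (major type plus argument) as defined in RFC 8949. The names `head` and `readHead` are hypothetical and not taken from any Blockchain Commons code. The sketch does not decode a whole envelope, only the heads such as tag 200 (`d8 c8`), tag 24 (`d8 18`), and a text string length (`78 nn`).

```go
package main

import (
	"errors"
	"fmt"
)

// head is the decoded start of one CBOR data item (hypothetical helper type):
// the major type (0-7), its unsigned argument, and the bytes consumed.
type head struct {
	major byte
	arg   uint64
	size  int
}

// readHead decodes a CBOR head from the start of b (RFC 8949, section 3).
// Indefinite lengths (additional info 31) and the reserved values 28-30
// are rejected, which a deterministic encoding never produces anyway.
func readHead(b []byte) (head, error) {
	if len(b) == 0 {
		return head{}, errors.New("cbor: empty input")
	}
	major, info := b[0]>>5, b[0]&0x1f
	if info < 24 {
		// The argument is stored directly in the initial byte.
		return head{major, uint64(info), 1}, nil
	}
	if info > 27 {
		return head{}, errors.New("cbor: unsupported additional info")
	}
	n := 1 << (info - 24) // 1, 2, 4 or 8 big-endian argument bytes follow
	if len(b) < 1+n {
		return head{}, errors.New("cbor: truncated head")
	}
	var arg uint64
	for _, c := range b[1 : 1+n] {
		arg = arg<<8 | uint64(c)
	}
	return head{major, arg, 1 + n}, nil
}

func main() {
	// Tag 200, tag 24, and the header of a 100-byte text string.
	for _, in := range [][]byte{{0xd8, 0xc8}, {0xd8, 0x18}, {0x78, 0x64}} {
		h, err := readHead(in)
		fmt.Println(h.major, h.arg, h.size, err)
	}
}
```

A minimal parser along these lines would compare the constant heads against expected values and use the text string head only to learn the descriptor length.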
I’d like to see the keys externalized (remove xpubs and point to binary keys). I find that having notes possible on the different keys in a multisig useful, for instance maybe a nostr or signal address to contact the keyholder. |
That's all easily doable using Envelope, but the more complex and variable the structure becomes, the more likely you'll want a full dCBOR/Envelope implementation, even if you need nothing more than the base Envelope specification with all the optional feature gates turned off. |
I've implemented a strawman Go codec to keep the conversation concrete: https://github.com/seedhammer/bip-serialized-descriptors
Understood. Note that the particular UR representation is outside the scope of this specification. I mentioned For that reason, I'm leaning towards a generic
Almost: this proposal binary encodes xpubs, as referred to in #135 (comment).
It is not clear to me how the enveloped descriptor format is completely specified without referring to either dCBOR or Envelope, never mind the requirement that fields be sorted according to SHA256. Can you describe the format? You did offer a thorough example of one instance of an enveloped output descriptor, but that's not the same as specifying the format for every descriptor. The former is sufficient for encoders, but decoders need to cover every possible instance. One example where this matters: your specified minimal encoding for a two-element array ( To reciprocate your efforts, I implemented most of this proposal: https://github.com/seedhammer/bip-serialized-descriptors. Note in particular how few lines the codec is, assuming you already have access to a PSBT codec. Total around 300 lines, including comments and no external dependencies.
This proposal is an existence proof that BIP-174 is extensible. Perhaps not as much as Envelopes, but enough to cover at least two important use cases (PSBTs and now descriptors) and I see no reason the BIP-174 cannot be used for most/all future binary encodings in BIPs.
It may be rare, but that's beside the point. All things being equal, wouldn't you agree that a self-contained specification is superior to a specification with external references?
This is another major sticking point. I claim that BIP-174 (PSBT) is very much a standard on at least equal footing to what other standards bodies produce. I further claim that it is more important for widespread use to produce a BIP than to submit proposals to other standards bodies. Regardless of format, a proposal should not be declared complete until significant buy-in from the community has been attained (gauged through bitcoin mailing lists, wallet developers). Finally, I claim that it's more important to have a simple specification than covering use cases outside of Bitcoin. Of course, the above claims assume the Bitcoin perspective. |
Hey! I have been working with the PSBT format, and I can't recommend it for other serialization purposes: it was designed with normal processors in mind, where memory is freely available. It doesn't really suit embedded use cases, as the PSBT format is complex to parse and to keep in memory.
I don't think this point is valid: CBOR was designed to be extensible, and so was CDDL. In fact, one can extend the choices in a CBOR type by using the socket/plug mechanism. See:
The CBOR format was made specifically to be small and for embedded devices, there are several implementations of CBOR out there that are very small, for example:
For real world usage of NanoCBOR there is: Which decodes the standard mentioned before for MCU firmware upgrades.
I agree with this but using the CBOR representation avoids parsing the text which is more difficult to do and does not make it easy to do zero-copy parsing since one has to store decoded bytes into dynamically sized containers. For instance, this data structure would be impossible to do if parsing from text as one has to additionally keep a separate location in memory to where to copy decoded stuff into, instead of a single one for the descriptor AST. That one reuses the buffer of the CBOR-encoded crypto-output, so basically zero-copy. Even then I think that with CBOR it is complicated, but without it is even harder.
Yeah but PSBT is very application specific and I doubt that code that parses PSBT could be easily adapted to parse a format based on it since that code assumes PSBT is only for that purpose, it would require refactorings anyway in most code bases that handle the PSBT format. As a side note, CBOR is also self-contained and self-describing so any application that does not speak the UR standards can decode it, this is a plus for languages like JavaScript or Python while also being small and compatible with embedded devices.
Nor is the PSBT format, in fact it is hard to serialize the same PSBT just after deserializing it without any modifications, there can exist Unless CBOR is used in a distributed system where all peers need to agree on how some stuff is serialized into CBOR, but that doesn't apply to descriptors. As a side note, CBOR can be deterministic if you define your rules and the standard even recommends so: https://www.rfc-editor.org/rfc/rfc8949.html#section-4.2 P.S.: I think defining custom serialization formats for trivial data structures does not help for adoption of Bitcoin and I hope that trend declines. |
Maybe, but is it relevant? This proposal is an alternative to the Envelope output descriptor format that encodes the descriptor itself in text, not in CBOR. Extending the BCR-2020-010 format is another discussion.
Sure, but this proposal is even simpler. A codec is 300 lines from scratch, even less if your software already includes a PSBT parser. In other words, this proposal is clearly an afternoon dependency, whereas I have yet to see specification for This matters not only for development time and for resource constrained environments, but also for security. 300 lines are easier to review than a CBOR implementation. Don't forget that we're parsing potentially adversarial data!
I believe this point is irrelevant because BCR-2023-007 is textual, see above.
The strawman implementation is existence proof that the PSBT format is sufficiently general to cover output descriptors.
I concede your point about determinism. I stand by my point that a self-contained specification is superior to one with references, all things being equal. |
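For readers unfamiliar with BIP174, the following Go sketch shows the kind of parsing a PSBT-style container involves: compact-size length prefixes and key-value maps terminated by a zero byte. It is not the strawman implementation linked above; `readCompactSize`, `readField`, `readMap`, and `pair` are hypothetical names, and the key type `0x01` in `main` is a placeholder rather than a field from the proposal. Duplicate-key rejection, which BIP174 requires, is omitted for brevity.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errTruncated = errors.New("truncated input")

// readCompactSize decodes a Bitcoin compact size unsigned integer and
// returns the value together with the remaining bytes.
func readCompactSize(b []byte) (uint64, []byte, error) {
	if len(b) == 0 {
		return 0, nil, errTruncated
	}
	switch first := b[0]; {
	case first < 0xfd:
		return uint64(first), b[1:], nil
	case first == 0xfd && len(b) >= 3:
		return uint64(binary.LittleEndian.Uint16(b[1:])), b[3:], nil
	case first == 0xfe && len(b) >= 5:
		return uint64(binary.LittleEndian.Uint32(b[1:])), b[5:], nil
	case first == 0xff && len(b) >= 9:
		return binary.LittleEndian.Uint64(b[1:]), b[9:], nil
	}
	return 0, nil, errTruncated
}

// pair is one key-value entry. Key holds <keytype><keydata> as raw bytes.
type pair struct{ Key, Value []byte }

// readField reads one length-prefixed byte string.
func readField(b []byte) ([]byte, []byte, error) {
	n, rest, err := readCompactSize(b)
	if err != nil {
		return nil, nil, err
	}
	if n > uint64(len(rest)) {
		return nil, nil, errTruncated
	}
	return rest[:n], rest[n:], nil
}

// readMap reads key-value pairs until the 0x00 separator that ends a map.
func readMap(b []byte) ([]pair, []byte, error) {
	var pairs []pair
	for {
		key, rest, err := readField(b)
		if err != nil {
			return nil, nil, err
		}
		if len(key) == 0 { // a zero key length is the map separator
			return pairs, rest, nil
		}
		val, rest, err := readField(rest)
		if err != nil {
			return nil, nil, err
		}
		pairs = append(pairs, pair{key, val})
		b = rest
	}
}

func main() {
	// One pair (placeholder key type 0x01, value "hi"), then the separator.
	in := []byte{0x01, 0x01, 0x02, 'h', 'i', 0x00}
	pairs, rest, err := readMap(in)
	fmt.Println(len(pairs), len(rest), err)
}
```

A descriptor codec built on such a reader would additionally check the magic header and interpret the key types, which is where the format-specific code lives.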
Software is made of dependencies of dependencies, so I don't think this applies in today's world. Creating another serialization format just means verifying yet another implementation instead of using well-proven solutions. If dependencies are an issue for you, you could use GNU Guix, which largely solves that issue; in fact, Bitcoin Core uses it for reproducible builds. You would just have to review the particular CBOR implementation. Fun fact: I doubt any true Bitcoiner has reviewed every one of the dependencies. Here's how you can build a graph:
I'm running it and it'll take a few hours to produce the graph from the very first dependencies, all of the compiler bootstrap chains, all of the GNU utilities, Qt, etc. up to bitcoin-core. I don't think it'll finish generating the graph this day or tomorrow. We rely on other people's code and that's fine.
The code assumes the existence of an operating system with a These are the places where one might see the true limitations of a format like PSBT.
I think a |
I don't think it's worth the decrease in size to have to invent binary representations for every (future) output descriptor feature. This is the reason I think Enveloped Descriptors chose wisely by encoding the descriptor in text. |
It still has to be parsed though, it's the same for both crypto-output and a text representation, either way an application that needs to understand the descriptor needs to parse it into an AST. |
As I've shown, a minimal Gordian Envelope containing a string-based output descriptor has only 12 bytes of overhead. Assuming you weren't concerned about parsing the string itself, I'm sure I can produce an encoder and decoder for it in well under 300 lines of code total (well, maybe not in Rust, which is rather verbose). The only variable-length element is the length of the string, and a minimal CBOR varint codec suitable for this application is tiny. Everything else is in deterministic positions. If you can accept for argument's sake that everything I'm saying is true, then why wouldn't we use it? What's the point in having yet another serialization format that offers no standardization and no future expandability? And yes, I agree with @jeandudey that most dependencies have dependencies of their own, and that is normal and accepted. Any dependency tree is going to have way more internal nodes than leaves: you could be writing in pure C and you'll still want So where do you draw the line about having dependencies and why? I understand that constrained systems need code with fewer dependencies, but I already mentioned that our Rust envelope implementation is feature gated and Rust itself aggressively strips dead code. Oh, and I'd also like to point out that in CBOR you can tag any CBOR structure. And as long as you define an equivalence between a CBOR numeric tag you choose and a UR type string, you can simply enclose an envelope in your CBOR tag and poof! you have:
So envelopes don't all have to be |
You asked how such a structure would be speced without reference to CBOR or Envelope. Let's take a look at the serialized structure again:
Here is the complete spec:
That's it! No mention of Envelope or CBOR at all. Any code that implements the above spec can read a Gordian Envelope with a minimal Output Descriptor structure (not including name, note, or other metadata). On the constrained platform, due to the deterministic nature of Gordian Envelope there is NO possible variance except for the length of the string. On the other end, any full-featured Gordian Envelope codec can read the output of a minimal encoder that outputs the above structure. |
I certainly believe a minimal decoder for enveloped descriptors is feasible from scratch. However, is it correctly understood that such a decoder will not be able to parse envelopes with metadata, such as name or compact xpubs, even if the decoder doesn't care about them? See also below.
I don't think "no standardization and no future expandability" is a fair characterization for this proposal. It is existence proof of expandability by being an expansion of PSBT. It is a standard (from the Bitcoin perspective) because the PSBT is a BIP.
The leaving out of name, note or other metadata is key. Will a minimal decoder be able to successfully extract the descriptor from every envelope, including those with metadata?
Absolutely. Encoding is much easier to do from scratch, because an encoder need only implement what it needs. |
Thank you for a great meeting today! I look forward to @wolfmcnally's simplified specification for a CBOR-based output descriptor format. To make it easier to compare features and because @wolfmcnally asked for the binary representation of this proposal, I've restructured my demo implementation to be usable from the Go playground. https://go.dev/play/p/nouZlbbcEWt is a copy of
Note that the playground is live, so you can edit the example and run it to play with the format without having Go installed. |
Below is a summary of the conclusions from today's Gordian meeting, for people who didn't attend and don't plan to watch the recording. @wolfmcnally et al, please correct any misunderstandings. We agreed that
With respect to this proposal, @wolfmcnally is confident that a specification (and reference implementation) can be produced that is both CBOR compatible and fully specified such that an implementation can be written without depending on a (d)CBOR library. Quite possibly the format will no longer be in envelope form. @wolfmcnally will work on such a proposal as Commons' time and priorities allow. We all seemed to agree that such a specification, assuming it is produced and can be proposed as a BIP, would be the best of both worlds: easy to use given a CBOR library, but also simple enough to read and write from scratch. |
Just a quick comment here to say that while output descriptors aren't always part of the user backup stash, they should be. The discussion mentions Miniscript but it seems to me that the examples are still quite restricted to multisig. Miniscript also still evolves, is a bit different when used as MiniTapscript, etc. For us at Wizardsardine, it's super important to be able to offer a way to back up these descriptors long term. Users are already using wallets like Liana, and are currently backing up text files. We are really looking forward to seeing where this thread is going, and hope to see an implementation soon! Metal plate engraved or hammered Miniscript descriptors are definitely a missing part of our toolset. |
We also believe that backing up the descriptor is increasingly important, even for singlesig. This has been one of the requirements that drove the move from SSKR for only seeds to Gordian Envelopes for everything. Right now only our reference app, Gordian Seed Tool on iOS/Mac in public beta TestFlight, supports this, but we hope more wallets will soon. |
@ChristopherA does "we hope more wallets will soon" mean that BCR-2023-007 ("Envelope Bitcoin Output Descriptors") is final from the perspective of Blockchain Commons? If not, can you comment on the progress of the work mentioned in #135 (comment)? |
@seedhammer I have been working on another task the past couple weeks, but I am now getting down to the new specification we've been discussing and will have more to report soon! |
Prompted by coinkite/BBQr#1 (comment) I posted this proposal to the bitcoin-dev mailing list: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2023-November/022184.html. |
The mailing list post asked whether to extend PSBT rather than invent a new format. A response supports that idea:
|
To summarize today's meeting (inaccuracies are mine!):
From our perspective:
We still believe PSBT is the superior encoding for output descriptors for the following reasons:
On that basis, we intend to:
I'm closing this issue: BC has considered and rejected a PSBT based encoding of output descriptors. |
The first draft is here: https://github.com/seedhammer/bips/blob/master/bip-psbt-descriptors.mediawiki. Also posted to the bitcoin-dev mailing list. |
Prompted by discussions at wizardsardine/liana#539, this is a sketch for a BIP proposal for serializing wallet descriptors. I raise it here for comments and because Blockchain Commons is a recognized standards body for Bitcoin specifications. If there's enough interest, it should be fleshed out and proposed as a BIP.
A Go implementation written from scratch is here: https://github.com/seedhammer/bip-serialized-descriptors
At a high level, this is a compact binary serialization specification for the [wallet-policies] BIP.
@<key-index>
.pk(NAME) expressions are replaced with key indexes.
Borrowing from BIP174, the format would be
Where <global-map> contains one or more of the fields:
In future, other script formats (Simplicity?) can be added as separate field types.
The <key-map> is a list of keys, matching the indexed references from the descriptor. Each <key-map> contains one or more of the fields:
FAQ
Why not use the Blockchain Commons [BCR-2020-010] format?
The format was recently deprecated, for good reasons: its binary format is compact but difficult to extend to support extensions to the descriptor format, such as Miniscript.
Why not use the Blockchain Commons Envelope format for descriptors?
The standard is still being developed, so it does not have the advantage of existing use.
Why not use CBOR for encoding?
What about QR code representation?
A straightforward encoding would be to use the Blockchain Commons [BCR-2020-005] standard for splitting the serialized descriptor into multiple QR code frames. I believe the general purpose bytes urtype is sufficient, because the magic header decreases the likelihood of misinterpretation.
Doesn't UR support rely on a CBOR implementation anyway?
It's true that the UR specification relies on CBOR for encoding data shards, but it's my understanding that the subset of CBOR required can practically be implemented ad hoc without a full-fledged CBOR library.
Some devices, such as the camera-less Coldcards, won't implement the UR encoding because they exchange serialized descriptors through higher bandwidth mediums such as SD cards, USB, or NFC.
Why not encode the descriptor itself in binary?
The BC envelope format made the same decision of encoding the descriptor in text, for good reasons:
[wallet-policies] https://github.com/bitcoin/bips/blob/bb98f8017a883262e03127ab718514abf4a5e5f9/bip-wallet-policies.mediawiki
[BIP174] https://github.com/bitcoin/bips/blob/master/bip-0174.mediawiki
[BCR-2020-010] https://github.com/BlockchainCommons/Research/blob/master/papers/bcr-2020-010-output-desc.md
[BCR-2020-005] https://github.com/BlockchainCommons/Research/blob/master/papers/bcr-2020-005-ur.md