use std::byte for strings #7
The specification doesn't define the encoding, but in practice, most bencoded data just uses ASCII strings. You should be able to use any string encoding you like though, including a string of bytes. Just declare: using bdata = bencode::basic_data<std::variant, long long, my_byte_string, std::vector, bencode::map_proxy>; (Well, UTF-16 might not work. It probably wouldn't be hard to make it work, but that'd make for a very weird payload: ASCII for the "keywords" and UTF-16 for the string values...)
The "node_id" for the mainline DHT is what's troubling me. It is just a random 160-bit value, so it could contain any byte value, even null bytes. Using a string here seems dangerous to me.
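For reference on the null-byte concern, here is a minimal sketch (not taken from bencode.hpp; the function names and byte values are made up for illustration). std::string stores an explicit length, so embedded zero bytes survive when it is built with a length-aware constructor. Only C-string style APIs stop at the first zero.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical 20-byte (160-bit) node ID containing zero bytes.
// The byte values are invented for illustration only.
inline std::string make_node_id() {
    const char raw[20] = {0x12, 0x00, 0x34, 0x00, 0x56, 0x00, 0x78, 0x01, 0x02, 0x03,
                          0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d};
    // Pointer + length constructor: keeps every byte, including the zeros.
    return std::string(raw, sizeof raw);
}

// What a C-string API sees: it stops at the first zero byte.
inline std::size_t length_as_c_string(const std::string& s) {
    return std::strlen(s.c_str());
}
```

With these definitions, make_node_id().size() is 20, while length_as_c_string(make_node_id()) is 1. The container itself is safe for binary data; the risk is in code that treats the contents as a null-terminated C string.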
Yes, so imo using an interface that can "interpret" the content is dangerous. The user knows the semantics of the bencoded data they are dealing with best, and they can parse the raw bytes into the right encoding with proper validation.
Dangerous how? EDIT: Of course, if you want to be extra-careful in your project to avoid passing around a |
All that said, maybe it would be useful for this library to preserve the underlying byte type when decoding. That is, if you pass it a |
I agree, using |
The same applies to
I'm not sure that's how I'd interpret the standard. In any case, this is somewhat different from the more-specific concern that some "strings" in a bencoded message are pure binary data, rather than a sequence of characters with some particular encoding. In that case, there's probably some practical value in working with |
Hmm, you are right, std::byte is also variable width. What a mess. I guess the only sensible choice would be uint8_t then. Yes, I agree with you that any sort of text encoding should probably be accessed via some sort of character type. But in bencode a "string" is really just a byte vector; it is not necessarily used to store text. Even in the bep3 spec it is used to store raw sha1 hashes rather than, say, a hex-coded hash. I think what makes the most sense is to provide raw bytes by default, and the user can cast to the right type/encoding based on external knowledge they have about what the bencoded thing "is". This is what bep3 does as well: every time a bencoded string is meant to be text, it mentions it explicitly (also the encoding), e.g.:
|
Once I implement it, you'll be able to provide a string (or stream) of This way it's up to callers to specify what they want, since bencode.hpp won't do any type conversion on the underlying element type whatsoever (aside from checking elements with syntactic meaning of course). |
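To illustrate the "hand back raw bytes, let the caller interpret them" idea discussed above, here is a rough sketch. It is not bencode.hpp's API: decode_byte_string and as_text are hypothetical helpers written only for this example, and the sketch handles just a single bencoded string of the form <length>:<bytes>. The decoder looks at elements only where they have syntactic meaning (the digits and the colon) and returns the payload in the same element type it was given.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not part of bencode.hpp): decode one bencoded string
// "<length>:<bytes>" from the front of a buffer, preserving the element type.
template <typename Byte>
std::optional<std::vector<Byte>> decode_byte_string(const std::vector<Byte>& in) {
    static_assert(sizeof(Byte) == 1, "element type must be byte-sized");
    std::size_t pos = 0;
    std::size_t len = 0;
    // Parse the decimal length prefix; only these elements are interpreted.
    while (pos < in.size()) {
        const unsigned char c = static_cast<unsigned char>(in[pos]);
        if (c < '0' || c > '9') break;
        len = len * 10 + static_cast<std::size_t>(c - '0');
        if (len > in.size()) return std::nullopt;  // longer than the input can hold
        ++pos;
    }
    // Require at least one digit followed by ':'.
    if (pos == 0 || pos >= in.size() ||
        static_cast<unsigned char>(in[pos]) != ':') {
        return std::nullopt;
    }
    ++pos;
    if (in.size() - pos < len) return std::nullopt;  // truncated payload
    const auto first = in.begin() + static_cast<std::ptrdiff_t>(pos);
    // The payload is copied verbatim: no conversion of the element type.
    return std::vector<Byte>(first, first + static_cast<std::ptrdiff_t>(len));
}

// Caller-side interpretation: only the caller knows this payload is text.
template <typename Byte>
std::string as_text(const std::vector<Byte>& bytes) {
    std::string out;
    out.reserve(bytes.size());
    for (Byte b : bytes) out.push_back(static_cast<char>(b));
    return out;
}
```

A caller who passes std::vector<std::uint8_t> gets std::vector<std::uint8_t> back and can treat it as a raw hash, while a caller who knows a field is text can call as_text on the result. Any validation of the text encoding would still be the caller's job.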
Hi, I'm new to the whole bencode thing, so thanks a lot for this easy-to-use library! I was wondering: can a bencoded string include arbitrary bytes? If so, would std::byte be a better container for it?