-
Notifications
You must be signed in to change notification settings - Fork 477
Create a bytecode interpreter to compactly represent text/JSON name maps. #1789
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This will be used to generate data compactly as static strings, which optimize much more consistently than some of the code we generate today like the dictionary literal used for `NameMap`s. Later on, I'd also like to look at how we could use this to tabularize parsing and serialization. With some other refactoring, I think we could get to a point where the visitation APIs are completely driven by bytecode strings and we wouldn't have to generate the large message-specific methods that do this.
/// UTF-8. Specifically, | ||
/// | ||
/// - Integers are encoded in a varint-like format similar to protobuf, except that only the low | ||
/// 7 bits are used. The most-significant bit is always clear, and the second-most-significant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is since the underlying storage is utf-8 and so the high bit would bring on other meanings?
When SE-0483InlineArray
Literal Syntax get approved, I'm guessing it would make sense to support/move to that as it means a full 8bit encoding so things could get smaller? Any idea how easy this code could be leverage to support that eventually?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tested it with newer toolchains, but I suspect there would still be heavy type-checking costs to using InlineArray
since it would have to be initialized with an array literal. Given this:
static let foo: InlineArray<someHugeNumber, UInt8> = [0, 1, 47, 8, ...]
At a minimum, even if it's able to store that as readonly data with minimal codegen under all circumstances, the type checker still has to verify that every element of that array literal is a UInt8
and that it has exactly someHugeNumber
elements. That's a lot of work that StaticString
doesn't have, since that string literal is just an atomic blob.
What we really need is a way to encode raw bytes as a special kind of string-ish literal, which could supports \xFF
-style encoding. Maybe it makes sense to propose something like that once semi-related features like @section
land.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I wasn't thinking compile time costs and was just thinking about (slightly) smaller binaries as some numbers would shrink in encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I suspected, creating an InlineArray
of, say, 100 elements generates ~100 type variables and ~100 constraints in the constraint system. https://godbolt.org/z/9ahWK6oj1
None of those constraints are hard to solve—they're unambiguous since there aren't any overloads or casts/bridging involved—but the compiler still has to run through them all to prove that it's correct. I wouldn't feel comfortable putting that much load on the type checker when StaticString
doesn't have any of that baggage.
But that's not all—in debug builds, the compiler still generates in non-optimized builds a scales-linearly-with-the-size-of-the-array "one-time initialization function" for the InlineArray
for the property's unsafeMutableAddressor
, even though it also generates it as static RO data. I don't really know what's going on there, but it's more unpredictability that we should avoid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, I wonder if something should get opened on the Swift project in general about this since it sorta seems like this was the space InlineArray
was trying to solve, and it compiler performance for large blobs is going to be bad, what was the point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The extra codegen in debug builds seems silly and is probably an area with room for improvement, but I don't think there's a way around the type checker part since something has to validate that the code is good.
// The only way this could happen is if the program were a single byte, meaning that it | ||
// only has a 6-bits-or-fewer format specifier and nothing else. In other words, there | ||
// are no instructions, and we can simply return as there is nothing to execute. | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still wonder if this should fatalError since it shouldn't happen and if something were generated with one character something went wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An empty message produces a single code-point name map, since it's just the version prefix and then nothing else: "\u{0}"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should just be _NameMap()
like it is now, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, but do we care enough about special casing that in the generator? The overhead of interpreting the empty name map bytecode would be low, especially for such an uncommon situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is somewhat common for things that use extensions over fields as a design pattern (don't ask), so it does happen. The main "savings" from my POV is just less "startup code" that has to run, yes, it's is less, but zero is always less then "a little" 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ps - it's also even less for the compiler to look at when type checking the compile. 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair, and a little extra complexity in the generator for more savings at runtime is probably the right choice in general.
/// | ||
/// - Parameter programFormat: The program format of the bytecode stream to write. Currently, | ||
/// the only valid value is zero. | ||
public init(programFormat: Int = 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a constant over in the main runtime library for the zero value that is used by these places also?
/// - Parameter programFormat: The program format of the bytecode stream to write. Currently, | ||
/// the only valid value is zero. | ||
public init(programFormat: Int = 0) { | ||
if programFormat != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guard?
if byte & 0x40 == 0 { | ||
return value | ||
} | ||
shift += 6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some guard that shift
doesn't get too large?
/// - Returns: An array of `UnsafeBufferPointer`s containing the strings that were read from the | ||
/// bytecode stream. See the documentation of `nextString()` for details on the lifetimes of | ||
/// these pointers. | ||
package mutating func nextNullTerminatedStringArray() -> [UnsafeBufferPointer<UInt8>] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't look like this got a test (or a helper to write one of these)
/// ## Operands | ||
/// * An integer representing the lower bound (inclusive) of the reserved field number range. | ||
/// * An integer representing the upper bound (exclusive) of the reserved field number range. | ||
case reservedFields = 6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"reservedNumber" nameMap isn't specific to fields, and enums also have reserved numbers, we just haven't needed to capture them.
@@ -110,6 +113,50 @@ class FieldGeneratorBase { | |||
} | |||
} | |||
|
|||
func writeProtoNameInstruction(to writer: inout ProtoNameInstructionWriter) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't var fieldMapNames
come out now?
} | ||
let jsonName = fieldDescriptor.jsonName | ||
|
||
// TODO: Create a separate instruction for the group being able to match |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we should do this extra opcode now vs. keeping it as a TODO
if isprint(Int32(truncatingIfNeeded: value)) != 0 { | ||
self.append(escapingIfNecessary: UnicodeScalar(UInt32(truncatingIfNeeded: value))!) | ||
} else { | ||
code.append(String(format: "\\u{%x}", value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth picking off \0
as the one special case? (tab, newline, etc don't seem worth it, but if \0
is common, it would slightly shorten strings and there's no code on the decoding side.
This is something I've been cooking for a bit, as a way to reduce codegen.
The problem with the
ExpressibleByDictionaryLiteral
approach we're using today is that its codegen is unpredictable. For what is really just static data, the Swift compiler always generates sequences of instructions in debug mode, and in release mode it may or may not be outlined into global data depending on various things outside of the author's control.StaticString
s, on the other hand, are super predictable since they always become static read-only data (ignoring the oddball edge case involving a single code point). So, why not do what some other languages do and create blobs of binary data that we can process at runtime instead?The catch is that since
StaticString
s must be valid UTF-8, we have to invent our own little bytecode format a bit, to avoid encodings that might be invalid. The format is a stream of bytes as follows:Layered on top of the bytecode stream is an interpreter, which abstracts "instructions" that are
RawRepresentable
asUInt64
, and the interpreter loop just reads the next instruction and invokes a callback that is supposed to read and process the operands.The bytecode is an "ABI" that we can add to, but never remove from (for a particular major version of the runtime). For example, we could add new instructions in the future to represent things differently, as long as the runtime can still correctly interpret the bytecode from older generated code.
There are definitely more optimizations that we could do to the representation here to make it more compact. The most obvious thing would be to add specialized opcodes for the most common cases. Other things we could do would be to sort the field numbers in numerical order and represent them as adjacent differences instead of their absolute value—that would shrink things a bit for dense messages/enums that have numbers greater than 63.
But for now, I just wanted to put up this draft as a vibe check. I took one of our particularly large transitive proto closures internally and measured the optimized and dead-stripped output with and without this change, using a tiny binary that just imports a huge proto module, sets a single field in a message, and then black-holes the message:
But I think this is just the beginning of what we can do—namely, we could refactor some of the generated message storage internals and make visitation completely generalized over a similar bytecode that we would generate about a message's layout. That, I imagine, would yield far greater codegen savings (and likely significant compile time savings as well).
Side note: Many of the APIs presented by the bytecode interpreter and the generated code just scream for
Span
andUTF8Span
. I suppose one day we'll have a high enough deployment target that we could use those 😭