Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash Serialization #182

Conversation

davebenvenuti
Copy link
Contributor

@davebenvenuti davebenvenuti commented Jan 30, 2025

This PR introduces a new HashSerializationCompiler which generates to_h, as_json, and to_json methods for our generated code.

to_h works mostly as it did before except it now recursively calls to_h on Message values.

as_json works similarly, except it uses the json_name as the Hash key for a given Field instead of the field name. This value can be overridden in proto definitions and protoc provides a default value if none is specified. This functionality is also required to pass the conformance suite one day. Additionally, it supports a compact option, which was requested for an internal use case, which will exclude optional values which aren't set as well as empty arrays (repeated) or maps from the output if set to true.

to_json basically just calls as_json and then uses the json library to stringify the result, passing along and supporting the same compact option.

⚠️ Note that this PR relies on the Decorated Field PR. If we decide that change isn't appropriate then this can be reworked to not rely on it fairly easily.

@davebenvenuti davebenvenuti force-pushed the davebenvenuti/hash-serialization branch 4 times, most recently from 14e30fa to 4d9a090 Compare January 30, 2025 17:41
@davebenvenuti davebenvenuti marked this pull request as ready for review January 30, 2025 18:00
@davebenvenuti davebenvenuti added the #gsd:43107 Ruby Language: RPC Code Generation for Shopify Internal APIs [GSD 43107] label Jan 30, 2025
fahk.rb Outdated Show resolved Hide resolved
@davebenvenuti davebenvenuti force-pushed the davebenvenuti/hash-serialization branch from 4d9a090 to 161ec23 Compare January 31, 2025 00:51
Copy link
Contributor

@rwstauner rwstauner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few questions.
Overall seems good, but there may be some improvements we want to make (now or later) for performance.

One thing I see missing for compatibility is that fields of type bytes should be base64 encoded in the JSON output, but that could be added later.


"result[#{key}] = #{value} if send(:#{oneof_selection_field_name}) == :#{field_name}"
elsif field.repeated?
"#{value}.tap { |v| result[#{key}] = v if !options[:compact] || v.any? }"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I often write my ruby code this way (and I see an instance of this in the old code), but I wonder if, for performance, the generated code should not have unnecessary blocks.
It's probably fine for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here is #{value} could evaluate to a recursive method call, so I used tap to avoid duplicate calls. I suppose we could use a local variable instead. tap seemed more like Idiomatic Ruby but the performance concern is definitely worth considering.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. I was thinking maybe we could use something like tmp_#{field.lvar} = #{value} and then check the tmp var 🤷

elsif field.repeated?
"#{value}.tap { |v| result[#{key}] = v if !options[:compact] || v.any? }"
elsif field.optional?
"result[#{key}] = #{value} if !options[:compact] || has_#{field.name}?"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interpolation for the predicate name triggers my spidey-sense but this is how we generate the name when defining the predicate so I guess it's fine.
Might be nice to have predicate_name or something on the Field class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of putting predicate_name on the Field class.

Comment on lines +1641 to +1642
result[:"enum_1"] = @enum_1 if send(:"oneof_field") == :"enum_1"
result[:"enum_2"] = @enum_2 if send(:"oneof_field") == :"enum_2"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit wasteful (calling send(key) multiple times) compared to the old version

oneof_selection_field_name = oneof_selection_fields[field.oneof_index].name.dump
field_name = field.name.dump

"result[#{key}] = #{value} if send(:#{oneof_selection_field_name}) == :#{field_name}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see it in the generated code, sometimes there are multiple of these in a row, so we're calling send(key) several times, I wonder if there is a way to consolidate that.

oneof_selection_field_name = oneof_selection_fields[field.oneof_index].name.dump
field_name = field.name.dump

"result[#{key}] = #{value} if send(:#{oneof_selection_field_name}) == :#{field_name}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we use send here?
I see it in the old code, but we (now) have (public) attribute readers for these. 🤔
Not sure if maybe we didn't at some point.
If there is a reason, we should put a comment above this line explaining it.
If we don't, should we switch to self.#{...} (to still avoid any possible name clashes with local vars)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. self.#{...} is probably better.

test/message_test.rb Outdated Show resolved Hide resolved
Co-authored-by: Randy Stauner <[email protected]>
lib/protoboeuf/decorated_field.rb Show resolved Hide resolved
result["value".to_sym] = @value

result[:"type_url"] = @type_url
result[:"value"] = @value

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why that to_sym call was there to begin with, this looks a little funny to me. result[:type_url] and result[:value] would be much more natural.

Copy link
Contributor Author

@davebenvenuti davebenvenuti Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left the quotes just in case the field names have anything funky in them that might produce a syntax error. But I'm not sure if that's even possible given the constraints on proto field names.


def to_json(options = {})
require "json"
JSON.dump(as_json(options))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename options to as_json_options please? I would otherwise think options applies to formatting the JSON output.

result = {}

result["intField"] = @int_field
result["stringField"] = @string_field if !options[:compact] ||

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's worth adding a helper method:

# fallback is probably the wrong name, but you get the idea

def should_compact?(options, fallback)
  options[:compact] || fallback
  
  # or maybe:
  # @compact_export_keys ||= options[:compact]
  # @compact_export_keys || fallback
end

I'm generally a fan of duplicating small fragments if it makes the code easier to read. But, in this case I think the lines are getting a bit long.

@davebenvenuti davebenvenuti deleted the davebenvenuti/hash-serialization branch January 31, 2025 21:09
@davebenvenuti
Copy link
Contributor Author

I got bit by this 😭 isaacs/github#361 I'll reopen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
#gsd:43107 Ruby Language: RPC Code Generation for Shopify Internal APIs [GSD 43107]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants