-
Notifications
You must be signed in to change notification settings - Fork 201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
For enum values starting with a digit, prefix the generated symbol with an underscore. #3961
base: main
Are you sure you want to change the base?
Conversation
Hi @deven, thanks for the contribution! I have kicked off the CI tests and expect them all to pass. Would it be possible to get a test added for this to ensure that it isn't reverted at some later point and maybe a comment on the regex replacements just to indicate their purpose? |
It is unclear which error is this PR trying to resolve. Enum variant names follow the same rules as identifiers in Smithy, and Smithy does not allow identifiers to begin with a digit. For example, this is invalid in Smithy:
However, this restriction does not apply to enum values. For example, the following is perfectly valid and will not raise any errors:
|
Will do, thanks! |
I tried to keep it simple, but I guess I need to walk through the gory details. I've been working on creating AWS-style Rust SDKs for various sections of Amazon SP-API, using a custom script I wrote to translate the original Swagger IDL files into Smithy IDL files, then building the SDKs using In this particular case, I was working on the Fulfillment Inbound API. Here is one of the enums in the original
My script translated this Swagger enum into the following Smithy enum in the generated Smithy IDL file:
Of course, simply using
This exception is generated because "2DBarcode" is not a valid identifier because of the leading digit. However, the exact string "2DBarcode" was NOT in the Smithy IDL file at all! Some algorithm inside My commit uses the following code fragment to fix the invalid generated symbols starting with a digit:
This will prefix the symbol name with an underscore if it begins with a digit, making the generated symbol valid. To guarantee that this modification can't conflict with another enum variant, it also matches any number of leading underscores before that leading digit. Otherwise, prefixing Originally, I only added this code fragment to
When I investigated, I found that the generated Rust code had the following enum definition:
Since I needed to investigate the
This now contains a valid enum variant name ( |
Have you tried changing your original transformation to not use underscore as a prefix and something like |
@deven Thanks for the detailed explanation. @aajtodd I agree that not appending enum Dimensions {
_2D = "2D"
_3D = "3D"
} However, I would like to dig a bit deeper to understand why we need to use |
For the record, using |
e16a2af
to
c14bf4d
Compare
…th an underscore.
c14bf4d
to
fbd70dd
Compare
Hi @landonxjames, I've replaced the original commit with a newer one which includes comments and test cases. |
So, could we get this merged please? 😃 |
Thank you for your patience regarding this PR review. I've identified some issues with the current implementation in The fundamental problem lies in the processing order of the identifier transformation. Currently, we apply val name = definition.name.orNull()?.toPascalCase()?.replace(Regex("^(_*\\d)"), "_$1") ?: return null This approach is problematic for the following reasons:
@enum([
{
value: "2D_BARCODE",
name: "_2D_BARCODE",
documentation: "Example with leading digit."
},
{
value: "2D_SPECIAL_BARCODE",
name: "__2D_BARCODE",
documentation: "Example with more underscores"
},
]) This generates invalid Rust code with duplicate variant names: pub enum BarCodes {
/// Example with leading digit.
///
/// _Note: `::2DBarcode` has been renamed to `::_2DBarcode`._
_2DBarcode,
/// Example with leading digit.
///
/// _Note: `::2DBarcode` has been renamed to `::_2DBarcode`._
_2DBarcode,
}
Proposed Solutions:
a) Implement proper prefix preservation (something like the following): Note: We specifically use val name = definition.name.orNull()?.let { input ->
Regex("^(_+\\d+)(.*)$").find(input)?.let { match ->
val (prefix, rest) = match.destructured
prefix + rest.toPascalCase()
} ?: input.toPascalCase()
} ?: return null b) Consider the broader implications of underscore preservation: There's a larger architectural question about whether we should preserve all leading underscores, regardless of whether they're followed by digits. While it might seem logical to handle all underscore prefixes uniformly, we're intentionally limiting this fix to underscores followed by digits for backward compatibility. We can be confident this won't break existing models because any model with a digit-prefixed identifier would have generated invalid Rust code that wouldn't compile, making it impossible for such models to be in use successfully. However, this limited approach doesn't address all edge cases. Consider this valid Smithy model: @enum([
{
value: "3DBARCODE",
name: "_Three",
documentation: "Example 3D Bar code"
},
{
value: "3D_BAR_CODE",
name: "Three",
documentation: "Example 3D Bar code with a leading underscore"
},
])
string BarCodes This still generates invalid Rust code because two enum variants differ only by a leading underscore. Ideally, we would like to detect and stop code generation with an error rather than generate uncompilable Rust code. While this is a valid concern, we've decided to address this larger issue in a separate PR to keep the current changes focused and manageable. c) Optimize regex performance by declaring it as a companion object member: class EnumMemberModel(...) {
private val UNDERSCORE_PREFIX_PATTERN = Regex("^(_+\\d+)(.*)$")
}
The identifier transformation logic should be moved out of I will follow up with a specific recommendation for the new location of this logic after further analysis of the codebase architecture. |
I was already prefixing the digit with an underscore in the Smithy IDL file to make the Smithy identifier valid:
Because the I don't know what else this might impact, but an alternative approach would be to have fun String.toPascalCase(): String {
// TODO(https://github.com/smithy-lang/smithy-rs/issues/3047): consider using our updated toSnakeCase (but need to audit diff)
val match = Regex("^(_*)(.*)$").find(this)!!
val prefix = match.groupValues[1]
val remaining = match.groupValues[2]
return prefix + CaseUtils.toPascalCase(CaseUtils.toSnakeCase(remaining))
} I tested this solution and confirmed that it converts For consistency, it would make sense to apply the same logic to the fun String.toSnakeCase(): String {
val match = Regex("^(_*)(.*)$").find(this)!!
val prefix = match.groupValues[1]
val remaining = match.groupValues[2]
return prefix + remaining.splitOnWordBoundaries().joinToString("_") { it.lowercase() }
} Both functions are in this file: This would likely be a reasonable approach. There is some chance of backward compatibility issues since existing generated Rust code with leading underscores in the Smithy IDL file would generate different identifiers with this code change than they currently do. On the other hand, Smithy explicitly discourages leading underscores like this, so it might not be a problem:
If the backward compatibility is a significant concern, the approach above could be slightly modified to only preserve leading underscores when followed by a digit by using |
Should I replace my current patch in this pull request with the alternative that I suggested above, to fix |
Motivation and Context
If an enum uses values which start with a digit, the generated symbols are invalid, causing both Smithy errors and Rust errors.
Description
Prefixes invalid symbols starting with a digit with underscores to make the symbols valid.
Testing
This commit fixes the Smithy error that I was getting for an enum using a value of "2DBarcode", and also fixes the generated Rust code to use "_2DBarcode" as the enum value instead of "2DBarcode".
Checklist
.changelog
directory, specifying "client," "server," or both in theapplies_to
key..changelog
directory, specifying "aws-sdk-rust" in theapplies_to
key.By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.