-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: increase default type precision in redshift/mssql view workaround #2791
base: main
Are you sure you want to change the base?
Conversation
8138a6a
to
71d76af
Compare
9fcc19f
to
074c1a9
Compare
45a20f2
to
658ed8b
Compare
) -> t.Dict[str, exp.DataType]: | ||
# get default lengths for types that support "max" length | ||
types_with_max_default_param = { | ||
k: [self.SCHEMA_DIFFER.parameterized_type_defaults[k][0][0]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we care about all types. I think we only care about Varchar
don't we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this simplify the implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or do we think the same issue applies to types VARBINARY
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured it could happen for any of these types due to transpilation, but VARCHAR is much more widely used than NVARCHAR/CHAR/VARBINARY so in practice it's probably the only one that matters. I'll update.
658ed8b
to
b719b06
Compare
parameter = self.SCHEMA_DIFFER.get_type_parameters(col_type) | ||
type_default = types_with_max_default_param[col_type.this] | ||
if parameter == type_default: | ||
col_type.set("expressions", [exp.DataTypeParam(this=exp.Var(this="max"))]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
col_type.set("expressions", [exp.DataTypeParam(this=exp.Var(this="max"))]) | |
col_type.set("expressions", [exp.DataTypeParam(this=exp.var("max"))]) |
Redshift and MSSQL use the
VarcharSizeWorkaroundMixin
to infer the size of a varchar column when it cannot be resolved from model code.Data may be truncated if it is inferred to the default length (256 in Redshift, 1 in MSSQL) and that was not intended by the user, so this PR increases the length to "max" if a column with the default length is detected.
The most likely cause of unintentional default length is transpilation from an engine where the bare type's default is unlimited. For example, in postgres
VARCHAR
without length parameter has unlimited length, so transpilation to RedshiftVARCHAR
with default length 256 results in unintentionally truncating the field.