The Parquet Spec doesn't specify whether multiple columns are allowed to have the same name. #421

Open
asfimport opened this issue Sep 8, 2023 · 2 comments

Comments

@asfimport
Collaborator

The Parquet format specification doesn't say whether a Parquet file is valid when it has multiple columns with the same name (in the same group node, so really exactly the same name). For example, say I have a Parquet file with two columns, both called x. Is this a valid Parquet file?

Reporter: Jan Finis / @JFinis

Note: This issue was originally created as PARQUET-2345. Please see the migration documentation for further details.

@asfimport
Collaborator Author

Gang Wu / @wgtmac:
I didn't find any statement in the Parquet specs that disallows identical field names. For engines that project columns by field ordinal or field ID, identical field names may not be a big issue. Still, it is a good convention to avoid them.
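
To illustrate the point about ordinals and field IDs, here is a minimal sketch in plain Python, with no Parquet library involved. The `Field` tuple and both helper functions are hypothetical names made up for this example; only the idea of an optional per-field ID is taken from the discussion above. It shows that a lookup by name returns two candidates when siblings share a name, while a lookup by field ID still resolves to exactly one column.

```python
from typing import NamedTuple, Optional


class Field(NamedTuple):
    name: str
    field_id: Optional[int]  # assumed stand-in for an optional per-field ID


# Two sibling fields in the same group, both named "x".
schema = [Field("x", 1), Field("x", 2), Field("y", 3)]


def ordinals_by_name(fields, name):
    """Return every ordinal whose field has the given name."""
    return [i for i, f in enumerate(fields) if f.name == name]


def ordinal_by_field_id(fields, field_id):
    """Return the single ordinal carrying the given field ID, or raise."""
    matches = [i for i, f in enumerate(fields) if f.field_id == field_id]
    if len(matches) != 1:
        raise LookupError(f"field id {field_id} matched {len(matches)} fields")
    return matches[0]


name_matches = ordinals_by_name(schema, "x")  # more than one candidate: ambiguous
id_match = ordinal_by_field_id(schema, 2)     # exactly one candidate: unambiguous
```

A reader that only supports name-based projection would have to pick one of the candidates arbitrarily or reject the file, which is why avoiding duplicates is the safer convention.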

@asfimport
Collaborator Author

Micah Kornfield / @emkornfield:
I've at least seen in the wild two columns whose names differ only by case. I agree we should recommend against them, since it's not clear they will be handled well. If we do update the docs, we should also recommend against naming columns using "." as a delimiter, as this can also lead to ambiguity.
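
Both kinds of ambiguity can be sketched as a small check in plain Python. The nested-dict schema, `leaf_paths`, and `collisions` are hypothetical and exist only for this illustration; it also assumes that a tool renders a column path by joining the name parts with ".", which is an assumption about the consumer rather than something the spec is quoted as requiring here.

```python
from collections import defaultdict

# Hypothetical nested schema: a dict maps a group's field names to children,
# and None marks a leaf column. This is not a Parquet API.
schema = {
    "a": {"b": None},  # leaf reached through group "a", path ("a", "b")
    "a.b": None,       # top-level leaf literally named "a.b"
    "Id": None,
    "id": None,        # differs from "Id" only by case
}


def leaf_paths(node, prefix=()):
    """Yield the path of every leaf column as a tuple of name parts."""
    for name, child in node.items():
        path = prefix + (name,)
        if child is None:
            yield path
        else:
            yield from leaf_paths(child, path)


def collisions(paths, key):
    """Group paths by key(path) and keep only groups with several members."""
    groups = defaultdict(list)
    for p in paths:
        groups[key(p)].append(p)
    return {k: v for k, v in groups.items() if len(v) > 1}


paths = list(leaf_paths(schema))
# Distinct columns that render to the same dotted string.
dotted = collisions(paths, lambda p: ".".join(p))
# Distinct columns that become identical under case-insensitive matching.
folded = collisions(paths, lambda p: tuple(part.lower() for part in p))
```

Here `dotted` flags the nested `a` / `b` column against the top-level `a.b` column, and `folded` flags `Id` against `id`. A writer could run a check along these lines before emitting a schema.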
