AWS Glue - Null values in partition column #3165

Open
jmklix opened this issue Jan 3, 2024 · 4 comments
Labels
bug: This issue is a bug.
glue
p3: This is a minor priority issue
service-api: This issue is due to a problem in a service API, not the SDK implementation.

Comments

@jmklix
Member

jmklix commented Jan 3, 2024

Original discussion: #2803

It seems like Glue isn't handling the GetPartitions API correctly when a partition column has a null value.
Example below. I am using the AWS CLI for simplicity, which gives the same output as the SDK.

My table data is structured as below in S3

s3://example-bucket/example_table/
├── int_partition_col=null/
│   ├── string_partition_col=null/
│   │   └── data-part-00001.csv
├── int_partition_col=1/
│   ├── string_partition_col=A/
│   │   └── data-part-00002.csv
└── int_partition_col=2/
    ├── string_partition_col=B/
    │   └── data-part-00003.csv

> aws glue get-partitions --database-name example_db --table-name example_table --expression "(int_partition_col >= 0)" ->
An error occurred (InvalidStateException) when calling the GetPartitions operation: For input string: "null" is not an integer.

> aws glue get-partitions --database-name example_db --table-name example_table --expression "(string_partition_col is null)" -> Returns empty

> aws glue get-partitions --database-name example_db --table-name example_table --expression "(string_partition_col = 'null')" -> works correctly

So it seems like the null value is being treated as the string literal 'null'? But from the documentation here, it seems IS NULL and similar operators are supported?
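
For anyone hitting the integer-column error in the meantime, one possible workaround is to filter on the client side instead of passing the expression to the service. The snippet below is a minimal sketch, not a confirmed fix: it assumes the partitions were already fetched without an `--expression` (whether that unfiltered call succeeds for this table is not shown above), and that each partition is a dict with a `Values` list in partition-key order, as in the GetPartitions response. The helper names `parse_int_partition` and `filter_int_at_least` are hypothetical.

```python
def parse_int_partition(value):
    """Return an int, or None when the stored value is the literal string 'null'."""
    if value == "null":
        return None
    return int(value)


def filter_int_at_least(partitions, column_index, minimum):
    """Keep partitions whose integer key at column_index is >= minimum.

    Partitions whose key is the literal 'null' are skipped rather than
    raising, which mirrors what "(int_partition_col >= 0)" was meant to do.
    """
    kept = []
    for partition in partitions:
        parsed = parse_int_partition(partition["Values"][column_index])
        if parsed is not None and parsed >= minimum:
            kept.append(partition)
    return kept


# Sample data shaped like the table layout above (assumed response shape).
partitions = [
    {"Values": ["null", "null"]},
    {"Values": ["1", "A"]},
    {"Values": ["2", "B"]},
]

result = filter_int_at_least(partitions, column_index=0, minimum=0)
print([p["Values"] for p in result])
```

This trades a server-side filter for a full listing, so it is only reasonable for tables with a modest number of partitions.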

@jmklix jmklix added the service-api This issue is due to a problem in a service API, not the SDK implementation. label Jan 3, 2024
@jmklix jmklix self-assigned this Jan 3, 2024
@jmklix jmklix added the bug This issue is a bug. label Jan 3, 2024
@jmklix
Member Author

jmklix commented Jan 3, 2024

P111656246

@kambhamvivekshankar

I am facing the same issue too. Can this ticket be prioritized? Also, Delta seems to be writing null as __HIVE_DEFAULT_PARTITION__. Can native support for this be included as well?

@kambhamvivekshankar

__HIVE_DEFAULT_PARTITION__ is a common convention that many query engines support. https://cwiki.apache.org/confluence/display/hive/configuration+properties#ConfigurationProperties-hive.exec.default.partition.name
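
Until there is native support, a string partition column can probably be matched against both null markers explicitly, since the `= 'null'` form is reported to work earlier in this thread. The following is a rough sketch under assumptions: that the GetPartitions expression grammar accepts `OR` between comparisons, and that `__HIVE_DEFAULT_PARTITION__` is stored as a plain string value in the same way. `NULL_MARKERS` and `null_partition_expression` are hypothetical names, not part of any SDK.

```python
# Directory-name markers that stand in for a null partition value.
# 'null' comes from the table layout in this issue; the Hive default
# name is the one Delta is reported to write.
NULL_MARKERS = ("null", "__HIVE_DEFAULT_PARTITION__")


def null_partition_expression(column, markers=NULL_MARKERS):
    """Build a filter string matching any null marker on a string column.

    Assumption: the service accepts OR between equality comparisons.
    """
    clauses = [f"{column} = '{marker}'" for marker in markers]
    return "(" + " OR ".join(clauses) + ")"


expr = null_partition_expression("string_partition_col")
print(expr)
```

The resulting string would be passed as the `--expression` value (or the SDK's `Expression` parameter). It does not help with integer columns, where the comparison itself fails on the "null" value.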

@zshzbh zshzbh transferred this issue from aws/aws-sdk Oct 30, 2024
@jmklix jmklix added p3 This is a minor priority issue glue labels Oct 30, 2024
@DmitriyMusatkin
Contributor

This does not seem like a C++ SDK issue. Is this a modeling issue, or are there details missing?
