Schema mismatch error when writing both partitioned and non-partitioned Parquet datasets #3046

Closed as not planned
@diegoxfx

Describe the bug

When I attempt to write both a partitioned Parquet dataset and a non-partitioned Parquet file from the same data schema, I encounter a schema mismatch error. This occurs because partitioned writes exclude the partition columns from the Parquet file schema, while non-partitioned writes include them. Attempting one after the other leads to:

Table schema does not match schema used to create file:
table:
[schema without partition keys]
file:
[schema with partition keys]
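
To make the schema difference described above concrete, here is a minimal pure-Python sketch of a Hive-style partitioned layout. It is only an illustration of the general convention (partition values are encoded in the directory path and left out of the data files), not awswrangler's or pyarrow's actual implementation. The helper name `split_hive_partitions` and the dict-based rows are assumptions made for this example.

```python
from collections import defaultdict


def split_hive_partitions(rows, partition_cols):
    """Group rows by partition values, Hive-style (hypothetical helper).

    Returns a dict mapping a relative prefix such as "col=value/" to the
    rows stored under it, with the partition columns removed from each row.
    """
    groups = defaultdict(list)
    for row in rows:
        # Partition values go into the path, e.g. "execution_date=2024-12-16/".
        prefix = "/".join(f"{col}={row[col]}" for col in partition_cols) + "/"
        # The data file itself keeps only the non-partition columns.
        file_row = {k: v for k, v in row.items() if k not in partition_cols}
        groups[prefix].append(file_row)
    return dict(groups)


rows = [
    {"merchant_id": 1, "payout_type": "X",
     "execution_date": "2024-12-16", "model_version": "v1"},
    {"merchant_id": 2, "payout_type": "Y",
     "execution_date": "2024-12-16", "model_version": "v1"},
]
partition_cols = ["execution_date", "model_version"]

layout = split_hive_partitions(rows, partition_cols)

# Columns a non-partitioned file would contain: all four.
non_partitioned_columns = list(rows[0])
# Columns inside each partitioned data file: partition keys are gone.
partitioned_file_columns = list(next(iter(layout.values()))[0])

print(non_partitioned_columns)
print(partitioned_file_columns)
print(list(layout))
```

The two column lists differ by exactly the partition keys, which matches the "table" versus "file" schemas shown in the error message.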

How to Reproduce

import pandas as pd
import awswrangler as wr

df = pd.DataFrame({
    "merchant_id": [1, 2],
    "payout_type": ["X", "Y"],
    "execution_date": pd.to_datetime("2024-12-16"),
    "model_version": ["v1", "v1"]
})

# First write a non-partitioned file that includes partition keys as normal columns
wr.s3.to_parquet(
    df=df,
    path="s3://mybucket1/non_partitioned_file.parquet",
    dataset=False
)

# Then try writing a partitioned dataset (which excludes partition columns from the file schema)
wr.s3.to_parquet(
    df=df,
    path="s3://mybucket2/partitioned_dataset/",
    dataset=True,
    partition_cols=["execution_date", "model_version"]
)

The second call fails with a schema mismatch error. Reversing the order of the calls (partitioned first, then non-partitioned) also fails.

Expected behavior

The second call should write the data successfully without a schema mismatch error.

Your project

No response

Screenshots

No response

OS

Docker Container

Python version

3.11.8

AWS SDK for pandas version

3.10.1

Additional context

ChatGPT o1 suggested a probable cause of the bug:

[image: screenshot of the ChatGPT o1 response, not available in this text]
