[FEATURE]: Primary Key detection when comparing datasets #484

@mwojtyczka

Description

Is there an existing issue for this?

  • I have searched the existing issues

Problem statement

The current compare_datasets check requires the caller to provide primary keys. However, these keys are not always obvious or known, especially in migration projects.

Proposed Solution

Provide an automated way to discover technical primary keys (a column or combination of columns that uniquely identifies each row in the input DataFrame). Add a new parameter, auto_detect_matching_keys. The search space should be limited to the provided columns and ref_columns.
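
To make the idea concrete, here is a minimal sketch of one possible detection strategy: try column combinations of increasing size, restricted to the candidate columns, and return the first combination whose values are unique across all rows. The function name detect_matching_keys, its parameters, and the use of plain Python dictionaries in place of a Spark DataFrame are assumptions made for illustration only. This is not the existing compare_datasets API and not a proposed final implementation.

```python
from itertools import combinations
from typing import Any, Dict, Optional, Sequence, Tuple


def detect_matching_keys(
    rows: Sequence[Dict[str, Any]],
    candidate_columns: Sequence[str],
    max_key_size: Optional[int] = None,
) -> Optional[Tuple[str, ...]]:
    """Hypothetical helper: return the smallest combination of candidate
    columns whose values uniquely identify every row, or None if no
    combination within the size limit is unique.

    Column values must be hashable. Rows are plain dicts here; a real
    implementation would operate on a DataFrame instead.
    """
    limit = len(candidate_columns)
    if max_key_size is not None:
        limit = min(limit, max_key_size)

    # Smaller keys are tried first, so the result is a minimal-size key.
    for size in range(1, limit + 1):
        for combo in combinations(candidate_columns, size):
            seen = set()
            is_unique = True
            for row in rows:
                key = tuple(row[col] for col in combo)
                if key in seen:
                    # Duplicate found: this combination is not a key.
                    is_unique = False
                    break
                seen.add(key)
            if is_unique:
                return combo
    return None


# Illustrative data (made up for this sketch).
rows = [
    {"region": "EU", "order_id": 1, "status": "open"},
    {"region": "EU", "order_id": 2, "status": "open"},
    {"region": "US", "order_id": 1, "status": "closed"},
]

key = detect_matching_keys(rows, ["region", "order_id", "status"])
print(key)  # ('region', 'order_id')
```

The number of combinations grows exponentially with the number of candidate columns, which is one reason to limit the search space to the provided columns and ref_columns. A cap such as the hypothetical max_key_size above could bound the cost further.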

Additional Context

No response
