Handling duplicate rows in the source data #198

YemreGurses · 2024-07-05T12:32:12Z

Sometimes the data source has no unique id. In that case, we can generate one by combining the values in the row's columns and hashing the result.
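A minimal sketch of that idea, assuming each row is available as a dict of column name to value. The function name `generate_row_id`, the choice of SHA-256, and the separator are illustrative assumptions, not what the project necessarily uses:

```python
import hashlib


def generate_row_id(row, columns):
    """Build a deterministic id from the values of the given columns.

    Hypothetical helper: the name, SHA-256 and the separator are assumptions.
    """
    # Missing values become empty strings so that every row yields a payload.
    parts = ["" if row.get(col) is None else str(row[col]) for col in columns]
    # Join with a separator that is unlikely to occur in the data, so that
    # ("ab", "c") and ("a", "bc") do not collapse into the same payload.
    payload = "\x1f".join(parts)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


row_a = {"name": "Jane", "birth_date": "1990-01-01", "city": "Ankara"}
row_b = {"name": "Jane", "birth_date": "1990-01-01", "city": "Izmir"}
columns = ["name", "birth_date", "city"]

id_a = generate_row_id(row_a, columns)
id_b = generate_row_id(row_b, columns)
```

Identical rows always map to the same id, which is exactly why truly duplicate rows in the source still collide.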

In some scenarios the data source contains duplicate rows, so even the combined column values produce the same id. For those cases, we can add a check for duplicate ids within each batch and eliminate the duplicates. The check is only needed within a batch: if the duplicates are in different batches, the resource with the same id is simply updated, because we are using PUT.
We can also log the duplicate rows so that they are easier to identify.
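The per-batch check could look roughly like the sketch below. It assumes a batch is a list of rows and that the id is computed by a caller-supplied function; `drop_duplicates_in_batch`, `id_of`, and the logger name are hypothetical names used only for illustration:

```python
import logging

logger = logging.getLogger("dedup")  # hypothetical logger name


def drop_duplicates_in_batch(batch, id_of):
    """Keep the first row for each id in the batch and log the rest.

    `id_of` is any callable that returns the id of a row (an assumption;
    in practice it would be the hash-based id described above).
    """
    seen = set()
    unique, duplicates = [], []
    for index, row in enumerate(batch):
        row_id = id_of(row)
        if row_id in seen:
            # Log enough context to locate the duplicate in the source data.
            logger.warning(
                "Duplicate row at batch index %d (id=%s): %r", index, row_id, row
            )
            duplicates.append(row)
        else:
            seen.add(row_id)
            unique.append(row)
    return unique, duplicates


batch = [
    {"name": "Jane", "city": "Ankara"},
    {"name": "John", "city": "Izmir"},
    {"name": "Jane", "city": "Ankara"},  # duplicate of the first row
]
unique, duplicates = drop_duplicates_in_batch(
    batch, id_of=lambda r: (r["name"], r["city"])
)
```

Keeping the first occurrence is a design choice in this sketch; keeping the last would work equally well for exact duplicates.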

YemreGurses mentioned this issue Sep 25, 2024

Bug related to execution monitoring (EFK) #230

Closed
