A more common pattern would be to select the distinct set of values you want to insert using the pandas `DataFrame.drop_duplicates()` method, then either iterate through the result inserting each row or, preferably, perform a single bulk upsert.
In addition to requiring multiple passes through the dataset, with $O(n \cdot k)$ complexity (where $n$ is the size of the source DataFrame and $k$ is the number of unique IDs for a given table) versus a single pass to retrieve all the data we want to insert, the current pattern also makes it harder to adopt bulk upserts.
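As a rough illustration of the difference, the sketch below contrasts filtering the DataFrame once per unique ID with a single `drop_duplicates()` call. The column names (`customer_id`, `customer_name`), the sample rows, and both helper functions are hypothetical and are not taken from `etldb/main.py`.

```python
import pandas as pd

# Hypothetical source data: one row per order, with customer fields repeated.
source = pd.DataFrame(
    {
        "order_id": [1, 2, 3, 4],
        "customer_id": [10, 10, 20, 30],
        "customer_name": ["Ada", "Ada", "Grace", "Linus"],
    }
)


def rows_per_id(df: pd.DataFrame) -> list:
    """Multi-pass approach: scan the whole DataFrame once per unique ID."""
    rows = []
    for customer_id in df["customer_id"].unique():
        # Each boolean mask is a full scan of df, so k IDs cost k scans.
        match = df[df["customer_id"] == customer_id].iloc[0]
        rows.append(
            {
                "customer_id": int(match["customer_id"]),
                "customer_name": str(match["customer_name"]),
            }
        )
    return rows


def rows_distinct(df: pd.DataFrame) -> list:
    """Single call: select the target columns and drop repeated IDs."""
    distinct = df[["customer_id", "customer_name"]].drop_duplicates(
        subset="customer_id"
    )
    return distinct.to_dict(orient="records")
```

Both helpers return the same three customer records here, but `rows_distinct` produces them as one list that can be handed directly to a bulk statement.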
Bulk transactions are helpful because they allow us to combine DML into transaction blocks more easily and prevent instances in which tables are only partially updated during a batch data load process.
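The all-or-nothing behaviour described above can be sketched as follows. This is a minimal example under stated assumptions: it uses an in-memory SQLite database through the standard-library `sqlite3` module as a stand-in (the project's actual database and driver are not stated here), a hypothetical `customer` table, and a hypothetical `bulk_upsert` helper. The `ON CONFLICT ... DO UPDATE` clause needs SQLite 3.24 or newer.

```python
import sqlite3

# Hypothetical upsert statement for a hypothetical `customer` table.
UPSERT_SQL = """
INSERT INTO customer (customer_id, customer_name)
VALUES (:customer_id, :customer_name)
ON CONFLICT(customer_id) DO UPDATE SET customer_name = excluded.customer_name
"""


def bulk_upsert(conn: sqlite3.Connection, rows: list) -> None:
    """Upsert all rows in one transaction: every row lands, or none do."""
    # Used as a context manager, the connection commits on success and
    # rolls back if any statement raises.
    with conn:
        conn.executemany(UPSERT_SQL, rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "customer_id INTEGER PRIMARY KEY, customer_name TEXT NOT NULL)"
)
conn.execute("INSERT INTO customer VALUES (10, 'Old name')")
conn.commit()

# One bulk statement updates the existing row and inserts the new one.
bulk_upsert(conn, [
    {"customer_id": 10, "customer_name": "Ada"},
    {"customer_id": 20, "customer_name": "Grace"},
])

# A batch containing an invalid row (NULL name) is rolled back as a whole,
# so customer 30 is not left behind as a partial update.
try:
    bulk_upsert(conn, [
        {"customer_id": 30, "customer_name": "Linus"},
        {"customer_id": 40, "customer_name": None},
    ])
except sqlite3.IntegrityError:
    pass

result = conn.execute(
    "SELECT customer_id, customer_name FROM customer ORDER BY customer_id"
).fetchall()
```

With per-row inserts committed individually, the failed second batch would have left customer 30 in the table; wrapping the bulk statement in one transaction avoids that partial state.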
Acceptance criteria
We've reduced the number of passes through the DataFrame needed to extract the unique set of values to insert into the database.
The data is loaded using bulk DML statements rather than separate insert/update statements for each row of data.
Summary
Currently, the pattern adopted for upserting data in `etldb/main.py` is to:
See the following sections for examples: