Let's say I have a mailing_list
table that contains all the email addresses
that I want to send a mailing out to. Without a uniqueness constraint on the
email
column, I can end up with multiple records containing the same email
address — duplicates.
Here are a couple queries for checking to see if any duplicate records exist and which ones they are.
select email
from (
select
email,
row_number() over (
partition by email
order by email
) as row_num
from mailing_list
) t
where t.row_num > 1;
This is cool because it uses a window
function,
specifically the
row_number()
window function, to assign an incrementing number to each row in the partition.
Here is another, conceptually simpler approach.
select
email
count(*)
from mailing_list
group by email
having count(*) > 1
order by email;
Though we cannot use a where
clause with an aggregate (count
), we can reach
for a having
clause to grab only those results where we've found more than
1
— duplicates.