Finding duplicate records in SQL Server
1. The GROUP BY and HAVING Method
This is the most standard approach. It is best used when you only need to identify which values are duplicated and how many times they appear.
SELECT Email, COUNT(*) as Count
FROM Employee
GROUP BY Email
HAVING COUNT(*) > 1;
2. Using a Common Table Expression (CTE)
A CTE makes the ROW_NUMBER logic much cleaner and is the standard way to delete duplicates. Since a CTE is a "live" view of the data, deleting from the CTE deletes from the underlying table.
SQLWITH DuplicateCTE AS (
SELECT Email,
ROW_NUMBER() OVER(PARTITION BY Email ORDER BY EmployeeID) as RowNum
FROM Employee
)
-- To view duplicates:
SELECT * FROM DuplicateCTE WHERE RowNum > 1;
-- To delete duplicates:
-- DELETE FROM DuplicateCTE WHERE RowNum > 1;
3. Using Dense Rank
WITH RankedDuplicates AS (
SELECT
EmployeeID,
Email,
DENSE_RANK() OVER (ORDER BY Email) as GroupID,
COUNT(*) OVER (PARTITION BY Email) as OccurrenceCount
FROM Employee
)
SELECT *
FROM RankedDuplicates
WHERE OccurrenceCount > 1
ORDER BY GroupID;