Removing duplicate rows (SQL)
At least once in your career you will have to deal with duplicate rows causing havoc on you application. This post will help you get through that dilema. I have Oracle and MySql examples below.
ORACLE Specific
With Oracle with have the luxury of ROWID which uniquely identifies a row in an Oracle table. We will use this pseudo column to remove the duplicates.
Simple Method
This is the simplest method which removes the latest duplicate row added. If you want to remove the earliest you need to change MAX to MIN and replace “less than”(<) with “greater than”(>).
DELETE FROM [TABLE] A
WHERE ROWID < ( SELECT max(ROWID)
FROM [TABLE] B
WHERE A.[PRIMARY KEY FIELDS] = B.[PRIMARY KEY FIELDS]);
Another method with constraints
This is another method you can use if you have constraints. This method uses the Oracle function “Exists” which checks for the existence in a sub-query.
DELETE FROM [TABLE] A
WHERE CONTRAINT = [VARIABLE]
AND EXISTS ( SELECT ‘X’
FROM [TABLE] B
WHERE A.[PRIMARY KEY FIELDS] = B.[PRIMARY KEY FIELDS]
AND A.ROWID < B.ROWID);
MYSQL (These would also work in Oracle )
With MySQL we do not have the tools that Oracle has to easily find the duplicates. But most MySql tables use an auto-increment ID field that helps us to identify the duplicates.
Simple Method
As with the Oracle method use the “greater than” sign to to keep the earliest row entered. We can change this to pull the latest by replacing “greater than” with “less than”.
DELETE A FROM [TABLE] as A, [TABLE] as B
WHERE A.[UNIQUE FIELD(S)] = B.[UNIQUE FIELD(S)]
AND A.ID > B.ID;
Without the ID field
Now it gets complicated in MySQL, we need to create a table with an unique ID then remove the duplicates. Once the dups have been remove then we can put the data back onto the original table.
Drop the dups table if it exists.
DROP TABLE IF EXISTS [TABLE]_dups;
Create the dups table. This table will be the base table(table with dups) with the ID added.
CREATE TABLE [TABLE]_dups (
id INT(11) default NULL auto_increment,
[ALL TABLE COLUMNS],
PRIMARY KEY (id)
);Insert into the dups table from the base table.
INSERT INTO [TABLE]_dups
SELECT NULL,[UNIQUE FIELD(S)]
FROM [TABLE];Delete the dups using the SQL we used in the prior example
DELETE A FROM [TABLE]_dups as A, [TABLE]_dups as B
WHERE A.[UNIQUE FIELD(S)] = B.[UNIQUE FIELD(S)]
AND A.ID < B.ID;Delete the Base table
DELETE FROM [TABLE];
Insert into the base table from the dups table.
INSERT INTO [TABLE]
SELECT [all columns less the ID field]
FROM [TABLE]_dups;Remove the dups table
DROP TABLE [TABLE]_dups;
I hope this posts help you when you run into this problem… and we all run into this problem.



