Oracle: Deleting Duplicate Rows Efficiently
Finding duplicates with RANK
If you’ve made the mistake of forgetting a primary key on your table, it can be frustrating to find a way to delete all of the duplicate rows without deleting the initial instances.
delete from $table_name where rowid in
(
select "rowid" from
(select "rowid", rank_n from
(select rank() over (partition by $primary_key order by rowid) rank_n, rowid as "rowid"
from $table_name
where $primary_key in
(select $primary_key from $table_name
group by $all_columns
having count(*) > 1
)
)
)
where rank_n > 1
)
This query selects all of the 'extra' rowids and removes them. It is especially designed for limiting the query scans to only those records which have duplicates, which is useful if there's only a subset of the table that you are dealing with. If you want to improve its efficiency for a table with a high percentage of duplicates, simply remove the inside where clause. In a sample test with 1% duplicates and 233 000 rows, the query took 22 seconds without the subquery, and 18 seconds with it.
* NOTE: This query may not work in Oracle versions before 8i.
A quick way to create duplicates:
create table $temp_table as select * from $table_name sample(10);
insert into $table_name select * from $temp_table;
Alternative methods
GROUP BY method
DELETE FROM $table_name
WHERE rowid not in
(SELECT MIN(rowid)
FROM $table_name
GROUP BY $all_columns;
Join with self method
SELECT
rowid
FROM
$table_name A
WHERE
rowid >
(SELECT min(rowid) FROM $table_name B
WHERE
A.$column1 = B.$column1
and A.$column2 = B.$column2
......
);
Comparison
In the test case, I used a table with 233 000 rows created with a 1% duplication factor. You'll notice that I interrupted the execution of the 'join with self' because I didn't have the patience to let it run to completion.
The following is a TKPROF output of the queries:
TKPROF: Release 8.1.7.4.0 - Production on Tue May 23 14:47:43 2006
(c) Copyright 2000 Oracle Corporation. All rights reserved.
Trace file: flpdt_ora_28183.trc
Sort options: default
********************************************************************************
select "rowid" from
(select "rowid", rank_n from
(select rank() over (partition by token_id order by rowid) rank_n, rowid as "rowid"
from $schema.$table_name
where token_id in
(select token_id from $schema.$table_name
group by tournament_id, nickname, bank, place, prize_id, userid, token_id
having count(*) > :"SYS_B_0"
)
)
)
where rank_n > :"SYS_B_1"
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3 0.00 0.01 0 0 0 0
Execute 3 0.00 0.00 0 0 0 0
Fetch 315 29.98 73.11 14539 7578 94 4662
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 321 29.98 73.12 14539 7578 94 4662
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: SYS
Rows Row Source Operation
------- ---------------------------------------------------
0 VIEW
0 WINDOW SORT
0 MERGE JOIN
230731 SORT JOIN
230730 TABLE ACCESS FULL $table_name
0 SORT JOIN
0 VIEW VW_NSO_1
0 SORT UNIQUE
0 FILTER
230731 SORT GROUP BY
230730 TABLE ACCESS FULL $table_name
********************************************************************************
select "rowid" from
(select "rowid", rank_n from
(select rank() over (partition by token_id order by rowid) rank_n, rowid as "rowid"
from $schema.$table_name
)
)
where rank_n > :"SYS_B_0"
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.03 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 157 7.67 15.47 2635 1257 15 2337
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 159 7.67 15.50 2635 1257 15 2337
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: SYS
********************************************************************************
SELECT MIN(rowid)
FROM $schema.$table_name
GROUP BY tournament_id, nickname, bank, place, prize_id, userid, token_id
having count(*) > :"SYS_B_0"
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3 0.00 0.00 0 0 0 0
Execute 3 0.00 0.00 0 0 0 0
Fetch 315 16.74 41.81 7832 3897 53 4662
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 321 16.74 41.81 7832 3897 53 4662
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: SYS
Rows Row Source Operation
------- ---------------------------------------------------
0 FILTER
230731 SORT GROUP BY
230730 TABLE ACCESS FULL $table_name
********************************************************************************
SELECT
rowid
FROM
$schema.$table_name A
WHERE
rowid >
(SELECT min(rowid) FROM $schema.$table_name B
WHERE
B.tournament_id = A.tournament_id
and B.nickname = A.nickname
and B.bank = A.bank
and B.place = A.place
and B.prize_id = A.prize_id
and B.userid = A.userid
and B.token_id = A.token_id
)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3 0.02 0.01 0 0 0 0
Execute 3 0.00 0.00 0 0 0 0
Fetch 3 153.83 213.12 933 436118 2104 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 9 153.85 213.13 933 436118 2104 0
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: SYS
Rows Row Source Operation
------- ---------------------------------------------------
0 FILTER
30 TABLE ACCESS FULL $table_name
58 SORT AGGREGATE
3 TABLE ACCESS FULL $table_name
Category: Group Blog Posts
Tags:

Why not simply
DELETE table_name
WHERE rowid IN
( SELECT LEAD(rowid) OVER (PARTITION BY pk ORDER BY whatever)
FROM table_name );
I like using the LEAD function, and it works well for removing duplicates, but you need to be sure that every row has exactly one duplicate. In my test scenario, it performed poorly because I had only a subset of duplicates. In such systems, you would need to narrow the results to only problem records and also factor in the possibility of one record having multiple duplicates.
Hi,
Is it possible to delete alternate rows from a table?
If yes, can some one send me the query?
Thanks
Abhijeet.
Select alteranate rows
select * from Table where rowid%2=0;