This article brings you relevant knowledge about Oracle, which mainly introduces the duplicate data in the table that is often cleared during data cleaning. So how to deal with it in Oracle? Let’s take a look at it together, I hope it will be helpful to everyone.
Recommended tutorial: "Oracle Video Tutorial"
Create test data
create table nayi224_180824(col_1 varchar2(10), col_2 varchar2(10), col_3 varchar2(10));
insert into nayi224_180824
select 1, 2, 3 from dual union all
select 1, 2, 3 from dual union all
select 5, 2, 3 from dual union all
select 10, 20, 30 from dual ;
commit;
select*from nayi224_180824;
COL_1 |
COL_2 |
COL_3 |
1 |
2 |
3 |
1 |
2 |
3 |
5 |
2 |
3 |
10 |
20 |
30 |
for the specified Column, check the result set after deduplication
distinct
select distinct t1.* from nayi224_180824 t1;
##COL_1 | COL_2 | COL_3 |
10 | 20 | 30 |
1 | 2 | 3 |
5 | 2 | 3 |
The method is very limited because it can only deduplicate all query columns. If I want to deduplicate col_2 and col3, then my result set can only have col_2 and col_3 columns, but not col_1.
select distinct t1.col_2, col_3 from nayi224_180824 t1
COL_2 | COL_3 |
##2
3 |
|
20
30 |
|
But it is also the simplest and easiest way to understand.
row_number()
select *
from (select t1.*,
row_number() over(partition by t1.col_2, t1.col_3 order by 1) rn
from nayi224_180824 t1) t1
where t1.rn = 1
;
COL_1
COL_2 |
COL_3 |
RN | |
1
2 |
3 |
1 |
|
10
20 |
30 |
1 |
|
##It’s a lot more troublesome to write, but it has greater flexibility .
For the specified column, find all duplicate rows
count having
select *
from nayi224_180824 t
where (t.col_2, t.col_3) in (select t1.col_2, t1.col_3
from nayi224_180824 t1
group by t1.col_2, t1.col_3
having count(1) > 1)
COL_1COL_2 | COL_3 |
|
1
2 | 3 |
| 1
2 | 3 |
| 5
2 | 3 |
|
If you need to check the table twice, the efficiency will be relatively low. Not recommended.
count over
select *
from (select t1.*,
count(1) over(partition by t1.col_2, t1.col_3) rn
from nayi224_180824 t1) t1
where t1.rn > 1
;
COL_1COL_2 | COL_3 | RN |
|
1
2 | 3 | 3 |
| 1
2 | 3 | 3 |
| 5
2 | 3 | 3 |
|
#You only need to check the table once, recommended.
Delete all duplicate rows
delete from nayi224_180824 t
where t.rowid in (
select rid
from (select t1.rowid rid,
count(1) over(partition by t1.col_2, t1.col_3) rn
from nayi224_180824 t1) t1
where t1.rn > 1);
The above statement is slightly modified.
Delete duplicate data and retain one
Analytical function method
delete from nayi224_180824 t
where t.rowid in (select rid
from (select t1.rowid rid,
row_number() over(partition by t1.col_2, t1.col_3 order by 1) rn
from nayi224_180824 t1) t1
where t1.rn > 1);
has the consistent high flexibility of analytical functions. You can do whatever you want with the grouping and change the orderby clause to achieve requirements like "retain the maximum id".
group by
delete from nayi224_180824 t
where t.rowid not in
(select max(rowid) from nayi224_180824 t1 group by t1.col_2, t1.col_3);
Sacrifice some flexibility in exchange for higher efficiency.
Recommended tutorial: "
Oracle Video Tutorial
"
The above is the detailed content of Summarize and organize common methods for removing duplicate data from Oracle database. For more information, please follow other related articles on the PHP Chinese website!