Removing Records with Duplicate Composite Fields - x5f4T4r - ChinaUnix Blog

Removing Records with Duplicate Composite Fields

2008-10-27 14:26:39

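Suppose a table was designed without a unique constraint on a combination of fields. When you later need to add that constraint, you find the table already contains many duplicate records.

Here is the method I use to remove records whose combined fields are duplicated.

Assume the original table is named source_table, field 1 is named field_name1, and field 2 is named field_name2.

(With minor changes, the same method also works for three or more combined fields.)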

        Step 1: Create a temporary table, source_dup_simple, holding the duplicated field combinations
        create table source_dup_simple
        nologging
        pctfree 1 pctused 99
        as select field_name1,field_name2,count(0) as num from source_table
        group by field_name1,field_name2 having count(0)>1;
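
To illustrate the remark about three or more combined fields, here is a minimal sketch of step 1 extended by one column. The column field_name3 is hypothetical and does not appear in the original method. The joins and subqueries in the later steps would need the same extra column added to their conditions.

```sql
-- Sketch only: field_name3 is a hypothetical third column of the composite key.
create table source_dup_simple
nologging
as select field_name1, field_name2, field_name3, count(*) as num
from source_table
group by field_name1, field_name2, field_name3
having count(*) > 1;
```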

        Step 2: Create a temporary table, source_dup, holding the complete rows from the main table whose field combination is duplicated
        create table source_dup
        nologging
        pctfree 1 pctused 99
        as select t1.* from source_table t1,source_dup_simple t2
        where t1.field_name1=t2.field_name1 and t1.field_name2=t2.field_name2;

        Step 3: Delete the duplicate records from source_dup (run only one of the two options)

        -- Option: keep the record with the smallest rowid
        delete from source_dup a where a.rowid > (
        select min(b.rowid) from source_dup b where
        a.field_name1 = b.field_name1 and a.field_name2=b.field_name2);
        commit;

        -- Option: keep the record with the largest rowid
        delete from source_dup a where a.rowid < (
        select max(b.rowid) from source_dup b where
        a.field_name1 = b.field_name1 and a.field_name2=b.field_name2);
        commit;

Note: when working with more than ten thousand records, it is best to create an index on field_name1 and field_name2 of source_dup before running the delete.
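
One way to create the index mentioned in the note is shown below. The index name idx_source_dup_f1_f2 is made up for this example. A single composite index on both columns matches the equality conditions in the correlated subquery.

```sql
-- Sketch only: the index name is hypothetical.
create index idx_source_dup_f1_f2
    on source_dup (field_name1, field_name2);
```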

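If you want a different deletion rule, such as keeping the record with the most recent date, use one of the following instead:

-- Option: keep the record with the largest value of the time field date_field

             delete from source_dup a where a.date_field < (
                select max(b.date_field) from source_dup b where
                a.field_name1 = b.field_name1 and a.field_name2=b.field_name2);
             commit;

-- Option: keep the record with the smallest value of the time field date_field

             delete from source_dup a where a.date_field > (
                select min(b.date_field) from source_dup b where
                a.field_name1 = b.field_name1 and a.field_name2=b.field_name2);
             commit;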

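If several records in a group share the same value in the time field, more than one of them survives the statement above, so you need one more pass by rowid. Rows whose date_field is NULL also survive the date comparison, because a comparison with NULL is never true, so check for them before this pass.

             delete from source_dup a where a.rowid < (
                select max(b.rowid) from source_dup b where
                a.field_name1 = b.field_name1 and a.field_name2=b.field_name2);
             commit;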
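
As an alternative to the two-pass approach (not the method described in this post), the date rule and the rowid tie-break can be combined into one statement with the ROW_NUMBER analytic function. This is a sketch under the assumption that you want to keep the newest date_field per group, treat NULL dates as the oldest, and break ties by the largest rowid.

```sql
-- Sketch only: keep one row per (field_name1, field_name2).
-- Rows are ranked newest date first, NULL dates last, then largest rowid first;
-- every row ranked 2 or lower is deleted.
delete from source_dup
where rowid in (
    select rid
    from (
        select rowid as rid,
               row_number() over (
                   partition by field_name1, field_name2
                   order by date_field desc nulls last, rowid desc
               ) as rn
        from source_dup
    )
    where rn > 1
);
commit;
```

Because every group keeps exactly its top-ranked row, no follow-up delete by rowid is needed with this form.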

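        Step 4: Delete every record with a duplicated field combination from the original table
        -- Compare the two columns as a pair. Concatenating them (field_name1||field_name2)
        -- can match the wrong rows: 'ab'||'c' and 'a'||'bc' both give 'abc'.
        delete from source_table
        where (field_name1,field_name2) in (select field_name1,field_name2 from source_dup_simple);
        commit;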

Note: when working with more than ten thousand records, it is best to create an index on field_name1 and field_name2 of source_table before running the delete.

        Step 5: Insert the remaining, now deduplicated, records back into the original table
        insert into source_table select * from source_dup;
        commit;
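
After step 5, a possible follow-up is to confirm that no duplicates remain, add the unique constraint that motivated the clean-up, and drop the two temporary tables. The constraint name uk_source_table_f1_f2 is made up for this example. If the check query still returns rows (for example, groups where one of the key fields is NULL, which the equality conditions in the earlier steps do not match), resolve them before adding the constraint.

```sql
-- Check: this should return no rows once the clean-up has worked.
select field_name1, field_name2, count(*) as num
from source_table
group by field_name1, field_name2
having count(*) > 1;

-- Add the composite unique constraint (hypothetical constraint name).
alter table source_table
    add constraint uk_source_table_f1_f2 unique (field_name1, field_name2);

-- Remove the temporary working tables.
drop table source_dup_simple purge;
drop table source_dup purge;
```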

[Editor: Amy]
