Chinaunix首页 | 论坛 | 博客
  • 博客访问: 2898275
  • 博文数量: 599
  • 博客积分: 16398
  • 博客等级: 上将
  • 技术积分: 6875
  • 用 户 组: 普通用户
  • 注册时间: 2009-11-30 12:04
个人简介

WINDOWS下的程序员出身,偶尔也写一些linux平台下小程序, 后转行数据库行业,专注于ORACLE和DB2的运维和优化。 同时也是ios移动开发者。欢迎志同道合的朋友一起研究技术。 数据库技术交流群:58308065,23618606

文章分类

全部博文(599)

文章存档

2014年(12)

2013年(56)

2012年(199)

2011年(105)

2010年(128)

2009年(99)

分类: Mysql/postgreSQL

2011-08-21 12:18:52

How can I Find or Delete Duplicated Rows in a Table in MySQL? [ID 1023312.1]

 

 

 

 

Applies to:

MySQL Server - Version: 4.0and later   [Release: 4.0 and later ]
All Platforms

Goal

Tips for finding or deleting duplicated rows in a table in MySQL. 

Solution

This document describes how to find and optionally delete duplicate rows.

 

Finding duplicate rows

Finding duplicate rows in a large table can be difficult. Duplicate rows could include a table containing two or more rows in which all of their columns have the same values. This could be in a table in which there is no key column. Or it could be that there is a key column and it's different, but all of the other columns match or perhaps just a few significant columns. How duplicate rows are identified is up to you. The problem is how to easily find duplicates based on your definition.

If there are too many rows in a table to efficiently inspect the table manually, you can find them with the COUNT() function. As an example, suppose you have a table containing the names and telephone numbers of sales contacts. Suppose further that there are duplicate entries for some contacts. To find duplicate rows in a table based on a couple of columns in particular, you can enter an SQL statement like the following:
 
SELECT name_first, name_last, COUNT(*) AS tally,
FROM contacts
GROUP BY name_first, name_last
HAVING tally > 1;

This statement will list one row for each contact for which there is at least one duplicate entry (having the same first and last name) and give a tally of the number of entries for the contact.

Deleting duplicate rows

If a table has rows which are completely the same, developers sometimes have difficulty instructing MySQL to delete one row, but not both duplicate rows. For example, suppose you have a table containing duplicate rows, rows in which all of their columns have the same value as the duplicate entries. This could be in a table in which there is not a key column, so all columns could match between the duplicate rows. If you want to delete the duplicates, you can do so with the DELETE statement and a LIMIT clause.

Solution
 
If you have a table in which there is not a key column and all columns match between the duplicate rows, and you know which rows they are, you can enter SQL statements like the following:

SELECT COUNT(*)
   FROM table1
   WHERE col1='text';
+----------+
| COUNT(*) |
+----------+
|        4 |
+----------+
DELETE FROM table1
   WHERE col1 = 'text'
   LIMIT 3;

This is assuming you have manually inspected the data to determine that there are duplicate records for the condition given in the WHERE clause. A SELECT statement is then run with the COUNT() function to count the number of duplicate entries based on the condition in which the duplicates were first detected. This is followed by a DELETE statement with a LIMIT clause with a count one less than the number of rows found. This way one will be kept.
 
If you are considering some rows to be duplicates based on certain columns, but not all columns, then you may not want to randomly delete rows as shown in the previous SQL statement. Of course, if one of the columns is different, you could use that column in the WHERE clause of the DELETE statement to determine which rows to delete. An alternative to random or specific row deletion would be to add an ORDER BY clause to the DELETE statement shown above. For example, suppose that the rows contain a datetime column. You could order the rows in descending order by the datetime column like so:
 
DELETE FROM table1
   WHERE col1 = 'text'
   ORDER BY date_col DESC
   LIMIT 3;
 
This will keep the row with the oldest date and delete the others. The ORDER BY clause can be used with DELETE beginning with MySQL 4.0.0.
If you have an AUTO_INCREMENT column, or if you can add one to your table, you may also be able to delete duplicates via a self-join:
 
DELETE table_a
   FROM table1 AS table_a
   INNER JOIN table1 AS table_b
   ON table_a.col1 = table_b.col1
      AND table_a.id < table_b.id;
阅读(1760) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~