05 - Finding Duplicate Files - lshdcr - ChinaUnix Blog

05 - Finding Duplicate Files

Category: Python/Ruby

2013-05-03 11:36:17

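Files are compared by their MD5 checksums. The script walks a directory tree and stores each file in record={}, keyed by its size and checksum. If a file with the same key has already been seen during the walk, the duplicate is appended to dup=[].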

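import hashlib
def checksum(file):
    # Open in binary mode so the bytes are hashed exactly as stored on disk
    fp=open(file,'rb')
    checksum=hashlib.md5()
    # Read 8 KB at a time so large files are never loaded into memory whole
    while True:
        buffer=fp.read(8192)
        if not buffer:break
        checksum.update(buffer)
    fp.close()
    checksum=checksum.digest()
    return checksum

import os
def diskwalk(path):
    # Collect the full path of every file below path
    fullpath=[]
    for paths,dirs,files in os.walk(path):
        for file in files:
            filepath=os.path.join(paths,file)
            fullpath.append(filepath)
    return fullpath

def getsize(file):
    # Index 6 of the stat result is st_size, the file size in bytes
    size=os.stat(file)[6]
    return size

path='/opt/python'
files=diskwalk(path)
record={}   # (size, checksum) -> first file seen with that key
dup=[]      # every later file whose key is already in record

for file in files:
    compound_key=(getsize(file),checksum(file))
    if compound_key in record:
        dup.append(file)
    else:
        record[compound_key]=file
print(record)
print("###############")
print(dup)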
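
The script above hashes every file and reports only the later copies, not which file each one duplicates. As a rough sketch of one possible variation (not the approach from the post), the version below groups files by size first and computes MD5 only for files that share a size with at least one other file, then returns each set of identical files together. The names file_md5 and find_duplicates are hypothetical, chosen here for illustration, and the code assumes Python 3.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=8192):
    """Return the hex MD5 digest of a file, read in binary chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as fp:
        # iter() with a sentinel stops when read() returns empty bytes
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def find_duplicates(root):
    """Group files under root by (size, MD5); return groups of 2+ paths."""
    # Pass 1: bucket every file by its size in bytes
    by_size = defaultdict(list)
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            by_size[os.path.getsize(full)].append(full)

    # Pass 2: hash only the files whose size is shared with another file
    groups = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for p in paths:
            groups[(size, file_md5(p))].append(p)

    return [sorted(g) for g in groups.values() if len(g) > 1]
```

Calling find_duplicates('/opt/python') would return a list of lists, each inner list holding the paths of files with the same size and MD5 checksum. Skipping files with a unique size avoids reading them at all, which may help on trees with many large files.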
