三种遍历文件夹方法比较-火鸡-ChinaUnix博客

火鸡的笔记本

首页　| 　博文目录　| 　关于我

火鸡

博客访问： 1170548
博文数量： 221
博客积分： 10152
博客等级：上将
技术积分： 1518
用户组：普通用户
注册时间： 2005-07-22 10:42

文章分类

全部博文（221）

python（1）
Blender（2）
ham（7）
电子（6）
科学计算（2）
天文历法（16）
WEB server（3）
DNS（6）
MZ（0）
database（22）

mysql（19）
吹水（18）
mil（0）
Linux（137）

system（10）

embed（9）

Secure（11）

shell（36）

perl（2）

C（16）

develop（15）
未分配的博文（1）

文章存档

2018年（1）

2015年（6）

2014年（3）

2013年（4）

2012年（1）

2011年（5）

2010年（14）

2009年（10）

2008年（28）

2007年（33）

2006年（114）

2005年（2）

我的朋友

最近访客

推荐博文

三种遍历文件夹方法比较

分类：

2007-03-13 13:58:15

本贴对三种遍历文件夹方法比较。
1. 使用File::Find;
2. 递归遍历。(遍历函数为lsr)
3. 使用队列或栈遍历。(遍历函数为lsr_s)

1.use File::Find

CODE:

#!/usr/bin/perl -W
#
# File: find.pl
# Author:  路小佳
# License: GPL-2

use strict;
use warnings;
use File::Find;

my ($size, $dircnt, $filecnt) = (0, 0, 0);

sub process {
my $file = $File::Find::name;
#print $file, "\n";
if (-d $file) {
      $dircnt++;
}
else {
      $filecnt++;
      $size += -s $file;
}
}

find(\&process, '.');
print "$filecnt files, $dircnt directory. $size bytes.\n";

2. lsr递归遍历

CODE:

#!/usr/bin/perl -W
#
# File: lsr.pl
# Author: 路小佳
# License: GPL-2

use strict;
use warnings;

sub lsr($) {
sub lsr;
my $cwd = shift;

local *DH;
if (!opendir(DH, $cwd)) {
      warn "Cannot opendir $cwd: $! $^E";
      return undef;
}
foreach (readdir(DH)) {
      if ($_ eq '.' || $_ eq '..') {
         next;
      }
      my $file = $cwd.'/'.$_;
      if (!-l $file && -d _) {
         $file .= '/';
         lsr($file);
      }
      process($file, $cwd);
}
closedir(DH);
}

my ($size, $dircnt, $filecnt) = (0, 0, 0);

sub process($$) {
my $file = shift;
#print $file, "\n";
if (substr($file, length($file)-1, 1) eq '/') {
      $dircnt++;
}
else {
      $filecnt++;
      $size += -s $file;
}
}

lsr('.');
print "$filecnt files, $dircnt directory. $size bytes.\n";

3. lsr_s栈遍历

CODE:

#!/usr/bin/perl -W
#
# File: lsr_s.pl
# Author: 路小佳
# License: GPL-2

use strict;
use warnings;

sub lsr_s($) {
my $cwd = shift;
my @dirs = ($cwd.'/');

my ($dir, $file);
while ($dir = pop(@dirs)) {
      local *DH;
      if (!opendir(DH, $dir)) {
         warn "Cannot opendir $dir: $! $^E";
         next;
      }
      foreach (readdir(DH)) {
         if ($_ eq '.' || $_ eq '..') {
            next;
         }
         $file = $dir.$_;
         if (!-l $file && -d _) {
            $file .= '/';
            push(@dirs, $file);
         }
         process($file, $dir);
      }
      closedir(DH);
}
}

my ($size, $dircnt, $filecnt) = (0, 0, 0);

sub process($$) {
my $file = shift;
print $file, "\n";
if (substr($file, length($file)-1, 1) eq '/') {
      $dircnt++;
}
else {
      $filecnt++;
      $size += -s $file;
}
}

lsr_s('.');
print "$filecnt files, $dircnt directory. $size bytes.\n";

对我的硬盘/dev/hda6的测试结果。

1: File::Find

CODE:

26881 files, 1603 directory. 9052479946 bytes.

real 0m9.140s
user 0m3.124s
sys 0m5.811s

2: lsr

CODE:

26881 files, 1603 directory. 9052479946 bytes.

real 0m8.266s
user 0m2.686s
sys 0m5.405s

3: lsr_s

CODE:

26881 files, 1603 directory. 9052479946 bytes.

real 0m6.532s
user 0m2.124s
sys 0m3.952s

测试时考虑到cache所以要多测几次取平均, 也不要同时打印文件名，因为控制台是慢设备，会形成瓶颈。
lsr_s之所以用栈而不是队列来遍历，是因为Perl的push shift pop操作是基于数组的， push pop这样成对操作可能有优化。内存和cpu占用大小顺序也是1>2>3.

CODE:

                        CPU load       memory
use File::Find          97%             4540K
lsr                      95%             3760K
lsr_s                   95%             3590K

结论: 强烈推荐使用lsr_s来遍历文件夹。

=============再罗嗦几句======================
从执行效率上来看，find.pl比lsr.pl的差距主要在user上，原因是File::Find模块选项较多，条件判断费时较多，而lsr_s.pl比lsr.pl在作系统调用用时较少，是因为递归时程序还在保存原有的文件句柄和函数恢复现场的信息，所以sys费时较多。所以lsr_s在sys与user上同时胜出是不无道理的。

阅读(3899) | 评论(1) | 转发(0) |

上一篇：让MYSQL彻底支持中文

下一篇：scilab部分函数

给主人留下些什么吧！~~

chinaunix网友2010-07-17 11:20:52

执行时间是怎么记录出来的？

回复 | 举报

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6