Removing Duplicate Lines - huaihe0410 - ChinaUnix Blog

Removing Duplicate Lines

Category: LINUX

2008-06-02 20:05:51

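I need to compile statistics over all kinds of data files: checking whether certain records appear in a large log file or in JCL, counting how many times a given record occurs, chaining shell scripts together, turning data pulled from the database into raw text, driving the database (calling sqlplus from the shell to run SQL, connecting to Oracle with Perl DBI), creating directories automatically, removing duplicate lines, sorting, and so on. I did all of this with a messy mix of awk, shell, sed, grep and Perl.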

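I have found that Perl alone can handle basically all of the work above, as long as you don't mind writing the extra code. Perl is really fun, especially for building fairly complex data structures, and for its OO features.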

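While compiling statistics I needed to delete the duplicate records in a file. A quick look at the answers online shows they mostly come down to the same thing: sort the file, then run it through uniq or awk: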

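awk 'NR == 1 || ($0 "") != line { print } { line = $0 }' file

Like uniq, this only drops a line that repeats the line just before it, which is why the file is sorted first. The NR == 1 test makes sure the first line is always printed (an unset awk variable can compare equal to an empty line or to a line containing 0), and ($0 "") forces a string comparison, so lines such as 1 and 1.0 are not treated as equal numbers.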
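The sort-then-filter approach can be sketched as a few shell functions. This is a minimal sketch, not the author's code: the names `dedup_sorted_uniq`, `dedup_sorted_u`, `dedup_sorted_awk` and `count_records` are hypothetical, and `LC_ALL=C` is an added assumption so that sorting and comparison use plain byte order regardless of the user's locale. `count_records` uses `uniq -c` for the "count how many times a record occurs" task mentioned at the top.

```shell
# Hypothetical helpers for the sort-then-filter approach. Each one reads the
# files given as arguments (or stdin) and writes the distinct lines, sorted.
# LC_ALL=C makes sort and uniq use byte order, independent of the locale.

# Sort, then let uniq drop adjacent duplicates.
dedup_sorted_uniq() {
    LC_ALL=C sort "$@" | LC_ALL=C uniq
}

# sort -u sorts and drops duplicate lines in one step.
dedup_sorted_u() {
    LC_ALL=C sort -u "$@"
}

# Sort, then drop adjacent duplicates with awk.
# NR == 1 always prints the first line, even when it is empty;
# ($0 "") forces a string comparison so lines like 1 and 1.0 are both kept.
dedup_sorted_awk() {
    LC_ALL=C sort "$@" | awk 'NR == 1 || ($0 "") != line { print } { line = $0 }'
}

# Count how many times each distinct record occurs (count, then record).
count_records() {
    LC_ALL=C sort "$@" | LC_ALL=C uniq -c
}
```

For example, `count_records file` prints each distinct line of `file` prefixed by its number of occurrences; `sort -u` is usually the shortest choice when only the distinct lines are needed.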

A more experienced user wrote this version in sed:

sed -f rmdup.sed yourfile

Here is the rmdup.sed script:

#n rmdup.sed - ReMove DUPlicate consecutive lines
# read next line into pattern space (if not the last line)
$!N
# check if pattern space consists of two identical lines
s/^\(.*\)\n\1$/&/
# if yes, goto label RmLn, which will remove the first line in pattern space
t RmLn
# if not, print the first line (and remove it)
P
# garbage handling which simply deletes the first line in the pattern space
:RmLn
D
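The same consecutive-duplicate filter is often written as a single sed command line, in the form found in common sed one-liner collections. `$!N` appends the next line, the address `/^\(.*\)\n\1$/!` selects pattern spaces whose two lines differ, `P` prints the first line, and `D` deletes it and restarts the cycle with the second line. A minimal sketch, assuming a POSIX-compatible sed such as GNU sed; the wrapper name `sed_uniq` is hypothetical.

```shell
# Hypothetical helper: drop consecutive duplicate lines using sed only.
#   $!N                  append the next line unless this is the last one
#   /^\(.*\)\n\1$/!P     if the two lines differ, print the first one
#   D                    delete the first line, restart with the remainder
sed_uniq() {
    sed '$!N; /^\(.*\)\n\1$/!P; D' "$@"
}
```

Like rmdup.sed, it keeps the first line of each run of identical lines, so `printf 'a\na\nb\na\n' | sed_uniq` still prints the final `a`; sort the input first to remove every duplicate.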

Like the awk version, this only removes consecutive duplicates, so use `sort` first; there is no efficient way of sorting in sed or awk.
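Sorting is only needed because these filters compare each line with the one before it. When the original line order should be kept, a commonly used alternative (a different technique from the scripts above) is to remember every line already seen in an awk associative array and print a line only on its first appearance. This is a sketch: `dedup_keep_order` is a hypothetical name, and because the array holds one entry per distinct line, memory grows with the number of unique lines, so for very large files `sort -u` may still be the safer choice.

```shell
# Hypothetical helper: drop every repeated line, adjacent or not, without
# sorting, keeping the first occurrence of each line in its original position.
# seen[$0]++ yields 0 the first time a line is seen (and then increments),
# so !seen[$0]++ is true exactly once per distinct line.
dedup_keep_order() {
    awk '!seen[$0]++' "$@"
}
```

For example, `printf 'b\na\nb\n' | dedup_keep_order` prints `b` and then `a`.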
