Category: LINUX

2011-09-20 12:19:41

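Today I needed to do set operations on the lines of two files, such as taking their union. Writing my own script for it seemed like too much trouble, and a quick search turned up a ready-made tool: comm.
One thing to watch out for: before running comm, both files must already be sorted, and sorted in the same order (the same locale).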
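To make the sorting requirement concrete, here is a minimal sketch that sorts both inputs right before handing them to comm. It assumes bash (for process substitution `<(...)`) and two hypothetical unsorted files, a.txt and b.txt. Setting `LC_ALL=C` makes sort and comm use the same byte-order collation, so they agree on what "sorted" means.

```shell
#!/usr/bin/env bash
# Sketch: sort both inputs before comm. a.txt and b.txt are hypothetical
# example files created in a scratch directory.
set -euo pipefail
cd "$(mktemp -d)"
printf '%s\n' pear apple fig > a.txt
printf '%s\n' fig kiwi apple > b.txt

# Use one locale for both sort and comm so their ordering rules match.
export LC_ALL=C

# Process substitution feeds the sorted streams to comm without temp files.
common=$(comm -12 <(sort a.txt) <(sort b.txt))
printf '%s\n' "$common"
```

With these inputs the common lines are apple and fig.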

The following is reposted from:
#####################################
In our daily work we often run into questions like these:

Given two files, file1 and file2:
1) How can I print the lines that appear only in file1?
2) How can I print the lines that appear only in file2?
3) How can I print the lines that appear in both file1 and file2?

One standard shell command handles all three easily: comm. Whenever you face questions like these, comm should be your first choice :-)

comm [ -123 ]  file1  file2

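comm reads file1 and file2 and writes three columns of output: lines only in file1, lines only in file2, and lines in both files. The columns are separated by tabs, and each of the options -1, -2 and -3 suppresses the corresponding column, so combining two of them leaves a single column. For details, see man comm.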
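The column layout is easier to see than to describe. This sketch, which assumes bash and two small hypothetical sorted files, pipes comm's raw output through `sed -n l`, whose `l` command prints tabs as `\t` and marks each line end with `$`. Lines in column 2 start with one tab and lines in column 3 with two. It then hides columns with the flag pairs used in the examples.

```shell
#!/usr/bin/env bash
# Sketch: show comm's three tab-separated columns and how -1/-2/-3 hide them.
set -euo pipefail
cd "$(mktemp -d)"
printf '%s\n' 1 2 3 > left.txt    # hypothetical sorted input
printf '%s\n' 2 3 4 > right.txt   # hypothetical sorted input

# Raw output; sed's "l" command makes tabs visible as \t and ends lines with $.
raw=$(comm left.txt right.txt)
printf '%s\n' "$raw" | sed -n l

only_left=$(comm -23 left.txt right.txt)   # hide columns 2 and 3
only_right=$(comm -13 left.txt right.txt)  # hide columns 1 and 3
both=$(comm -12 left.txt right.txt)        # hide columns 1 and 2
printf 'only left: %s\nonly right: %s\n' "$only_left" "$only_right"
printf 'both:\n%s\n' "$both"
```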

Example:

bash-2.03$ cat file1
11111111
22222222
33333333
44444444
55555555
66666666
77777777
88888888
99999999
bash-2.03$ cat file2
00000000
22222222
44444444
66666666
88888888

1)  Print the lines that appear only in file1:
bash-2.03$ comm -23 file1 file2
11111111
33333333
55555555
77777777
99999999

2)  Print the lines that appear only in file2:
bash-2.03$ comm -13 file1 file2
00000000

3)  Print the lines that appear in both file1 and file2:
bash-2.03$ comm -12 file1 file2
22222222
44444444
66666666
88888888
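Since the task that prompted this note was a union, here is a small sketch for it, assuming bash and two hypothetical sorted files. `sort -m -u` merges files that are already sorted and keeps one copy of each line (plain `sort -u` also works on unsorted input). The same result can be read off comm by printing all three columns and stripping the leading tabs.

```shell
#!/usr/bin/env bash
# Sketch: union of two sorted files. f1.txt and f2.txt are hypothetical.
set -euo pipefail
cd "$(mktemp -d)"
printf '%s\n' 11 22 33 > f1.txt
printf '%s\n' 00 22 44 > f2.txt
export LC_ALL=C   # same collation for sort and comm

# Merge the already-sorted files without re-sorting; -u drops duplicate lines.
union=$(sort -m -u f1.txt f2.txt)

# Same union via comm: all three columns, then remove the leading tab(s).
tab=$(printf '\t')
union_comm=$(comm f1.txt f2.txt | sed "s/^${tab}*//")

printf '%s\n' "$union"
```

Both variables end up holding 00, 11, 22, 33 and 44, one per line.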

Besides comm, there are several other ways to do the same tasks.

1)  Print the lines that appear only in file1:
diff file1 file2 | grep '^<' | sed 's/^< //'
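Swapping `<` for `>` in that pipeline gives the lines only in file2. Here is a minimal sketch, assuming bash, diff's default (normal) output format, and two hypothetical sorted files. The inputs should be sorted here too: on unsorted files diff reports a minimal edit script, so a line present in both files but at a different position can be listed as removed and added.

```shell
#!/usr/bin/env bash
# Sketch: set differences from diff's "<" and ">" lines. Files are hypothetical.
set -euo pipefail
cd "$(mktemp -d)"
printf '%s\n' 11 22 33 > f1.txt
printf '%s\n' 00 22 44 > f2.txt

# diff exits with status 1 when the files differ; "|| true" keeps set -e happy.
diff_out=$(diff f1.txt f2.txt || true)

only1=$(printf '%s\n' "$diff_out" | sed -n 's/^< //p')   # lines only in f1.txt
only2=$(printf '%s\n' "$diff_out" | sed -n 's/^> //p')   # lines only in f2.txt
printf 'only in f1: %s\n' $only1
printf 'only in f2: %s\n' $only2
```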

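# Keep the lines of file1 that have no exact match in file2 (word splitting assumes lines without spaces)
: > temp
for i in $(<file1); do grep -qxF -- "$i" file2 || echo "$i" >> temp; done
cat temp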
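Another option that needs no sorting at all is grep with a pattern file. `-f` reads patterns from a file, `-F` treats them as fixed strings, `-x` requires a whole-line match, and `-v` inverts the match. The sketch below assumes bash and hypothetical unsorted files. Unlike comm, it keeps the original line order and any duplicates of the file being filtered.

```shell
#!/usr/bin/env bash
# Sketch: set operations with grep -F -x -f on hypothetical unsorted files.
set -euo pipefail
cd "$(mktemp -d)"
printf '%s\n' 33 11 22 > f1.txt
printf '%s\n' 22 00 44 > f2.txt

# grep exits with status 1 when nothing matches; "|| true" keeps set -e happy.
only1=$(grep -vxFf f2.txt f1.txt || true)   # lines of f1.txt not in f2.txt
only2=$(grep -vxFf f1.txt f2.txt || true)   # lines of f2.txt not in f1.txt
both=$(grep -xFf f2.txt f1.txt || true)     # lines of f1.txt also in f2.txt
printf '%s\n' "$only1" "$only2" "$both"
```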


In comparison, comm is much easier to remember. :-)
