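Today I needed the union of two files. Writing my own script seemed like too much trouble, so I searched around and, sure enough, there is a ready-made tool: comm.
Note that both files must already be sorted before you run comm on them.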
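A minimal sketch of that sorting step, assuming two small unsorted inputs. The file names (unsorted1.txt, sorted1.txt, and so on) and the variable names are made up for illustration. LC_ALL=C is set so that sort and comm agree on the ordering of lines.

```shell
#!/bin/sh
# Hypothetical file names; any two text files will do.
export LC_ALL=C    # make sort and comm use the same byte-wise ordering

workdir=$(mktemp -d)
printf '%s\n' pear apple fig > "$workdir/unsorted1.txt"
printf '%s\n' fig kiwi apple > "$workdir/unsorted2.txt"

# Sort each input into its own file first, then compare the sorted copies.
sort "$workdir/unsorted1.txt" > "$workdir/sorted1.txt"
sort "$workdir/unsorted2.txt" > "$workdir/sorted2.txt"

# Lines present in both files (columns 1 and 2 suppressed).
common=$(comm -12 "$workdir/sorted1.txt" "$workdir/sorted2.txt")
printf 'common:\n%s\n' "$common"

# For a plain union, sort -u on both files is enough and needs no pre-sorting.
union=$(sort -u "$workdir/unsorted1.txt" "$workdir/unsorted2.txt")
printf 'union:\n%s\n' "$union"

rm -rf "$workdir"
```

If the inputs are not sorted, comm may report lines as unique even though they appear in both files, so the sort step is worth keeping even when the files look ordered.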
The following content is reposted:
#####################################
In our work, we often encounter the following questions:
I have two files, file1 and file2:
1) How can I print out the lines that are only contained in file1?
2) How can I print out the lines that are only contained in file2?
3) How can I print out the lines that are contained both in file1 and file2?
There is a powerful shell command that easily meets these needs: comm. When you face the questions above, comm should be your first choice :-)
comm [ -123 ] file1 file2
comm reads file1 and file2, which must both be sorted, and generates three
columns of output: lines only in file1; lines only in file2; and lines in
both files. Each of the options -1, -2, and -3 suppresses the corresponding
column. For a detailed explanation, see man comm.
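To make the three-column layout concrete, here is a small sketch that recreates the two sample files from the example that follows and runs comm without any options. Column 2 is indented by one tab and column 3 by two tabs. The temporary directory and the variable names are assumptions for illustration only.

```shell
#!/bin/sh
export LC_ALL=C    # keep the ordering consistent with the sorted sample data

workdir=$(mktemp -d)
printf '%s\n' 11111111 22222222 33333333 44444444 55555555 \
              66666666 77777777 88888888 99999999 > "$workdir/file1"
printf '%s\n' 00000000 22222222 44444444 66666666 88888888 > "$workdir/file2"

# No options: all three columns are printed, separated by leading tabs.
comm "$workdir/file1" "$workdir/file2"

# -3 drops the "in both files" column, leaving lines unique to either file.
# tr strips the tab that marks column 2 (safe here because the data has no tabs).
unique=$(comm -3 "$workdir/file1" "$workdir/file2" | tr -d '\t')
printf '%s\n' "$unique"

rm -rf "$workdir"
```

The options can be combined, so -23 keeps only column 1, -13 keeps only column 2, and -12 keeps only column 3.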
Example:
bash-2.03$ cat file1
11111111
22222222
33333333
44444444
55555555
66666666
77777777
88888888
99999999
bash-2.03$ cat file2
00000000
22222222
44444444
66666666
88888888
1) Print out the lines that are only contained in file1:
bash-2.03$ comm -23 file1 file2
11111111
33333333
55555555
77777777
99999999
2) Print out the lines that are only contained in file2:
bash-2.03$ comm -13 file1 file2
00000000
3) Print out the lines that are contained in both file1 and file2:
bash-2.03$ comm -12 file1 file2
22222222
44444444
66666666
88888888
Besides comm, there are various other ways to accomplish the above tasks.
1) Print out the lines that are only contained in file1:
diff file1 file2 | grep "^<"|sed 's/^< //g'
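# Or loop over file1 and save every line that does not appear in file2:
: > temp
while IFS= read -r line; do grep -Fxq -- "$line" file2 || printf '%s\n' "$line" >> temp; done < file1
cat temp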
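One more option in the same spirit, offered as a sketch and not as part of the original list: grep can read file2 as a list of fixed-string patterns and print the lines of file1 that match none of them. The sample data and the variable name only_in_file1 are assumptions for illustration.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf '%s\n' 11111111 22222222 33333333 > "$workdir/file1"
printf '%s\n' 00000000 22222222 > "$workdir/file2"

# -F: treat patterns as fixed strings, -x: match whole lines only,
# -v: invert the match, -f: read the patterns from file2.
only_in_file1=$(grep -Fxv -f "$workdir/file2" "$workdir/file1")
printf '%s\n' "$only_in_file1"

rm -rf "$workdir"
```

Unlike comm, this approach does not need sorted input, though it may be slower on large files.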
In comparison, comm is much easier to remember. :-)