Category: LINUX
2015-11-20 11:46:04
awk with multiple input files: using split() instead of changing FS, which would also change FS for the next file
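(1) Requirement:
Expand the acronyms in a text file: replace each acronym with its full name, keeping the acronym in parentheses after it.
(2) Text to be processed: testfile8_sample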
The USGCRP is a comprehensive.
The NASA program Mission to Planet Earth.
of the USGCRP and includes new initiatives
such as EOS and Earthprobes.
(3) File holding the replacements: testfile8_acronyms (each line is an acronym, a tab, then its expansion)
USGCRP U.S. Global Change Research Program
NASA National Aeronautic and Space Administration
EOS Earth Observing System
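The script splits each acronym line on a tab, so the separator in testfile8_acronyms is presumably a single tab, which the listing above cannot show. A minimal way to recreate both files for testing, assuming exactly that layout:

```shell
#!/bin/bash
# Assumption: acronym and expansion are separated by exactly one tab.
# printf reuses the format for each pair of arguments and expands \t to a real tab.
printf '%s\t%s\n' \
    'USGCRP' 'U.S. Global Change Research Program' \
    'NASA'   'National Aeronautic and Space Administration' \
    'EOS'    'Earth Observing System' > testfile8_acronyms

# The sample text, one line per argument.
printf '%s\n' \
    'The USGCRP is a comprehensive.' \
    'The NASA program Mission to Planet Earth.' \
    'of the USGCRP and includes new initiatives' \
    'such as EOS and Earthprobes.' > testfile8_sample
```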
(4) Script (awkscript7_expandword):
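#!/bin/bash
awk -v wordfile="$1" -v searchfile="$2" '
# BEGIN {
#     FS = "\t"    # Changing FS here also changes it for the next file, so the sample text is no longer split into words
# }
# FILENAME == wordfile {
#     word_e[$1] = $2
#     next
# }
FILENAME == wordfile {
    split($0, entry, "\t")          # split only this line on the tab; the global FS stays unchanged
    word_e[entry[1]] = entry[2]
    next                            # skip the remaining rules, so the rules below only process the second file
}
/[A-Z][A-Z]+/ {                     # only lines with a run of 2+ capitals can contain an acronym
    for (i = 1; i <= NF; i++) {
        if ($i in word_e)
            $i = word_e[$i] "( " $i " )"
    }
}
{ print }                           # print every line of the second file, expanded or not
' "$@"    # first argument: the acronym file; second argument: the file to expand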
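Why the commented-out BEGIN block fails can be seen on a single line of the sample. This sketch, which should behave the same in any POSIX awk, compares the field count under the default FS, under FS set to a tab, and with split():

```shell
#!/bin/bash
line='The NASA program Mission to Planet Earth.'

# Default FS: fields are separated by runs of blanks, so each word is a field.
echo "$line" | awk '{ print NF }'                       # prints 7

# FS set to a tab in BEGIN stays in force for every input file. This line has
# no tab, so the whole line is one field and "NASA" is never seen as $i.
echo "$line" | awk 'BEGIN { FS = "\t" } { print NF }'   # prints 1

# split() with an explicit "\t" separator splits only the string it is given;
# FS is untouched, so the next file is still split into words.
printf 'NASA\tNational Aeronautic and Space Administration\n' |
    awk '{ n = split($0, entry, "\t"); print n ": " entry[2] }'
# prints 2: National Aeronautic and Space Administration
```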
(5) Run:
./awkscript7_expandword testfile8_acronyms testfile8_sample
Result:
The U.S. Global Change Research Program( USGCRP ) is a comprehensive.
The National Aeronautic and Space Administration( NASA ) program Mission to Planet Earth.
of the U.S. Global Change Research Program( USGCRP ) and includes new initiatives
such as Earth Observing System( EOS ) and Earthprobes.
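A different technique, not the one used in this post, reaches the same goal by placing FS assignments between the file operands: POSIX awk performs a var=value operand when it reaches it in the argument list, just before reading the following file. The sketch below wraps this in a hypothetical shell function expand_acronyms and uses the common NR == FNR test in place of comparing FILENAME; note that NR == FNR misfires if the acronym file is empty.

```shell
#!/bin/bash
# expand_acronyms is a hypothetical helper name used only for this sketch.
# Usage: expand_acronyms ACRONYM_FILE TEXT_FILE
expand_acronyms() {
    awk '
    NR == FNR {                     # true only while reading the first file
        word_e[$1] = $2             # FS is a tab here, so $2 is the whole expansion
        next
    }
    {                               # second file: FS is back to the default " "
        for (i = 1; i <= NF; i++)
            if ($i in word_e)
                $i = word_e[$i] "( " $i " )"
        print                       # print every line, expanded or not
    }
    ' FS='\t' "$1" FS=' ' "$2"      # awk turns the \t in FS=\t into a real tab
}
```

With tab-separated input like the files above, `expand_acronyms testfile8_acronyms testfile8_sample` should print the same four lines as the result shown.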