II. Conditionals, Loops, and Arrays
Awk's conditionals and loops are borrowed from C and use the same syntax as C.
1. Conditionals
if ( expression )
action1
[else
action2]
If expression is true (a nonzero number or a non-empty string), action1 is executed; otherwise action2 is executed.
e.g.: if ( x ) print x
      if ( x == y ) print x
      if ( x ~ /[yY](es)?/ ) print x
      if ( avg >= 65 )
          grade = "Pass"
      else
          grade = "Fail"
      if (avg >= 90) grade = "A"
      else if (avg >= 80) grade = "B"
      else if (avg >= 70) grade = "C"
      else if (avg >= 60) grade = "D"
      else grade = "F"
The conditional operator:
expr ? expr1 : expr2
Its value is expr1 if expr is true, and expr2 otherwise.
e.g.: grade = (avg >= 65) ? "Pass" : "Fail"
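A minimal runnable sketch combining the if / else if chain and the conditional operator above, assuming a POSIX awk on the PATH. The shell function name grade_scores is hypothetical; it only wraps the awk program so it can be fed different input.

```shell
# grade_scores (hypothetical helper): reads one average per line and prints
# it with a letter grade (if / else if chain) and Pass/Fail (?: operator).
grade_scores() {
    awk '{
        avg = $1
        if (avg >= 90) grade = "A"
        else if (avg >= 80) grade = "B"
        else if (avg >= 70) grade = "C"
        else if (avg >= 60) grade = "D"
        else grade = "F"
        result = (avg >= 65) ? "Pass" : "Fail"
        print avg, grade, result
    }'
}

printf '95\n72\n40\n' | grade_scores
# 95 A Pass
# 72 C Pass
# 40 F Fail
```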
2. Loops
1) while loop
Example (prints the first four fields of the current record):
i = 1
while ( i <= 4 ) {
    print $i
    ++i
}
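The while example above hard-codes 4 fields. A sketch that bounds the loop by NF instead, so it works for records of any length (the helper name print_fields is hypothetical):

```shell
# print_fields (hypothetical helper): prints every field of each record
# on its own line, looping while i <= NF.
print_fields() {
    awk '{
        i = 1
        while (i <= NF) {
            print i ": " $i
            ++i
        }
    }'
}

echo "alpha beta gamma" | print_fields
# 1: alpha
# 2: beta
# 3: gamma
```

An empty line has NF == 0, so the loop body never runs and nothing is printed.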
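2) do loop
The body always runs at least once, because the condition is tested after each pass; this example prints 1 through 5.
BEGIN {
    do {
        ++x
        print x
    } while ( x <= 4 )
}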
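To see the difference between while and do, here is a sketch (hypothetical helper loop_demo) where the condition is false from the start: the while body never runs, while the do body runs exactly once.

```shell
# loop_demo (hypothetical helper): same false condition, two loop forms.
loop_demo() {
    awk 'BEGIN {
        x = 10
        while (x <= 4) { print "while:", x; ++x }   # never executes
        x = 10
        do { print "do:", x; ++x } while (x <= 4)  # executes once
    }'
}

loop_demo
# do: 10
```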
3) for loop
Example (average of fields 2 through NF):
total = 0
for (i = 2; i <= NF; ++i)
    total += $i
avg = total / (NF - 1)
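A runnable version of the averaging loop, assuming records of the form "name score score ...". The helper name average_scores is hypothetical, and the NF > 1 guard is an addition to avoid dividing by zero on a line with no scores.

```shell
# average_scores (hypothetical helper): field 1 is a name, fields 2..NF are
# scores; a for loop sums them and the total is divided by their count.
average_scores() {
    awk '{
        total = 0
        for (i = 2; i <= NF; ++i)
            total += $i
        if (NF > 1)
            print $1, total / (NF - 1)
        else
            print $1, "no scores"
    }'
}

echo "john 85 92 78" | average_scores
# john 85
```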
3. Other keywords that affect loops
1) break, continue
Both work exactly as in C: break leaves the innermost loop, and continue skips to the next iteration of that loop.
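A small sketch using both keywords in one loop (hypothetical helper sum_until_zero): negative values are skipped with continue, and the first 0 ends the loop with break.

```shell
# sum_until_zero (hypothetical helper): sums the fields of each record,
# skipping negatives (continue) and stopping at the first 0 (break).
sum_until_zero() {
    awk '{
        sum = 0
        for (i = 1; i <= NF; ++i) {
            if ($i < 0) continue   # skip negative values
            if ($i == 0) break     # 0 marks the end of the list
            sum += $i
        }
        print sum
    }'
}

echo "3 -2 4 0 100" | sum_until_zero
# 7
```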
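2) next
The next statement makes awk stop processing the current record, read the next input record, and start again from the first rule. None of the actions that follow next are executed for the current record.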
e.g.: NF != 4 {
    printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/stderr"
    next
}
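A runnable sketch of the same pattern, assuming an awk that accepts "/dev/stderr" as an output file (gawk and mawk do). The helper name check_fields and the second rule are hypothetical; the second rule only shows that records rejected by next never reach it.

```shell
# check_fields (hypothetical helper): lines without exactly 4 fields are
# reported on stderr and skipped with next; the other lines print $1.
check_fields() {
    awk 'NF != 4 {
        printf("line %d skipped: %d fields\n", FNR, NF) > "/dev/stderr"
        next
    }
    { print $1 }'
}

printf 'a b c d\nx y\ne f g h\n' | check_fields
# stdout: a
#         e
# stderr: line 2 skipped: 2 fields
```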
3) exit
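The exit statement makes awk stop processing the current record and stop reading input; any remaining lines are ignored (END rules, if present, are still executed). If exit is given an argument, that value becomes awk's exit status.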
e.g.: awk '{
    ...
    exit 5    ## the exit status is 5
}'
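A sketch showing how the calling shell sees the exit status (hypothetical helper has_error): awk exits with 5 at the first line containing ERROR and with 0 if no such line exists.

```shell
# has_error (hypothetical helper): report the first ERROR line, exit 5.
has_error() {
    awk '/ERROR/ { print "found on line " NR; exit 5 }'
}

# "|| ..." runs only when the pipeline's status is nonzero
printf 'ok\nERROR disk full\nok\n' | has_error || echo "exit status: $?"
# found on line 2
# exit status: 5
```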
4. Arrays
1) Assignment form: array[index] = value
e.g.: flavor[1] = "cherry"
e.g.: flavor_count = 5
      for (x = 1; x <= flavor_count; ++x)
          print flavor[x]
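A classic use of a numerically indexed array, sketched under a hypothetical helper name: store every record under its record number NR, then walk the indices backwards in END to print the input in reverse order.

```shell
# reverse_lines (hypothetical helper): line[1..NR] holds the records;
# a counted for loop in END prints them from last to first.
reverse_lines() {
    awk '{ line[NR] = $0 }
         END { for (i = NR; i >= 1; --i) print line[i] }'
}

printf 'one\ntwo\nthree\n' | reverse_lines
# three
# two
# one
```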
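2) Associative arrays
In awk every array is associative: an index can be a number or a string (numeric indices are converted to strings).
e.g.: array[$1] = $2
      acro["BASIC"]
Arrays have a special loop syntax that visits every index: for ( variable in array )
    do something with array[variable]
The order in which the indices are visited is unspecified.
e.g.: for ( item in acro )
          print item, acro[item]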
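A sketch of string indices plus for-in (hypothetical helper word_count): each word is used as an index to count its occurrences. Because the for-in order is unspecified, the output is piped through sort to make it predictable.

```shell
# word_count (hypothetical helper): count[word] = number of occurrences.
word_count() {
    awk '{ for (i = 1; i <= NF; ++i) count[$i]++ }
         END { for (w in count) print w, count[w] }' | sort
}

echo "the cat and the hat" | word_count
# and 1
# cat 1
# hat 1
# the 2
```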
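3) Testing for array membership
Syntax: item in array    ## returns 1 if array[item] exists, 0 otherwise; unlike a plain reference to array[item], the test does not create the element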
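A sketch contrasting the in test with a plain reference (hypothetical helper in_demo): after testing "COBOL" in acro the array still has one element, but merely reading acro["COBOL"] adds a second, empty one.

```shell
# in_demo (hypothetical helper): membership test vs. element reference.
in_demo() {
    awk 'BEGIN {
        acro["CICS"] = "Customer Information Control System"
        if ("COBOL" in acro)
            print "COBOL found"
        else
            print "COBOL missing"
        n = 0
        for (k in acro) n++
        print "elements after in test:", n
        x = acro["COBOL"]    # a plain reference creates the element
        n = 0
        for (k in acro) n++
        print "elements after reference:", n
    }'
}

in_demo
# COBOL missing
# elements after in test: 1
# elements after reference: 2
```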
5. A Glossary Lookup Script
1) Example: the script file lookup contains:
awk '# lookup -- reads local glossary file and prompts user for query
#0
BEGIN { FS = "\t"; OFS = "\t"
    # prompt user
    printf("Enter a glossary term: ")
}
#1 read local file named glossary
FILENAME == "glossary" {
    # load each glossary entry into an array
    entry[$1] = $2
    next
}
#2 scan for command to exit program
$0 ~ /^(quit|[qQ]|exit|[Xx])$/ { exit }
#3 process any non-empty line
$0 != "" {
    if ( $0 in entry ) {
        # it is there, print definition
        print entry[$0]
    } else
        print $0 " not found"
}
#4 prompt user again for another term
{
    printf("Enter another glossary term (q to quit): ")
}' glossary -
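Explanation: awk executes the BEGIN action before reading any input, so the first prompt is printed. It then reads the lines of the first file, glossary (its term and definition must be separated by a tab, since FS is "\t"). While FILENAME is "glossary", rule #1 stores each entry, and its next statement keeps the remaining rules from running. When glossary is exhausted, awk reads the second input, "-" (standard input), so FILENAME no longer equals "glossary" and rules #2 onward are applied to each line the user types. If you just press Enter, the empty line is ignored by rule #3 and rule #4 prompts again; awk keeps looping like this until it reads end-of-file or a quit command.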
Test run:
$ cat glossary
BASIC Beginner's All-Purpose Symbolic Instruction Code
CICS Customer Information Control System
COBOL Common Business Oriented Language
DBMS Data Base Management System
GIGO Garbage In, Garbage Out
GIRL Generalized Information Retrieval Language
$ ./lookup
Enter a glossary term: GIGO
Garbage In, Garbage Out
Enter another glossary term (q to quit): BASIC
Beginner's All-Purpose Symbolic Instruction Code
Enter another glossary term (q to quit): q
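For scripted (non-interactive) use, a common variant replaces FILENAME == "glossary" with the NR == FNR idiom, which is true only while the first file is being read. This is a swapped-in technique, not the script above; the helper name lookup_terms is hypothetical, and the idiom assumes the glossary file is not empty.

```shell
# lookup_terms GLOSSARY TERMS (hypothetical helper): load tab-separated
# term/definition pairs from the first file, then look up each line of
# the second file.
lookup_terms() {
    awk 'BEGIN { FS = "\t" }
         NR == FNR { entry[$1] = $2; next }
         ($0 in entry) { print entry[$0]; next }
         $0 != "" { print $0 " not found" }' "$1" "$2"
}

gloss=$(mktemp); terms=$(mktemp)
printf 'GIGO\tGarbage In, Garbage Out\nCICS\tCustomer Information Control System\n' > "$gloss"
printf 'GIGO\nFOO\n' > "$terms"
lookup_terms "$gloss" "$terms"
# Garbage In, Garbage Out
# FOO not found
rm -f "$gloss" "$terms"
```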
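2) Creating arrays with split()
Syntax: n = split(string, array, separator)
split() is a built-in awk function: string is the string to be split, array receives the pieces (array[1] through array[n]), separator is the field separator, and n is the number of elements created.
e.g.: z = split($1, fullname, " ")
If $1 holds a full name, fullname[1] is then the first name and fullname[z] the last name.
e.g.: z = split($1, array, " ")
      for (i = 1; i <= z; ++i)
          print i, array[i]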
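A sketch of split() with an explicit separator (hypothetical helper split_demo): the first field is split on "/" and each piece is printed with its count.

```shell
# split_demo (hypothetical helper): n = number of pieces, part[1..n].
split_demo() {
    awk '{
        n = split($1, part, "/")
        printf("%d parts:", n)
        for (i = 1; i <= n; ++i)
            printf(" [%s]", part[i])
        printf("\n")
    }'
}

echo "5/11/55" | split_demo
# 3 parts: [5] [11] [55]
```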
3) Deleting array elements
delete array[index]
After this, the test (index in array) returns false.
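A minimal sketch (hypothetical helper delete_demo) confirming that after delete the in test returns 0 for the removed index while other elements remain.

```shell
# delete_demo (hypothetical helper): remove one element, test both.
delete_demo() {
    awk 'BEGIN {
        flavor["cherry"] = 1
        flavor["lime"] = 2
        delete flavor["cherry"]
        c = ("cherry" in flavor)
        l = ("lime" in flavor)
        print c, l
    }'
}

delete_demo
# 0 1
```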
4) Making Conversions
A script that converts a number to a Roman numeral:
Script file: $ cat romanum
echo $1 |
awk '# romanum -- convert number 1-10 to roman numeral
# define numerals as list of roman numerals 1-10
BEGIN {
    # create array named numerals from list of roman numerals
    split("I,II,III,IV,V,VI,VII,VIII,IX,X", numerals, ",")
}
# look for number between 1 and 10
$1 > 0 && $1 <= 10 {
    # print specified element
    print numerals[$1]
    exit
}
{   print "invalid number"
    exit
}'
Run: $ romanum 4
IV
A script that converts date formats:
awk '
# date-month -- convert mm/dd/yy or mm-dd-yy to month day, year
# build list of months and put in array.
BEGIN {
    # the 3-step assignment is done for printing in book
    listmonths = "January,February,March,April,May,June,"
    listmonths = listmonths "July,August,September,"
    listmonths = listmonths "October,November,December"
    split(listmonths, month, ",")
}
# check that there is input
$1 != "" {
    # split on "/" the first input field into elements of array
    sizeOfArray = split($1, date, "/")
    # check that only one field is returned
    if (sizeOfArray == 1)
        # try to split on "-"
        sizeOfArray = split($1, date, "-")
    # must be invalid
    if (sizeOfArray == 1)
        exit
    # add 0 to number of month to coerce numeric type
    date[1] += 0
    # print month day, year
    print month[date[1]], (date[2] ", 19" date[3])
}'
Run: $ echo "5/11/55" | date-month
May 11, 1955
III. Functions
1. Built-in functions
1) Arithmetic functions
cos(x) Returns cosine of x (x is in radians).
exp(x) Returns e to the power x.
int(x) Returns truncated value of x.
log(x) Returns natural logarithm (base-e) of x.
sin(x) Returns sine of x (x is in radians).
sqrt(x) Returns square root of x.
atan2(y,x) Returns arctangent of y/x in the range -π to π.
rand() Returns pseudo-random number r, where 0 <= r < 1.
srand(x) Establishes new seed for rand(). If no seed is specified, uses time of day. Returns the old seed.
2) Truncating to an integer: int()
e.g.: print (100/3)        ##33.3333
      print int(100/3)     ##33
3) Random number generation
The rand() function generates a pseudo-random floating-point number r with 0 <= r < 1. The srand() function sets the seed, or starting point, for random number generation. If srand() is called without an argument, it uses the time of day to generate the seed. With an argument x, srand() uses x as the seed.
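A common way to turn rand() into integers in a range, sketched under a hypothetical helper name: seed once with srand(), then map r (0 <= r < 1) to 1..6 with int(rand() * 6) + 1. Since the time-of-day seed usually has one-second resolution, two runs started within the same second may produce the same sequence.

```shell
# roll_dice N (hypothetical helper): prints N simulated die rolls.
roll_dice() {
    awk -v n="$1" 'BEGIN {
        srand()    # seed from the time of day
        for (i = 1; i <= n; ++i)
            printf("%d%s", int(rand() * 6) + 1, (i < n ? " " : "\n"))
    }'
}

roll_dice 10
```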
2. String functions
gsub(r,s,t) Globally substitutes s for each match of the regular expression r in the string t. Returns the number of substitutions. If t is not supplied, defaults to $0.
index(s,t) Returns position of substring t in string s or zero if not present.
length(s) Returns length of string s or length of $0 if no string is supplied.
match(s,r) Returns either the position in s where the regular expression r begins, or 0 if no occurrences are found. Sets the values of RSTART and RLENGTH.
split(s,a,sep) Parses string s into elements of array a using field separator sep; returns number of elements. If sep is not supplied, FS is used. Array splitting works the same way as field splitting.
sprintf("fmt",expr) Returns expr formatted according to the printf format specification fmt.
sub(r,s,t) Substitutes s for first match of the regular expression r in the string t. Returns 1 if successful; 0 otherwise. If t is not supplied, defaults to $0.
substr(s,p,n) Returns substring of string s at beginning position p up to a maximum length of n. If n is not supplied, the rest of the string from p is used.
tolower(s) Translates all uppercase characters in string s to lowercase and returns the new string.
toupper(s) Translates all lowercase characters in string s to uppercase and returns the new string.
The split() function was introduced above, in the section on arrays.
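A sketch exercising several of the string functions listed above on one string (hypothetical helper string_demo); the expected values in the comments were counted by hand.

```shell
# string_demo (hypothetical helper): length, index, substr, toupper,
# gsub (returns the substitution count) and match (sets RSTART/RLENGTH).
string_demo() {
    awk 'BEGIN {
        s = "Garbage In, Garbage Out"
        print length(s)                 # 23
        print index(s, "In")            # 9
        print substr(s, 1, 7)           # Garbage
        print toupper(substr(s, 9, 2))  # IN
        n = gsub(/Garbage/, "Data", s)
        print n, s                      # 2 Data In, Data Out
        if (match(s, /Out/))
            print RSTART, RLENGTH       # 15 3
    }'
}

string_demo
```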
3. Writing your own functions
1) Syntax:
function name(parameter-list) {
    statements
}
Example:
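function insert(STRING, POS, INS,    before_tmp, after_tmp) {
    # before_tmp and after_tmp are listed as extra parameters so that they
    # are local to the function instead of global variables
    before_tmp = substr(STRING, 1, POS)
    after_tmp = substr(STRING, POS + 1)
    return before_tmp INS after_tmp
}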
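A runnable sketch of the insert() function (hypothetical wrapper insert_demo). It declares before_tmp and after_tmp as extra parameters, the usual awk idiom for local variables, and the second print shows that the global before_tmp is left untouched.

```shell
# insert_demo (hypothetical helper): insert "XYZ" after position 3.
insert_demo() {
    awk 'function insert(STRING, POS, INS,    before_tmp, after_tmp) {
        before_tmp = substr(STRING, 1, POS)
        after_tmp = substr(STRING, POS + 1)
        return before_tmp INS after_tmp
    }
    BEGIN {
        print insert("abcdef", 3, "XYZ")
        print "[" before_tmp "]"    # empty: the function used locals
    }'
}

insert_demo
# abcXYZdef
# []
```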
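2) Maintaining a function library
Useful functions can be kept in one directory and used as a function library. The -f option may be given more than once to load several program files, e.g.: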
$ awk -f grade.awk -f /usr/local/share/awk/sort.awk grades.test
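A self-contained sketch of the same idea: a "library" file defining a function and a main program that calls it, both loaded with -f. The file names lib.awk and main.awk and the function max() are hypothetical; the files are created in a temporary directory.

```shell
# lib_demo (hypothetical helper): load a library file and a main file.
lib_demo() {
    dir=$(mktemp -d)
    printf '%s\n' '# max: returns the larger of a and b' \
        'function max(a, b) { return a > b ? a : b }' > "$dir/lib.awk"
    printf '%s\n' '{ print max($1, $2) }' > "$dir/main.awk"
    awk -f "$dir/lib.awk" -f "$dir/main.awk"
    rm -rf "$dir"
}

echo "3 7" | lib_demo
# 7
```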
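IV. Advanced Topics
1. getline
getline reads another line of input, not only from the normal input stream but also from a file or a pipe. It is a statement rather than a function, so it is written without parentheses.
Return values of getline: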
1 If it was able to read a line.
0 If it encounters the end-of-file.
-1 If it encounters an error.
1) Example:
# getline.awk -- test getline function
/^\.SH "?Name"?/ {
getline # get next line
print $1 # print $1 of new line.
}
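2) Reading from a file
Example: getline < "data" reads one line from the file data. The usual idiom is:
while ( (getline < "data") > 0 )
    print
The > 0 comparison matters: if data cannot be opened, getline returns -1, and a bare while (getline < "data") would loop forever.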
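A sketch of the while ((getline ...) > 0) idiom that reads a whole file from BEGIN into a variable and counts its lines (hypothetical helper count_file_lines). For a missing file getline returns -1, the loop ends immediately, and 0 is printed.

```shell
# count_file_lines FILE (hypothetical helper): reads FILE with getline.
count_file_lines() {
    awk -v file="$1" 'BEGIN {
        n = 0
        while ((getline line < file) > 0)
            n++
        close(file)
        print n
    }'
}

count_file_lines /etc/passwd
```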
3) Reading from standard input
BEGIN { printf "Enter your name: "
    getline < "-"
    print
}
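4) Reading directly into a variable:
getline input    ##stores the next line in the variable input; $0 and NF are not changed
5) Reading from a pipe:
Example 1: awk '{"who am i" | getline me; print me}' -
Example 2: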
awk '# getname - print users fullname from /etc/passwd
BEGIN { "who am i" | getline
name = $1
FS = ":"
}
name ~ $1 { print $5 }
' /etc/passwd
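A sketch of the "command" | getline var form (hypothetical helper pipe_demo). Keeping the command string in a variable makes it easy to pass the identical string to close(), which is required if the same command is to be run again later.

```shell
# pipe_demo (hypothetical helper): read one line of a command's output.
pipe_demo() {
    awk 'BEGIN {
        cmd = "echo hello from a pipe"
        if ((cmd | getline msg) > 0)
            print "got:", msg
        close(cmd)
    }'
}

pipe_demo
# got: hello from a pipe
```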
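2. The close() function
close() closes an open file or pipe.
# An example
{ print $0 | "sort > tmpfile" }    # send each record to sort
END {
    close("sort > tmpfile")
    while ((getline < "tmpfile") > 0) {
        # do more work with the sorted line in $0
    }
}
# close() is required here: it makes sort finish and write tmpfile completely before the END rule reads it back.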
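A runnable version of that pattern (hypothetical helper sort_then_reread). A temporary file from mktemp stands in for tmpfile, and the command string is stored in a variable so close() receives exactly the same string that opened the pipe.

```shell
# sort_then_reread (hypothetical helper): pipe records to sort, close the
# pipe so sort finishes writing, then read the sorted file back.
sort_then_reread() {
    tmp=$(mktemp)
    awk -v tmp="$tmp" 'BEGIN { cmd = "sort > " tmp }
        { print | cmd }
        END {
            close(cmd)    # wait for sort to finish writing tmp
            while ((getline line < tmp) > 0)
                print "sorted:", line
            close(tmp)
        }'
    rm -f "$tmp"
}

printf 'pear\napple\nfig\n' | sort_then_reread
# sorted: apple
# sorted: fig
# sorted: pear
```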
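3. The system() function
It works like system() in C: it runs the given command through the shell and returns the command's exit status. Example: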
# getFilename function -- prompts user for filename,
# verifies that file exists and returns absolute pathname.
function getFilename(file) {
    while (! file) {
        printf "Enter a filename: "
        getline < "-"  # get response
        file = $0
        # check that file exists and is readable
        # test returns 1 if file does not exist.
        if (system("test -r " file)) {
            print file " not found"
            file = ""
        }
    }
    if (file !~ /^\//) {
        "pwd" | getline  # get current directory
        close("pwd")
        file = $0 "/" file
    }
    return file
}
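A non-interactive sketch of the readability check used in getFilename (hypothetical helper readable_demo): system() returns the exit status of test -r, which is 0 when the file is readable. Unlike the original, the file name is wrapped in double quotes so that names containing spaces are passed to test as one argument.

```shell
# readable_demo FILE (hypothetical helper): reports whether FILE is readable.
readable_demo() {
    awk -v f="$1" 'BEGIN {
        if (system("test -r \"" f "\"") == 0)
            print f " is readable"
        else
            print f " is not readable"
    }'
}

readable_demo /etc/passwd
```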
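4. Redirecting output to a file or a pipe
1) Redirecting to a file
e.g.: print > "data.out" writes the current record to the file data.out. The file is truncated when it is first opened in a run; later prints to it in the same run append. Use >> to append to an existing file instead.
print "a =", a, "b =", b, "max =", (a > b ? a : b) > "data.out"
The parentheses around the conditional expression are needed so that its > is not taken as a redirection.
2) Redirecting to a pipe:
print | command
e.g.: print | "wc -w"    ##all records printed this way go to a single wc process, which prints one total word count when the pipe is closed (at exit or with close("wc -w"))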
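A sketch of both redirections in one program (hypothetical helper redirect_demo, taking an existing output directory as its argument): each record goes to a file named after its first field, and all records share one wc -l pipe. Records with the same first field accumulate in the same file, because a file is truncated only when it is first opened.

```shell
# redirect_demo DIR (hypothetical helper): per-key files plus one shared pipe.
redirect_demo() {
    awk -v dir="$1" '{
        print > (dir "/" $1 ".txt")   # one output file per first field
        print | "wc -l"               # one wc process for all records
    }'
}

d=$(mktemp -d)
printf 'a 1\nb 2\na 3\n' | redirect_demo "$d"
cat "$d/a.txt"
rm -rf "$d"
```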
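5. Common awk limits
These are typical limits of older awk implementations; modern versions such as gawk remove most fixed limits.
Item                              Limit
Number of fields per record       100
Characters per input record       3000
Characters per output record      3000
Characters per field              1024
Characters per printf string      3000
Characters in literal string      400
Characters in character class     400
Files open                        15
Pipes open                        1
In practice, however, most awk implementations allow more than one pipe to be open at a time.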
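6. Invoking awk with #!
$ cat myscript.awk
#!/usr/bin/awk -f
{ print $0 }
$ chmod +x myscript.awk
$ ./myscript.awk test    ##equivalent to awk -f myscript.awk test
Use the path where awk is installed on your system (for example /bin/awk). Do not put a comment on the #! line itself: the rest of that line is passed to awk as arguments.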
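A self-contained sketch that builds such a script in a temporary directory (hypothetical helper shebang_demo). Instead of assuming /usr/bin/awk, it writes the #! line from `command -v awk`; it also assumes the temporary directory allows executing files.

```shell
# shebang_demo (hypothetical helper): create, chmod, and run myscript.awk.
shebang_demo() {
    dir=$(mktemp -d)
    # write the interpreter line with the local path to awk
    printf '#!%s -f\n{ print NR ": " $0 }\n' "$(command -v awk)" > "$dir/myscript.awk"
    chmod +x "$dir/myscript.awk"
    "$dir/myscript.awk"    # reads stdin, like awk -f myscript.awk
    rm -rf "$dir"
}

printf 'x\ny\n' | shebang_demo
# 1: x
# 2: y
```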