The awk command on Linux - jimonitu - ChinaUnix blog


The awk command on Linux

Category: LINUX

2007-08-19 10:23:53

Version: Red Hat 9.0

awk syntax:

    awk 'pattern { action }' files

Here pattern represents what awk is looking for in the data, and action is a series of commands executed when a match is found. If the pattern is omitted, awk applies the action to every input line. Curly brackets ({}) are not always required around your program, but they are used to group a series of instructions based on a specific pattern.

To illustrate, look at the following employee-list file saved as 1.txt:

    46012 DULANEY EVAN MOBILE AL
    46013 DURHAM JEFF MOBILE AL
    46015 STEEN BILL MOBILE AL
    46017 FELDMAN EVAN MOBILE AL
    46018 SWIM STEVE UNKNOWN AL
    46019 BOGUE ROBERT PHOENIX AR
    46021 JUNE MICAH PHOENIX AR
    46022 KANE SHERYL UNKNOWN AR
    46024 WOOD WILLIAM MUNCIE IN
    46026 FERGUS SARAH MUNCIE IN
    46027 BUCK SARAH MUNCIE IN
    46029 TUTTLE BOB MUNCIE IN

Example: print every line of 1.txt

    $ awk '{print;}' 1.txt

Note: the single quotes cannot be omitted. Without them the shell tries to interpret the program text itself (the braces, the semicolon, $1 and so on), and the command fails.

One of awk's most useful features is that it automatically splits each input line into fields. A field is a run of characters delimited by one or more field separators; the default separators are spaces and tabs. When a line is read, awk stores the first field in the variable $1, the second field in $2, and so on.

Example: print the first and second fields of 1.txt

    $ awk '{print $1$2}' 1.txt

Note: the output of the command above has no space between the two fields. To get one, separate the fields with a comma:

    $ awk '{print $1,$2}' 1.txt

The printf statement can also be used in place of print.

If you do not specify what fields to print, the entire matching entry will print:

    $ awk '/AL/' 1.txt
    46012 DULANEY EVAN MOBILE AL
    46013 DURHAM JEFF MOBILE AL
    46015 STEEN BILL MOBILE AL
    46017 FELDMAN EVAN MOBILE AL
    46018 SWIM STEVE UNKNOWN AL

Multiple commands for the same set of data can be separated with a semicolon (;).
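The pattern/action rules described so far can be tried on a tiny sample without touching 1.txt. The following is only a sketch: the three records are copied from the employee list, while the temporary file and the shell variable names (sample, pattern_only, action_only, joined) are made up for the illustration.

```shell
# Hypothetical three-line sample; the temp file path is chosen by mktemp.
sample=$(mktemp)
printf '%s\n' '46012 DULANEY EVAN MOBILE AL' \
              '46019 BOGUE ROBERT PHOENIX AR' \
              '46024 WOOD WILLIAM MUNCIE IN' > "$sample"

# Pattern only: the default action prints the whole matching line.
pattern_only=$(awk '/AR/' "$sample")

# Action only: runs for every line; the comma inserts a space between fields.
action_only=$(awk '{print $1, $2}' "$sample")

# No comma: the two fields are concatenated without a space.
joined=$(awk '{print $1 $2}' "$sample")

printf '%s\n' "$pattern_only" "$action_only" "$joined"
rm -f "$sample"
```

The first command prints only the BOGUE record, the second prints three "number name" pairs, and the third prints the same pairs run together (46012DULANEY and so on).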
As an example of semicolon-separated statements, the following command prints the employee number and last name on one line, the first name and city on the next, and leaves a blank line after each two-line display. Only records matching AL are processed, and only fields $1 through $4 are printed:

    $ awk '/AL/ {print $1,$2; print $3,$4"\n"}' 1.txt
    46012 DULANEY
    EVAN MOBILE

    46013 DURHAM
    JEFF MOBILE

    46015 STEEN
    BILL MOBILE

    46017 FELDMAN
    EVAN MOBILE

    46018 SWIM
    STEVE UNKNOWN

Here are some special characters that can be used in strings:

    * \n (new line)
    * \t (tab)
    * \b (backspace)
    * \f (formfeed)
    * \r (carriage return)

If the semicolon (;) were not used (print $1,$2,$3,$4), all would appear on the same line. On the other hand, if the two print statements were given as separate actions, an altogether different result would occur:

    $ awk '/AL/ {print $3,$2} {print $4,$5}' 1.txt
    EVAN DULANEY
    MOBILE AL
    JEFF DURHAM
    MOBILE AL
    BILL STEEN
    MOBILE AL
    EVAN FELDMAN
    MOBILE AL
    STEVE SWIM
    UNKNOWN AL
    PHOENIX AR
    PHOENIX AR
    UNKNOWN AR
    MUNCIE IN
    MUNCIE IN
    MUNCIE IN
    MUNCIE IN

Fields $3 and $2 are printed only for lines matching AL, while $4 and $5 are printed for every line, because the second action has no pattern.

You can use multiple patterns and actions:

    $ awk '/AL/ {print $1,$2,$5"\n"} /IN/ { print $3,$4,$5}' 1.txt
    46012 DULANEY AL

    46013 DURHAM AL

    46015 STEEN AL

    46017 FELDMAN AL

    46018 SWIM AL

    WILLIAM MUNCIE IN
    SARAH MUNCIE IN
    SARAH MUNCIE IN
    BOB MUNCIE IN

You can search for more than one pattern match at a time by placing the multiple criteria in consecutive order and separating them with a pipe (|) symbol:

    $ awk '/AL|IN/' 1.txt
    46012 DULANEY EVAN MOBILE AL
    46013 DURHAM JEFF MOBILE AL
    46015 STEEN BILL MOBILE AL
    46017 FELDMAN EVAN MOBILE AL
    46018 SWIM STEVE UNKNOWN AL
    46024 WOOD WILLIAM MUNCIE IN
    46026 FERGUS SARAH MUNCIE IN
    46027 BUCK SARAH MUNCIE IN
    46029 TUTTLE BOB MUNCIE IN

You can use printf in place of print:

    $ awk '{printf "%-15s %s\n",$1,$2;}' 1.txt
    46012           DULANEY
    46013           DURHAM
    46015           STEEN
    46017           FELDMAN
    46018           SWIM
    46019           BOGUE
    46021           JUNE
    46022           KANE
    46024           WOOD
    46026           FERGUS
    46027           BUCK
    46029           TUTTLE

A problem occurs, however, when you try to find the people who live in Arizona (coded AR in this file):

    $ awk '/AR/' 1.txt
    46019 BOGUE ROBERT PHOENIX AR
    46021 JUNE MICAH PHOENIX AR
    46022 KANE SHERYL UNKNOWN AR
    46026 FERGUS SARAH MUNCIE IN
    46027 BUCK SARAH MUNCIE IN

Employees 46026 and 46027 do not live in Arizona; however, their first names contain the character sequence being searched for. The important thing to remember is that pattern matching in AWK, as in grep, sed, and most other Linux/Unix commands, looks for a match anywhere in the record (line) unless told to do otherwise. To solve this problem, it is necessary to tie the search to a particular field. This is accomplished by means of a tilde (~) and a reference to a specific field, as the following example illustrates:

    $ awk '$5 ~ /AR/' 1.txt
    46019 BOGUE ROBERT PHOENIX AR
    46021 JUNE MICAH PHOENIX AR
    46022 KANE SHERYL UNKNOWN AR

The opposite of the tilde (signifying a match) is a tilde preceded by an exclamation mark (!~). These characters tell the program to select all lines in which the search sequence does not appear in the specified field:

    $ awk '$5 !~ /AR/' 1.txt
    46012 DULANEY EVAN MOBILE AL
    46013 DURHAM JEFF MOBILE AL
    46015 STEEN BILL MOBILE AL
    46017 FELDMAN EVAN MOBILE AL
    46018 SWIM STEVE UNKNOWN AL
    46024 WOOD WILLIAM MUNCIE IN
    46026 FERGUS SARAH MUNCIE IN
    46027 BUCK SARAH MUNCIE IN
    46029 TUTTLE BOB MUNCIE IN

In this case, it displayed all lines that do not have AR in the fifth field, including the two Sarah entries that do have AR, but in the third field instead of the fifth one.

The order in which fields are printed does not depend on their order in the input: in the examples above you could just as well print $2 before $1 without any error.

Braces and Field Separators

The next examples use a colon-separated version of the file. The commands below convert 1.txt in place, going through the copy 2.txt:

    $ cat 1.txt|tr ' ' ':'|tr -s ':'>2.txt
    $ cat 2.txt>1.txt
    $ cat 1.txt
    46012:DULANEY:EVAN:MOBILE:AL
    46013:DURHAM:JEFF:MOBILE:AL
    46015:STEEN:BILL:MOBILE:AL
    46017:FELDMAN:EVAN:MOBILE:AL
    46018:SWIM:STEVE:UNKNOWN:AL
    46019:BOGUE:ROBERT:PHOENIX:AR
    46021:JUNE:MICAH:PHOENIX:AR
    46022:KANE:SHERYL:UNKNOWN:AR
    46024:WOOD:WILLIAM:MUNCIE:IN
    46026:FERGUS:SARAH:MUNCIE:IN
    46027:BUCK:SARAH:MUNCIE:IN
    46029:TUTTLE:BOB:MUNCIE:IN

If you now tried to print the second field with

    $ awk '{print $2}' 1.txt

you would end up with twelve blank lines.
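A quick way to see why those lines come out blank is to ask awk how many fields it found. The sketch below is an added illustration, not part of the original walkthrough: it pipes one colon-separated record into awk and prints NF, awk's built-in count of fields in the current record. The shell variable names are made up for the sketch.

```shell
# One colon-separated record, in the format produced by the tr commands above.
line='46012:DULANEY:EVAN:MOBILE:AL'

# Default separator (blanks): the whole record is one field, so $2 is empty.
default_nf=$(printf '%s\n' "$line" | awk '{print NF}')
default_second=$(printf '%s\n' "$line" | awk '{print "[" $2 "]"}')

# With -F: the same record splits into five fields.
colon_nf=$(printf '%s\n' "$line" | awk -F: '{print NF}')

echo "default: NF=$default_nf second=$default_second | with -F: NF=$colon_nf"
```

With the default separator NF is 1 and the bracketed second field is empty; with -F: the count becomes 5.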
Because there are no spaces in the file, there are no discernible fields beyond the first one. To solve the problem, AWK must be told that a character other than white space is the delimiter, and there are two methods by which to inform AWK of the new field separator: use the command-line parameter -F, or specify the variable FS within the program. Both strategies work equally well, with one exception, as illustrated by the following example:

    $ awk '{FS=":"}{print $1}' 1.txt
    46012:DULANEY:EVAN:MOBILE:AL
    46013
    46015
    46017
    46018
    46019
    46021
    46022
    46024
    46026
    46027
    46029

Setting FS this way has a problem: the first line is treated as a single field, because awk had already split it before the assignment to FS was executed. The new separator takes effect only from the second line on, so it is better to use the -F form below:

    $ awk -F: '{print $1}' 1.txt
    46012
    46013
    46015
    46017
    46018
    46019
    46021
    46022
    46024
    46026
    46027
    46029

As mentioned earlier, the default display/output field separator is a blank space. This feature can be changed within the program by using the Output Field Separator (OFS) variable. For example, to read the file (separated by colons) and display it with dashes, the command would be

    $ awk -F":" '{OFS="-"}{print $1,$2,$3,$4,$5}' 1.txt|head
    46012-DULANEY-EVAN-MOBILE-AL
    46013-DURHAM-JEFF-MOBILE-AL
    46015-STEEN-BILL-MOBILE-AL
    46017-FELDMAN-EVAN-MOBILE-AL
    46018-SWIM-STEVE-UNKNOWN-AL
    46019-BOGUE-ROBERT-PHOENIX-AR
    46021-JUNE-MICAH-PHOENIX-AR
    46022-KANE-SHERYL-UNKNOWN-AR
    46024-WOOD-WILLIAM-MUNCIE-IN
    46026-FERGUS-SARAH-MUNCIE-IN

FS and OFS, the (input) Field Separator and the Output Field Separator, are but a couple of the variables that can be used within the AWK utility.
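The first-line problem shown above also goes away if FS is assigned inside a BEGIN block, which awk runs before it reads any input. The following is a minimal sketch under that assumption; the two records are copied from the converted file, and the shell variable names (records, late, early) are hypothetical.

```shell
# Two colon-separated records used as inline sample data.
records='46012:DULANEY:EVAN:MOBILE:AL
46013:DURHAM:JEFF:MOBILE:AL'

# FS assigned in a main action: on POSIX-conforming awks the first record
# has already been split by the time the assignment runs.
late=$(printf '%s\n' "$records" | awk '{FS=":"} {print $1}')

# FS and OFS assigned in BEGIN: both are in effect before the first record.
early=$(printf '%s\n' "$records" | awk 'BEGIN {FS=":"; OFS="-"} {print $1, $2}')

printf '%s\n' "$late" "$early"
```

The BEGIN form prints 46012-DULANEY and 46013-DURHAM, so the first record is handled like every other one. It is a matter of taste whether to use -F on the command line or BEGIN in the program; BEGIN keeps the separator settings together with the code that relies on them.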
Another built-in variable is NR, the number of the current record. To number each line as it is printed, use NR in the following manner:

    $ awk -F":" '{print NR,$1,$2,$3}' 1.txt|head -n 5
    1 46012 DULANEY EVAN
    2 46013 DURHAM JEFF
    3 46015 STEEN BILL
    4 46017 FELDMAN EVAN
    5 46018 SWIM STEVE

To find all lines with employee numbers between 46012 and 46015:

    $ awk -F":" '/4601[2-5]/' 1.txt
    46012:DULANEY:EVAN:MOBILE:AL
    46013:DURHAM:JEFF:MOBILE:AL
    46015:STEEN:BILL:MOBILE:AL

To change the delimiter from colons to dots (.), the command could be:

    $ awk -F: '{print $1"."$2"."$3"."$4"."$5}' 1.txt

You can also add other text to the output, such as:

    $ awk -F: '{print "first item",$1,$2"."}' 2.txt|head -3
    first item 46012 DULANEY.
    first item 46013 DURHAM.
    first item 46015 STEEN.

Math Operations

In addition to the textual possibilities AWK provides, it also offers a full range of arithmetic operators, including the following:

    +     adds numbers together
    -     subtracts
    *     multiplies
    /     divides
    ^     performs exponential mathematics
    %     gives the modulo
    ++    adds one to the value of a variable
    +=    assigns the result of an addition operation to a variable
    --    subtracts one from a variable
    -=    assigns the result of a subtraction operation to a variable
    *=    assigns the result of multiplication
    /=    assigns the result of division
    %=    assigns the result of a modulo operation

For example, assume the following file (2.txt) exists on your machine, detailing the inventory in a hardware store:

    $ cat 2.txt
    hammers 5 7.99
    drills 2 29.99
    punches 7 3.59
    drifts 2 4.09
    bits 55 1.19
    saws 123 14.99
    nails 800 .19
    screws 80 .29
    brads 100 .24

The first order of business is to compute the value of each item's inventory by multiplying the value of the second field (quantity) by the value of the third field (price):

    $ awk '{print "name:",$1,"count",$2,"the price",$3,"the total",$2*$3}' 2.txt
    name: hammers count 5 the price 7.99 the total 39.95
    name: drills count 2 the price 29.99 the total 59.98
    name: punches count 7 the price 3.59 the total 25.13
    name: drifts count 2 the price 4.09 the total 8.18
    name: bits count 55 the price 1.19 the total 65.45
    name: saws count 123 the price 14.99 the total 1843.77
    name: nails count 800 the price .19 the total 152
    name: screws count 80 the price .29 the total 23.2
    name: brads count 100 the price .24 the total 24

If the lines themselves are unimportant, and you want only to determine exactly how many items are in the store, you can assign a generic variable to increment by the number of items in each record:

    $ awk '{x=x+$2} {print x}' 2.txt
    5
    7
    14
    16
    71
    194
    994
    1074
    1174

The same process can be applied to determining the total value of the inventory on hand:

    $ awk '{x=x+($2*$3)} {print x}' 2.txt
    $ awk '{x=x+($2*$3)}{print $1,"QTY: "$2,"PRICE: "$3,"TOTAL: "$2*$3,"BAL: "x}' 2.txt

BEGIN and END

Actions can be specified to take place prior to the actual start of processing or after it has been completed, with BEGIN and END statements respectively. BEGIN statements are most commonly used to establish variables or display a header. END statements, on the other hand, can be used to continue processing after the last input line has been read.

In an earlier example, a complete value of the inventory was generated with the routine

    awk '{x=x+($2*$3)} {print x}' 2.txt

This routine provided a display for each line in the file as the running total accumulated. There was no other way to specify it, and not having it print at each line would have resulted in it never printing. With an END statement, however, this problem can be circumvented:

    $ awk '{x=x+($2*$3)} END {print "Total Value of Inventory: "x}' 2.txt
    Total Value of Inventory: 2241.66

Remember: BEGIN and END must be written in uppercase.

The variable x is defined and updated for each line; however, no display is generated until all processing has completed.
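The accumulate-then-report pattern can carry more than one total at a time. The sketch below is an added illustration using three of the inventory records inline; the awk variable names qty and value and the wording of the summary line are made up for the example, and += is the assignment operator from the list above.

```shell
# Inline copy of three inventory records (name, quantity, unit price).
inventory='hammers 5 7.99
drills 2 29.99
punches 7 3.59'

# Accumulate in the main action, report once in END.
# NR still holds the number of records read when END runs.
summary=$(printf '%s\n' "$inventory" |
  awk '{qty += $2; value += $2 * $3}
       END {printf "%d items in %d lines, total value %.2f\n", qty, NR, value}')

echo "$summary"
```

For these three records the summary line reads "14 items in 3 lines, total value 125.06". Using printf with %.2f in END fixes the number of decimals, which plain print does not.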
While the END routine is useful on its own, it can also be combined with the earlier listing to add even more information and produce a more complete report:

    $ awk '{x=x+($2*$3)} {print $1,"QTY: "$2,"PRICE: "$3,"TOTAL: "$2*$3} END {print "Total Value of Inventory: " x}' 2.txt

The BEGIN command works in the same fashion as END, but it establishes items that need to be done before anything else is accomplished. The most common purpose of this procedure is to create headers for reports. The syntax for this routine would resemble

    $ awk 'BEGIN {print "ITEM QUANTITY PRICE TOTAL"}{x=x+$2*$3} {print $1"\t",$2"\t",$3"\t",$2*$3}END {print "Total Value of Inventory: " x}' 2.txt
    ITEM QUANTITY PRICE TOTAL
    hammers  5       7.99    39.95
    drills   2       29.99   59.98
    punches  7       3.59    25.13
    drifts   2       4.09    8.18
    bits     55      1.19    65.45
    saws     123     14.99   1843.77
    nails    800     .19     152
    screws   80      .29     23.2
    brads    100     .24     24
    Total Value of Inventory: 2241.66

Input, Output, and Source Files

The AWK tool can read its input from a file, as was done in all examples up to this point, or it can take input from the output of another command. For example:

    $ sort 1.txt | awk -F: '{print $3,$2}'

The input of the awk command is the output from the sort operation. In addition to sort, any other Linux command can be used, for example grep. This procedure allows you to perform other operations on the file before pulling out selected fields.

Like the shell, AWK uses the output-redirection operators > and >> to put its output into a file rather than to standard output. The symbols react like their counterparts in the shell, so > creates the file if it doesn't exist, and >> appends to the existing file.
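The difference between the two operators shows up most clearly across two separate awk runs. The sketch below is an added illustration under stated assumptions: the temporary file comes from mktemp, and its name is handed to awk as the variable f through the -v option, so f is an initialized awk variable holding the path. The names out, f, after_first and after_second are hypothetical.

```shell
out=$(mktemp)   # temporary output file; the path is arbitrary

# First run: ">" empties the file when awk first opens it, then every
# print in the same run adds a line.
printf 'a\nb\n' | awk -v f="$out" '{print NR, $1 > f}'
after_first=$(cat "$out")

# Second run with ">>": the earlier contents are kept and the new line is appended.
printf 'c\n' | awk -v f="$out" '{print NR, $1 >> f}'
after_second=$(cat "$out")

printf '%s\n' "$after_second"
rm -f "$out"
```

After the second run the file holds three lines ("1 a", "2 b", "1 c"). Had the second run used > instead, only "1 c" would remain.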
Examine the following example:

    $ awk -F: '{print NR, $1 > "/tmp/filez" }' 1.txt
    $ cat /tmp/filez
    1 46012
    2 46013
    3 46015
    4 46017
    5 46018
    6 46019
    7 46021
    8 46022
    9 46024
    10 46026
    11 46027
    12 46029

Examining the syntax of the statement, you can see that the output redirection is done after the print statement is complete. You must enclose the file name in double quotes, or else it is simply an uninitialized AWK variable, and the combination of instructions generates an error from AWK. (If you use the redirection symbols improperly, AWK gets confused about whether the symbol means "redirection" or is a relational operator.) Note that all twelve lines end up in the file: awk opens /tmp/filez only once per run, so > empties it at the first print and the later prints add to it.

Output into pipes in AWK also resembles the way the same action would be accomplished in a shell. To send the output of a print command into a pipe, follow the print command with a pipe symbol and the name of the command, as in the following:

    $ awk '{ print $2 | "sort -n" }' 2.txt
    2
    2
    5
    7
    55
    80
    100
    123
    800

As was the case with output redirection, you must enclose the command in double quotes, and the name of the pipe is the name of the command being executed.

The printf statement can be used in the same way, for example with an action such as {printf "%s %s\n",$1,$2}.
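Combining printf with an output pipe can be sketched as follows. This is an added illustration, not the author's command: the three inventory records are inline, the column widths and the sort options are arbitrary choices, and the shell variable names are made up. The close() call is the standard awk function that ends the pipe so sort can emit its output.

```shell
# Three inventory records, deliberately not in order of value.
inventory='drills 2 29.99
nails 800 .19
hammers 5 7.99'

# printf output can be piped just like print output. The command string must be
# identical each time so that all lines go to the same sort process.
report=$(printf '%s\n' "$inventory" |
  awk '{printf "%-10s %8.2f\n", $1, $2 * $3 | "sort -k2,2n"}
       END {close("sort -k2,2n")}')

printf '%s\n' "$report"
```

The report lists hammers, drills and nails in that order, that is, sorted by the computed value in the second column rather than by input order.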
