Version: Red Hat 9.0
awk
Syntax
awk 'pattern {action}' files
If the pattern is omitted, awk performs the action on every input line. If the action is omitted, awk prints every line that matches the pattern.
where pattern represents what AWK is looking for in the data, and action is a series of commands executed when a match is found. Curly brackets ({}) are not always required around your program, but they are used to group a series of instructions based on a specific pattern.
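A minimal sketch of the three forms. It assumes any POSIX-compatible awk (gawk, mawk, or BusyBox awk) and feeds three lines copied from the employee file below through printf, so no input file is needed.

```shell
# Three lines in the same layout as the employee file.
data='46012 DULANEY EVAN MOBILE AL
46019 BOGUE ROBERT PHOENIX AR
46024 WOOD WILLIAM MUNCIE IN'

# Pattern only: the default action prints each matching line.
printf '%s\n' "$data" | awk '/AR/'
# Action only: the action runs for every line.
printf '%s\n' "$data" | awk '{ print $2 }'
# Pattern and action: the action runs only for matching lines.
printf '%s\n' "$data" | awk '/AR/ { print $2 }'
```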
To illustrate, look at the following employee-list file saved as 1.txt:
46012 DULANEY EVAN MOBILE AL
46013 DURHAM JEFF MOBILE AL
46015 STEEN BILL MOBILE AL
46017 FELDMAN EVAN MOBILE AL
46018 SWIM STEVE UNKNOWN AL
46019 BOGUE ROBERT PHOENIX AR
46021 JUNE MICAH PHOENIX AR
46022 KANE SHERYL UNKNOWN AR
46024 WOOD WILLIAM MUNCIE IN
46026 FERGUS SARAH MUNCIE IN
46027 BUCK SARAH MUNCIE IN
46029 TUTTLE BOB MUNCIE IN
Example:
Print every line of 1.txt:
$ awk '{print;}' 1.txt
Note: the single quotes cannot be omitted. Without them, the shell interprets characters such as the braces, the semicolon, and $ itself, and the command fails.
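The most useful feature of awk is that it automatically splits each line into fields. A field is a group of characters separated from the next one by one or more field separators; by default the separator is a space or a tab. When a line is read, awk places the first field it parses in the variable $1, the second in $2, and so on.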
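A short sketch of how a line is split. Besides $1, $2, ..., it uses $0 (the whole line), NF (the number of fields), and $NF (the last field); these are standard awk variables not mentioned above. The sample line is made up, with extra spaces and a tab to show that any run of blanks counts as one separator.

```shell
# Runs of spaces and a tab between fields still count as single separators.
printf '46013   DURHAM\tJEFF MOBILE AL\n' |
    awk '{ print "fields:", NF; print "first:", $1; print "last:", $NF; print "whole:", $0 }'
```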
Example:
Print the first and second fields of 1.txt:
$ awk '{print $1$2}' 1.txt
Note: the two fields in the output above run together with no space, because adjacent values are concatenated. To separate them with a space, use a comma:
$ awk '{print $1,$2}' 1.txt
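The difference between concatenation and the comma can be checked on a single line. This sketch assumes only a POSIX awk; the dash separator in the last command is just an illustrative choice.

```shell
line='46012 DULANEY EVAN MOBILE AL'
# Adjacent expressions are concatenated with nothing in between.
echo "$line" | awk '{ print $1 $2 }'       # 46012DULANEY
# A comma inserts the output field separator (a space by default).
echo "$line" | awk '{ print $1, $2 }'      # 46012 DULANEY
# A string literal can supply any other separator.
echo "$line" | awk '{ print $1 "-" $2 }'   # 46012-DULANEY
```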
You can also use printf instead of print.
If you do not specify what fields to print, the entire matching entry will print:
$ awk '/AL/' 1.txt
46012 DULANEY EVAN MOBILE AL
46013 DURHAM JEFF MOBILE AL
46015 STEEN BILL MOBILE AL
46017 FELDMAN EVAN MOBILE AL
46018 SWIM STEVE UNKNOWN AL
$
Multiple commands for the same set of data can be separated with a
semicolon (;). For example, for each line matching AL, the following
prints the employee number and last name on one line, the first name
and city on the next, and then a blank line after each two-line display:
$ awk '/AL/ {print $1,$2; print $3,$4"\n"}' 1.txt
46012 DULANEY
EVAN MOBILE

46013 DURHAM
JEFF MOBILE

46015 STEEN
BILL MOBILE

46017 FELDMAN
EVAN MOBILE

46018 SWIM
STEVE UNKNOWN

$
Some special characters (escape sequences) you can use inside awk strings:
* \n (new line)
* \t (tab)
* \b (backspace)
* \f (formfeed)
* \r (carriage return)
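A quick way to see two of these escapes at work, assuming a POSIX awk: "\t" separates the fields with a tab and "\n" starts a new line inside the same print.

```shell
# "\t" becomes a tab and "\n" a newline inside an awk string literal.
echo '46012 DULANEY' | awk '{ print $1 "\t" $2 "\n---" }'
```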
If the semicolon (;) were not used (print $1,$2,$3,$4), all four fields would
appear on the same line. On the other hand, if the two print statements
were given in separate action blocks, an altogether different result would occur:
$ awk '/AL/ {print $3,$2} {print $4,$5}' 1.txt
EVAN DULANEY
MOBILE AL
JEFF DURHAM
MOBILE AL
BILL STEEN
MOBILE AL
EVAN FELDMAN
MOBILE AL
STEVE SWIM
UNKNOWN AL
PHOENIX AR
PHOENIX AR
UNKNOWN AR
MUNCIE IN
MUNCIE IN
MUNCIE IN
MUNCIE IN
$
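The first action (print $3,$2) runs only for lines that match AL; the second action (print $4,$5) has no pattern, so it runs for every line.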
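The effect of splitting one action into two blocks can be checked on two sample lines taken from 1.txt. This is a sketch assuming a POSIX awk; only the output length differs between the two commands.

```shell
data='46012 DULANEY EVAN MOBILE AL
46019 BOGUE ROBERT PHOENIX AR'
# Both prints belong to the /AL/ pattern: only the AL line produces output.
printf '%s\n' "$data" | awk '/AL/ { print $3, $2; print $4, $5 }'
# The second block has no pattern, so it also runs for the AR line.
printf '%s\n' "$data" | awk '/AL/ { print $3, $2 } { print $4, $5 }'
```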
You can use multiple pattern-action pairs in one program:
$ awk '/AL/ {print $1,$2,$5"\n"} /IN/ {print $3,$4,$5}' 1.txt
46012 DULANEY AL

46013 DURHAM AL

46015 STEEN AL

46017 FELDMAN AL

46018 SWIM AL

WILLIAM MUNCIE IN
SARAH MUNCIE IN
SARAH MUNCIE IN
BOB MUNCIE IN
You can search for more than one pattern at a time by placing the
alternatives in a single regular expression and separating them with
the alternation (|) symbol:
$ awk '/AL|IN/' 1.txt
46012 DULANEY EVAN MOBILE AL
46013 DURHAM JEFF MOBILE AL
46015 STEEN BILL MOBILE AL
46017 FELDMAN EVAN MOBILE AL
46018 SWIM STEVE UNKNOWN AL
46024 WOOD WILLIAM MUNCIE IN
46026 FERGUS SARAH MUNCIE IN
46027 BUCK SARAH MUNCIE IN
46029 TUTTLE BOB MUNCIE IN
$
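The same selection can also be written as two separate regular expressions joined with the logical OR operator ||. A sketch on three lines from 1.txt, assuming a POSIX awk, showing that both forms select the same lines:

```shell
data='46012 DULANEY EVAN MOBILE AL
46019 BOGUE ROBERT PHOENIX AR
46024 WOOD WILLIAM MUNCIE IN'
# Alternation inside one regular expression...
printf '%s\n' "$data" | awk '/AL|IN/'
# ...and two regular expressions combined with ||.
printf '%s\n' "$data" | awk '/AL/ || /IN/'
```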
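You can also use printf instead of print to control the output format. Here %-15s left-justifies the first field in a 15-character column:
$ awk '{printf "%-15s %s\n",$1,$2}' 1.txt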
46012           DULANEY
46013           DURHAM
46015           STEEN
46017           FELDMAN
46018           SWIM
46019           BOGUE
46021           JUNE
46022           KANE
46024           WOOD
46026           FERGUS
46027           BUCK
46029           TUTTLE
$
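To make the padding visible, this sketch wraps each conversion in brackets. It assumes a POSIX awk, whose printf follows the C conventions: a minus sign left-justifies, no sign right-justifies.

```shell
# %-15s: the string is left-justified and padded with spaces to 15 columns.
echo '46012 DULANEY' | awk '{ printf "[%-15s][%s]\n", $1, $2 }'
# %15s: the same width, right-justified.
echo '46012 DULANEY' | awk '{ printf "[%15s][%s]\n", $1, $2 }'
```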
A problem occurs, however, when you try to find the people whose state is AR:
$ awk '/AR/' 1.txt
46019 BOGUE ROBERT PHOENIX AR
46021 JUNE MICAH PHOENIX AR
46022 KANE SHERYL UNKNOWN AR
46026 FERGUS SARAH MUNCIE IN
46027 BUCK SARAH MUNCIE IN
Employees 46026 and 46027 do not live in AR; however, their first
names contain the character sequence being searched for. The important
thing to remember is that when pattern matching in AWK, as in grep,
sed, or most other Linux/Unix commands, a pattern matches anywhere in
the record (line) unless you specify otherwise. To solve this problem,
tie the search to a particular field. This is done with a tilde (~)
and a field specification, as the following example illustrates:
$ awk '$5 ~ /AR/' 1.txt
46019 BOGUE ROBERT PHOENIX AR
46021 JUNE MICAH PHOENIX AR
46022 KANE SHERYL UNKNOWN AR
$
The opposite of the tilde (signifying a match) is a tilde preceded by an
exclamation mark (!~). It selects every line in which the specified
field does not match the pattern, even if the pattern appears elsewhere
in the line:
$ awk '$5 !~ /AR/' 1.txt
46012 DULANEY EVAN MOBILE AL
46013 DURHAM JEFF MOBILE AL
46015 STEEN BILL MOBILE AL
46017 FELDMAN EVAN MOBILE AL
46018 SWIM STEVE UNKNOWN AL
46024 WOOD WILLIAM MUNCIE IN
46026 FERGUS SARAH MUNCIE IN
46027 BUCK SARAH MUNCIE IN
46029 TUTTLE BOB MUNCIE IN
$
In this case, it displayed all lines that do not have AR in the fifth field, including the two SARAH entries, which do contain AR, but in the third field instead of the fifth.
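Because ~ is a regular-expression match, /AR/ also accepts any fifth field that merely contains AR. The sketch below uses a hypothetical third line (state code ARX, invented for the demonstration) to show how anchoring the expression with ^ and $, or comparing with ==, requires the whole field to equal AR.

```shell
# The third line (ARX) is hypothetical; it only illustrates a partial match.
data='46019 BOGUE ROBERT PHOENIX AR
46026 FERGUS SARAH MUNCIE IN
46030 DOE JANE NOWHERE ARX'
# AR anywhere in $5 is enough for ~.
printf '%s\n' "$data" | awk '$5 ~ /AR/ { print $1 }'      # 46019, 46030
# Anchored regex, or string equality: the whole field must be AR.
printf '%s\n' "$data" | awk '$5 ~ /^AR$/ { print $1 }'    # 46019
printf '%s\n' "$data" | awk '$5 == "AR" { print $1 }'     # 46019
```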
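Fields can be printed in any order, regardless of their order in the input: print $2,$1 works just as well as print $1,$2.
The next examples use a colon-separated version of the file. The following commands turn the spaces in 1.txt into colons (tr -s squeezes repeated colons into one) and copy the result back over 1.txt:
$ cat 1.txt | tr ' ' ':' | tr -s ':' > 2.txt
$ cat 2.txt > 1.txt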
$cat 1.txt
46012:DULANEY:EVAN:MOBILE:AL
46013:DURHAM:JEFF:MOBILE:AL
46015:STEEN:BILL:MOBILE:AL
46017:FELDMAN:EVAN:MOBILE:AL
46018:SWIM:STEVE:UNKNOWN:AL
46019:BOGUE:ROBERT:PHOENIX:AR
46021:JUNE:MICAH:PHOENIX:AR
46022:KANE:SHERYL:UNKNOWN:AR
46024:WOOD:WILLIAM:MUNCIE:IN
46026:FERGUS:SARAH:MUNCIE:IN
46027:BUCK:SARAH:MUNCIE:IN
46029:TUTTLE:BOB:MUNCIE:IN
Braces and Field Separators
Now that 1.txt is colon-separated, if you run
$ awk '{print $2}' 1.txt
you end up with twelve blank lines. Because there are no spaces in the file, each line is a single field, so there is no second field to print. To solve the problem, AWK must be told that a character other than white space is the delimiter. There are two ways to set the new field separator: use the command-line option -F, or assign the variable FS within the program. Both work equally well, with one exception, as the following example illustrates:
$ awk '{FS=":"}{print $1}' 1.txt
46012:DULANEY:EVAN:MOBILE:AL
46013
46015
46017
46018
46019
46021
46022
46024
46026
46027
46029
$
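The first line is printed whole: FS is assigned inside the main action, after awk has already split the first record, so the new separator only takes effect from the second record onward. Giving the separator with -F on the command line avoids this:
$ awk -F: '{print $1}' 1.txt
46012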
46013
46015
46017
46018
46019
46021
46022
46024
46026
46027
46029
$
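Setting FS inside the program also works if it happens in a BEGIN block (covered in more detail later), which runs before the first record is read. A sketch on two colon-separated lines, assuming a POSIX awk:

```shell
data='46012:DULANEY:EVAN:MOBILE:AL
46013:DURHAM:JEFF:MOBILE:AL'
# BEGIN runs before any input is read, so FS applies to every line,
# including the first one.
printf '%s\n' "$data" | awk 'BEGIN { FS = ":" } { print $1 }'
```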
As mentioned earlier, the default output field separator is a single space. You can change it within the program with the Output Field Separator (OFS) variable. For example, to read the colon-separated file and display its fields separated by dashes, the command would be
$ awk -F":" '{OFS="-"}{print $1,$2,$3,$4,$5}' 1.txt | head
46012-DULANEY-EVAN-MOBILE-AL
46013-DURHAM-JEFF-MOBILE-AL
46015-STEEN-BILL-MOBILE-AL
46017-FELDMAN-EVAN-MOBILE-AL
46018-SWIM-STEVE-UNKNOWN-AL
46019-BOGUE-ROBERT-PHOENIX-AR
46021-JUNE-MICAH-PHOENIX-AR
46022-KANE-SHERYL-UNKNOWN-AR
46024-WOOD-WILLIAM-MUNCIE-IN
46026-FERGUS-SARAH-MUNCIE-IN
$
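OFS is only used where a comma appears in print. When you print the whole line, awk outputs the original record unless a field has been assigned; the common idiom $1 = $1 forces awk to rebuild $0 with OFS. This sketch also sets OFS with the standard -v option instead of inside the action, assuming a POSIX awk.

```shell
line='46012:DULANEY:EVAN:MOBILE:AL'
# print alone outputs $0, which is still the unchanged input line.
echo "$line" | awk -F: -v OFS=- '{ print }'            # 46012:DULANEY:EVAN:MOBILE:AL
# Assigning a field (even $1 = $1) makes awk rebuild $0 using OFS.
echo "$line" | awk -F: -v OFS=- '{ $1 = $1; print }'   # 46012-DULANEY-EVAN-MOBILE-AL
```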
FS and OFS, (input) Field Separator and Output Field Separator, are but a couple of the variables that can be used within the AWK utility. For example, to number each line as it is printed, use the NR variable in the following manner:
$ awk -F":" '{print NR,$1,$2,$3}' 1.txt | head -n 5
1 46012 DULANEY EVAN
2 46013 DURHAM JEFF
3 46015 STEEN BILL
4 46017 FELDMAN EVAN
5 46018 SWIM STEVE
$
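NR can also serve as a pattern to select lines by position. A sketch assuming a POSIX awk and three lines from the colon-separated file:

```shell
data='46012:DULANEY:EVAN:MOBILE:AL
46013:DURHAM:JEFF:MOBILE:AL
46015:STEEN:BILL:MOBILE:AL'
# NR == 2 selects only the second record.
printf '%s\n' "$data" | awk -F: 'NR == 2 { print NR, $2 }'    # 2 DURHAM
# NR > 1 skips the first line, which is handy for files with a header row.
printf '%s\n' "$data" | awk -F: 'NR > 1 { print $2 }'         # DURHAM, STEEN
```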
To find all lines with employee numbers between 46012 and 46015:
$ awk -F":" '/4601[2-5]/' 1.txt
46012:DULANEY:EVAN:MOBILE:AL
46013:DURHAM:JEFF:MOBILE:AL
46015:STEEN:BILL:MOBILE:AL
$
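The regular expression matches 4601[2-5] anywhere in the line, not just in the employee number. A numeric comparison on $1 is stricter. In this sketch the last line (employee number 146013) is hypothetical and only there to show a false match; a POSIX awk is assumed.

```shell
# The last line is hypothetical: 46013 appears inside a longer number.
data='46012:DULANEY:EVAN:MOBILE:AL
46015:STEEN:BILL:MOBILE:AL
46017:FELDMAN:EVAN:MOBILE:AL
146013:DOE:JANE:NOWHERE:XX'
# Regex: also matches the 46013 inside 146013.
printf '%s\n' "$data" | awk -F: '/4601[2-5]/ { print $1 }'                 # 46012, 46015, 146013
# Numeric comparison on field 1: checks the actual value.
printf '%s\n' "$data" | awk -F: '$1 >= 46012 && $1 <= 46015 { print $1 }'  # 46012, 46015
```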
To join the fields with dots (.) instead of spaces, concatenate them with literal "." strings:
$ awk -F: '{print $1"."$2"."$3"."$4"."$5}' 1.txt
You can also print literal text alongside the fields, for example:
$ awk -F: '{print "first item",$1,$2"."}' 2.txt | head -3
first item 46012 DULANEY.
first item 46013 DURHAM.
first item 46015 STEEN.
first item 46013 DURHAM.
first item 46015 STEEN.
Math Operations
In addition to the textual possibilities AWK provides, it also offers a full range of arithmetic operators, including the following:
+ adds numbers together
- subtracts
* multiplies
/ divides
^ performs exponential mathematics
% gives the modulo
++ adds one to the value of a variable
+= assigns the result of an addition operation to a variable
-- subtracts one from a variable
-= assigns the result of a subtraction operation to a variable
*= assigns the result of multiplication
/= assigns the result of division
%= assigns the result of a modulo operation
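Each operator in the list can be tried in a BEGIN block, which needs no input at all. A sketch assuming a POSIX awk (^ for exponentiation is part of POSIX awk); the comments show the values printed.

```shell
awk 'BEGIN {
    x = 7
    print x + 2, x - 2, x * 2, x / 2, x % 3, 2 ^ 10   # 9 5 14 3.5 1 1024
    x++;    print x    # 8
    x += 2; print x    # 10
    x--;    print x    # 9
    x -= 4; print x    # 5
    x *= 3; print x    # 15
    x /= 5; print x    # 3
    x %= 2; print x    # 1
}'
```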
For example, assume the following file (saved as 2.txt, replacing the earlier copy) exists on your machine detailing the inventory in a hardware store:
$ cat 2.txt
hammers 5 7.99
drills 2 29.99
punches 7 3.59
drifts 2 4.09
bits 55 1.19
saws 123 14.99
nails 800 0.19
screws 80 0.29
brads 100 0.24
$
The first order of business is to compute the value of each item's inventory by multiplying the value of the second field (quantity) by the value of the third field (price):
$ awk '{print "name:",$1,"count:",$2,"price:",$3,"total:",$2*$3}' 2.txt
name: hammers count: 5 price: 7.99 total: 39.95
name: drills count: 2 price: 29.99 total: 59.98
name: punches count: 7 price: 3.59 total: 25.13
name: drifts count: 2 price: 4.09 total: 8.18
name: bits count: 55 price: 1.19 total: 65.45
name: saws count: 123 price: 14.99 total: 1843.77
name: nails count: 800 price: 0.19 total: 152
name: screws count: 80 price: 0.29 total: 23.2
name: brads count: 100 price: 0.24 total: 24
If the lines themselves are unimportant, and you want only to
determine exactly how many items are in the store, you can assign a
generic variable to increment by the number of items in each record:
$ awk '{x=x+$2} {print x}' 2.txt
5
7
14
16
71
194
994
1074
1174
$
The same process can be applied to determining the total value of the inventory on hand:
$ awk '{x=x+($2*$3)} {print x}' 2.txt
To print each item's details together with the running balance:
$ awk '{x=x+($2*$3)}{print $1,"QTY: "$2,"PRICE: "$3,"TOTAL: "$2*$3,"BAL: "x}' 2.txt
BEGIN and END
Actions can be specified to take place prior to the actual start of
processing or after it has been completed with BEGIN and END statements
respectively. BEGIN statements are most commonly used to establish
variables or display a header. END statements, on the other hand, run
after the last input line has been processed, which makes them the
natural place for totals and summaries.
In an earlier example, the complete value of the inventory was generated with the routine
awk '{x=x+($2*$3)} {print x}' 2.txt
This routine printed a line for each record in the file as the
running total accumulated. There was no other way to specify it,
and not having it print at each line would have resulted in it never
printing. With an END statement, however, this problem can be
circumvented:
$ awk '{x=x+($2*$3)} END {print "Total Value of Inventory: "x}' 2.txt
Total Value of Inventory: 2241.66
$
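An END block can report more than one summary value. In this sketch (three lines copied from 2.txt, a POSIX awk assumed), NR in the END block holds the number of records read, and printf rounds the value to two decimals.

```shell
inv='hammers 5 7.99
drills 2 29.99
punches 7 3.59'
# Accumulate in the main action; report once in END.
printf '%s\n' "$inv" | awk '{ qty += $2; value += $2 * $3 }
END { printf "lines: %d, items: %d, value: %.2f\n", NR, qty, value }'
```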
Remember: BEGIN and END must be written in uppercase.
The variable x is defined, and it is updated for each line; however, no
display is generated until all processing has completed. While it's useful as a standalone routine, it can also be combined with the earlier listing to produce a more complete report:
$ awk '{x=x+($2*$3)} {print $1,"QTY: "$2,"PRICE: "$3,"TOTAL: "$2*$3} END {print "Total Value of Inventory: " x}' 2.txt
The BEGIN command works in the same fashion as END, but it establishes items that need to be done before anything else is accomplished. The most common purpose of this procedure is to create headers for reports. The syntax for this routine would resemble
$ awk 'BEGIN {print "ITEM QUANTITY PRICE TOTAL"}
{x=x+$2*$3}
{print $1"\t",$2"\t",$3"\t",$2*$3}
END {print "Total Value of Inventory: " x}' 2.txt
ITEM QUANTITY PRICE TOTAL
hammers 5 7.99 39.95
drills 2 29.99 59.98
punches 7 3.59 25.13
drifts 2 4.09 8.18
bits 55 1.19 65.45
saws 123 14.99 1843.77
nails 800 0.19 152
screws 80 0.29 23.2
brads 100 0.24 24
Total Value of Inventory: 2241.66
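A BEGIN block can also set variables before any input arrives. This sketch (two lines from 2.txt, a POSIX awk assumed) sets OFS to a tab in BEGIN, so every comma in the later print statements emits a tab instead of a space.

```shell
inv='hammers 5 7.99
drills 2 29.99'
# BEGIN sets OFS and prints the header before the first line is read.
printf '%s\n' "$inv" | awk 'BEGIN { OFS = "\t"; print "ITEM", "QUANTITY", "PRICE", "TOTAL" }
{ total += $2 * $3; print $1, $2, $3, $2 * $3 }
END { print "Total Value of Inventory:", total }'
```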
Input, Output, and Source Files
The AWK tool can read its input from a file, as was done in all examples up to this point, or it can take input from the output of another command. For example:
$ sort 1.txt | awk -F: '{print $3,$2}'
The input of the awk command is the output from the sort operation. In addition to sort, any other Linux command can be used — for example, grep. This procedure allows you to perform other operations on the file before pulling out selected fields.
Like the shell, AWK uses the output-redirection operators > and >> to put its output into a file rather than to standard output. The symbols react like their counterparts in the shell, so > creates the file if it doesn't exist, and >> appends to the existing file. Examine the following example:
$ awk -F: '{print NR, $1 > "/tmp/filez"}' 1.txt
$ cat /tmp/filez
1 46012
2 46013
3 46015
4 46017
5 46018
6 46019
7 46021
8 46022
9 46024
10 46026
11 46027
12 46029
$
Note: the name of the redirection file must be enclosed in double quotes.
Examining the syntax of the statement, you can see that the output
redirection is done after the print statement is complete. You must
enclose the file name in quotes, or else it is simply an uninitialized
AWK variable, and the combination of instructions generates an error
from AWK. (If you use the redirection symbols improperly, AWK
gets confused about whether the symbol means "redirection" or is a
relation operator.)
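The difference between > and >> inside awk is worth checking: within a single run, > truncates the file only when it is first opened, and later prints keep writing to it. This sketch assumes a POSIX awk and the widely available (but not POSIX) mktemp command. It also shows that a variable holding the file name, passed in with -v, works in place of a quoted literal; the problem described above is only an unquoted word that was never assigned.

```shell
tmp=$(mktemp)     # scratch file for the demonstration
# ">" truncates once, when the file is first opened in this run.
printf 'a\nb\n' | awk -v out="$tmp" '{ print NR, $1 > out }'
cat "$tmp"        # 1 a / 2 b
# ">>" never truncates: it appends to the existing contents.
printf 'c\n' | awk -v out="$tmp" '{ print NR, $1 >> out }'
cat "$tmp"        # 1 a / 2 b / 1 c
rm -f "$tmp"
```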
Output into pipes in AWK also resembles the way the same
action would be accomplished in a shell. To send the output of a print
command into a pipe, follow the print command with a pipe symbol and
the name of the command, as in the following:
$ awk '{ print $2 | "sort -n" }' 2.txt
2
2
5
7
55
80
100
123
800
$
As was the case with output redirection, you must enclose the command in
quotes, and the name of the pipe is the name of the command being
executed.
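All prints into the same command string share a single process, and its output normally appears only when awk exits. The standard close() function, called with the identical command string, ends that process early. A sketch on three lines from 2.txt, assuming a POSIX awk:

```shell
inv='hammers 5 7.99
drills 2 29.99
nails 800 0.19'
# close() waits for "sort -n" to finish, so the sorted lines appear
# before "done" is printed.
printf '%s\n' "$inv" |
    awk '{ print $2, $1 | "sort -n" } END { close("sort -n"); print "done" }'
```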
You can also use the printf command here, for example:
$ awk '{printf "%s %s\n",$1,$2}' 2.txt
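printf also accepts width and precision for numbers. A sketch on two lines from 2.txt, assuming a POSIX awk; the column widths are arbitrary choices for the illustration.

```shell
# %-8s  string, left-justified in 8 columns
# %5d   integer, right-justified in 5 columns
# %8.2f number with 2 decimals, right-justified in 8 columns
printf 'nails 800 0.19\nsaws 123 14.99\n' |
    awk '{ printf "%-8s%5d%8.2f\n", $1, $2, $2 * $3 }'
```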