Category: LINUX

2007-08-19 10:23:53

Version: Red Hat 9.0
awk
 Syntax
 awk 'pattern {action}' files
 If the pattern is omitted, awk applies the action to every input line.
 Here pattern represents what AWK is looking for in the data, and action is a series of commands executed when a match is found. Curly brackets ({}) are not always required around your program, but they are used to group a series of instructions based on a specific pattern.
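
The three forms of an awk rule can be tried side by side. This is only a minimal sketch: the three-line sample data and the variable names (tmp, pattern_only, action_only, both) are made up for illustration and are not part of the employee file used below.

```shell
# Hypothetical sample data, written to a temporary file.
tmp=$(mktemp)
printf 'alpha 1\nbeta 2\ngamma 3\n' > "$tmp"

# Pattern only: the default action prints every matching line in full.
pattern_only=$(awk '/beta/' "$tmp")

# Action only: with no pattern, the action runs for every line.
action_only=$(awk '{print $1}' "$tmp")

# Pattern plus action: the action runs only for matching lines.
both=$(awk '/gamma/ {print $2}' "$tmp")

printf '%s\n' "$pattern_only" "$action_only" "$both"
rm -f "$tmp"
```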

 To illustrate, look at the following employee-list file saved as 1.txt:

46012 DULANEY EVAN MOBILE AL
46013 DURHAM JEFF MOBILE AL
46015 STEEN BILL MOBILE AL
46017 FELDMAN EVAN MOBILE AL
46018 SWIM STEVE UNKNOWN AL
46019 BOGUE ROBERT PHOENIX AR
46021 JUNE MICAH PHOENIX AR
46022 KANE SHERYL UNKNOWN AR
46024 WOOD WILLIAM MUNCIE IN
46026 FERGUS SARAH MUNCIE IN
46027 BUCK SARAH MUNCIE IN
46029 TUTTLE BOB MUNCIE IN

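 Example: print every line of 1.txt
 $ awk '{print}' 1.txt
 Note: the single quotes cannot be omitted; without them the shell interprets the program text itself and the command fails.
 The most useful feature of awk is that it automatically splits each input line into fields. A field is a run of characters separated from its neighbors by one or more field separators; the default separators are spaces and tabs. When a line is read, awk stores the first field in the variable $1, the second in $2, and so on.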
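
Default field splitting can be checked on a single record. This is a small sketch under assumptions: the record deliberately mixes two spaces and a tab between fields, and NF (the built-in count of fields in the current record) is not discussed in this post but is a standard awk variable.

```shell
# A hypothetical record with two spaces and a tab between some fields.
record='46012  DULANEY\tEVAN MOBILE AL\n'

# Runs of spaces and tabs count as one separator under the default FS.
fields=$(printf "$record" | awk '{print $1; print $2; print $3}')

# NF holds the number of fields awk found in the record.
count=$(printf "$record" | awk '{print NF}')

printf '%s\n' "$fields" "$count"
```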
 Example: print the first and second fields of 1.txt
 $ awk '{print $1$2}' 1.txt
 Note: the output of the command above has no space between the two fields. To separate them, put a comma between the fields:
 $ awk '{print $1,$2}' 1.txt
 You can also use printf in place of the print command.
 If you do not specify what fields to print, the entire matching entry will print:
$ awk '/AL/' 1.txt
46012 DULANEY EVAN MOBILE AL
46013 DURHAM JEFF MOBILE AL
46015 STEEN BILL MOBILE AL
46017 FELDMAN EVAN MOBILE AL
46018 SWIM STEVE UNKNOWN AL
$
Multiple commands for the same set of data can be separated with a semicolon (;). For example, for every line matching AL, the following prints fields $1 and $2 on one line and fields $3 and $4 on the next, then leaves a blank line after each two-line display:
$ awk '/AL/ {print $1,$2; print $3,$4"\n"}' 1.txt
46012 DULANEY
EVAN MOBILE

46013 DURHAM
JEFF MOBILE

46015 STEEN
BILL MOBILE

46017 FELDMAN
EVAN MOBILE

46018 SWIM
STEVE UNKNOWN
$
Here are some special characters that can be used inside strings:
    * \n (new line)
    * \t (tab)
    * \b (backspace)
    * \f (formfeed)
    * \r (carriage return)

If the semicolon (;) were not used and the fields were given to a single print (print $3,$2,$4,$5), all of them would appear on the same line. On the other hand, if the two print statements were given as separate actions, an altogether different result would occur:
$ awk '/AL/ {print $3,$2} {print $4,$5}' 1.txt
EVAN DULANEY
MOBILE AL
JEFF DURHAM
MOBILE AL
BILL STEEN
MOBILE AL
EVAN FELDMAN
MOBILE AL
STEVE SWIM
UNKNOWN AL
PHOENIX AR
PHOENIX AR
UNKNOWN AR
MUNCIE IN
MUNCIE IN
MUNCIE IN
MUNCIE IN
$
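The first action prints $3 and $2 only for lines matching AL; the second action has no pattern, so it prints $4 and $5 for every line.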
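
The difference between one action containing two statements and two separate actions can be reproduced on a tiny input. This is a sketch with made-up two-line data, not the employee file; the variable names are hypothetical.

```shell
# Hypothetical data: only the first line matches /AL/.
tmp=$(mktemp)
printf 'A x AL\nB y IN\n' > "$tmp"

# One action, two statements: both prints are guarded by /AL/.
one_action=$(awk '/AL/ {print $1; print $2}' "$tmp")

# Two actions: the second has no pattern, so it runs on every line.
two_actions=$(awk '/AL/ {print $1} {print $2}' "$tmp")

printf 'one action:\n%s\ntwo actions:\n%s\n' "$one_action" "$two_actions"
rm -f "$tmp"
```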
You can use multiple patterns and actions:
$ awk '/AL/ {print $1,$2,$5"\n"} /IN/ {print $3,$4,$5}' 1.txt
46012 DULANEY AL
 
46013 DURHAM AL
 
46015 STEEN AL
 
46017 FELDMAN AL
 
46018 SWIM AL
 
WILLIAM MUNCIE IN
SARAH MUNCIE IN
SARAH MUNCIE IN
BOB MUNCIE IN

You can search for more than one pattern match at a time by placing the multiple criteria in consecutive order and separating them with a pipe (|) symbol:

$ awk '/AL|IN/' emp_names
46012 DULANEY EVAN MOBILE AL
46013 DURHAM JEFF MOBILE AL
46015 STEEN BILL MOBILE AL
46017 FELDMAN EVAN MOBILE AL
46018 SWIM STEVE UNKNOWN AL
46024 WOOD WILLIAM MUNCIE IN
46026 FERGUS SARAH MUNCIE IN
46027 BUCK SARAH MUNCIE IN
46029 TUTTLE BOB MUNCIE IN
$
You can use printf instead of print to control the layout. Here %-15s left-justifies the first field in a 15-character column:
 $ awk '{printf "%-15s %s\n",$1,$2}' 1.txt
46012           DULANEY
46013           DURHAM
46015           STEEN
46017           FELDMAN
46018           SWIM
46019           BOGUE
46021           JUNE
46022           KANE
46024           WOOD
46026           FERGUS
46027           BUCK
46029           TUTTLE
$

A problem occurs, however, when you try to find the people whose state is AR:
$ awk '/AR/' emp_names
46019 BOGUE ROBERT PHOENIX AR
46021 JUNE MICAH PHOENIX AR
46022 KANE SHERYL UNKNOWN AR
46026 FERGUS SARAH MUNCIE IN
46027 BUCK SARAH MUNCIE IN
Employees 46026 and 46027 do not have AR as their state; however, their first names contain the character sequence being searched for. The important thing to remember is that pattern matching in AWK, as in grep, sed, and most other Linux/Unix commands, looks for a match anywhere in the record (line) unless told to do otherwise. To solve this problem, it is necessary to tie the search to a particular field. This is accomplished by means of a tilde (~) and a reference to a specific field, as the following example illustrates:

$ awk '$5 ~ /AR/' emp_names
46019 BOGUE ROBERT PHOENIX AR
46021 JUNE MICAH PHOENIX AR
46022 KANE SHERYL UNKNOWN AR
$
The opposite of the tilde (signifying a match) is a tilde preceded by an exclamation mark (!~). These characters tell the program to select all lines in which the specified field does not match the search sequence:

$ awk '$5 !~ /AR/' emp_names
46012 DULANEY EVAN MOBILE AL
46013 DURHAM JEFF MOBILE AL
46015 STEEN BILL MOBILE AL
46017 FELDMAN EVAN MOBILE AL
46018 SWIM STEVE UNKNOWN AL
46024 WOOD WILLIAM MUNCIE IN
46026 FERGUS SARAH MUNCIE IN
46027 BUCK SARAH MUNCIE IN
46029 TUTTLE BOB MUNCIE IN
$
In this case, it displayed all lines that do not have AR in the fifth field, including the two Sarah entries that do contain AR, but in the third field instead of the fifth one.
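
The matching styles discussed above can be compared on two of the records. This is a sketch only; the exact comparison $5 == "AR" is an addition not used in this post, shown because a field regex such as $5 ~ /AR/ would still match a longer value that merely contains AR.

```shell
# Two records from the employee list, written to a temporary file.
tmp=$(mktemp)
printf '46022 KANE SHERYL UNKNOWN AR\n46026 FERGUS SARAH MUNCIE IN\n' > "$tmp"

anywhere=$(awk '/AR/ {print $1}' "$tmp")           # matches SARAH too
field_regex=$(awk '$5 ~ /AR/ {print $1}' "$tmp")   # regex tied to field 5
field_exact=$(awk '$5 == "AR" {print $1}' "$tmp")  # exact string comparison
negated=$(awk '$5 !~ /AR/ {print $1}' "$tmp")      # field 5 does not match

printf '%s\n' "$anywhere" "$field_regex" "$field_exact" "$negated"
rm -f "$tmp"
```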

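 The order in which fields are printed is not constrained by their order in the input; in the examples above you could swap $1 and $2 without any error.
 To experiment with other field separators, convert the spaces in 1.txt to colons:
 $ cat 1.txt | tr ' ' ':' | tr -s ':' > 2.txt
 $ cat 2.txt > 1.txt
 $ cat 1.txt
46012:DULANEY:EVAN:MOBILE:AL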
46013:DURHAM:JEFF:MOBILE:AL
46015:STEEN:BILL:MOBILE:AL
46017:FELDMAN:EVAN:MOBILE:AL
46018:SWIM:STEVE:UNKNOWN:AL
46019:BOGUE:ROBERT:PHOENIX:AR
46021:JUNE:MICAH:PHOENIX:AR
46022:KANE:SHERYL:UNKNOWN:AR
46024:WOOD:WILLIAM:MUNCIE:IN
46026:FERGUS:SARAH:MUNCIE:IN
46027:BUCK:SARAH:MUNCIE:IN
46029:TUTTLE:BOB:MUNCIE:IN
             Braces and Field Separators
With the file now delimited by colons, if you tried to print the second field with
$ awk '{print $2}' emp_names
you would end up with twelve blank lines. Because there are no spaces in the file, there are no discernible fields beyond the first one. To solve the problem, AWK must be told that a character other than white space is the delimiter, and there are two methods by which to inform AWK of the new field separator: use the command-line parameter -F, or specify the variable FS within the program. Both strategies work equally well, with one exception, as illustrated by the following example:

$ awk '{FS=":"}{print $1}' emp_names
46012:DULANEY:EVAN:MOBILE:AL
46013
46015
46017
46018
46019
46021
46022
46024
46026
46027
46029
$
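Assigning FS inside the main action has a problem: the first line has already been split by the time the assignment runs, so the whole line is treated as a single field. It is therefore better to use -F, as in the following example.
$ awk -F: '{print $1}' emp_names
46012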
46013
46015
46017
46018
46019
46021
46022
46024
46026
46027
46029
$
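
Besides -F, the separator can also be set inside the program before any input is read by placing the assignment in a BEGIN block. This is a sketch using two hypothetical colon-separated records; it is not a command from this post.

```shell
# Two hypothetical colon-separated records.
tmp=$(mktemp)
printf '46012:DULANEY\n46013:DURHAM\n' > "$tmp"

# FS is assigned in BEGIN, so it is already in effect for the first record.
early=$(awk 'BEGIN {FS=":"} {print $1}' "$tmp")

printf '%s\n' "$early"
rm -f "$tmp"
```

Because BEGIN runs before the first line is read, the first record is split the same way as all the others.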
As I mentioned at the start of this article, the default display/output field separator is a blank space. This feature can be changed within the program by using the Output Field Separator (OFS) variable. For example, to read the file (separated by colons) and display it with dashes, the command would be

$ awk -F":" '{OFS="-"}{print $1,$2,$3,$4,$5}' emp_names|head
46012-DULANEY-EVAN-MOBILE-AL
46013-DURHAM-JEFF-MOBILE-AL
46015-STEEN-BILL-MOBILE-AL
46017-FELDMAN-EVAN-MOBILE-AL
46018-SWIM-STEVE-UNKNOWN-AL
46019-BOGUE-ROBERT-PHOENIX-AR
46021-JUNE-MICAH-PHOENIX-AR
46022-KANE-SHERYL-UNKNOWN-AR
46024-WOOD-WILLIAM-MUNCIE-IN
46026-FERGUS-SARAH-MUNCIE-IN
$
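
OFS is only inserted where print is given comma-separated arguments, or when awk rebuilds the record. The following sketch illustrates this on one hypothetical line; the assignment $1 = $1, which forces the record to be rebuilt, is a common idiom that this post does not use.

```shell
line='46012:DULANEY:EVAN'

# Commas between the arguments are replaced by OFS on output.
with_commas=$(printf '%s\n' "$line" | awk -F: 'BEGIN {OFS="-"} {print $1, $2, $3}')

# A bare print emits the record exactly as it was read.
unchanged=$(printf '%s\n' "$line" | awk -F: 'BEGIN {OFS="-"} {print}')

# Assigning to any field makes awk rebuild the record using OFS.
rebuilt=$(printf '%s\n' "$line" | awk -F: 'BEGIN {OFS="-"} {$1 = $1; print}')

printf '%s\n' "$with_commas" "$unchanged" "$rebuilt"
```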
FS and OFS, (input) Field Separator and Output Field Separator, are but a couple of the variables that can be used within the AWK utility. For example, to number each line as it is printed, use the NR variable in the following manner:

$ awk -F":" '{print NR,$1,$2,$3}' emp_names|head -n 5
1 46012 DULANEY EVAN
2 46013 DURHAM JEFF
3 46015 STEEN BILL
4 46017 FELDMAN EVAN
5 46018 SWIM STEVE
$
To find all lines with employee numbers between 46012 and 46015:

$ awk -F":" '/4601[2-5]/' emp_names
46012:DULANEY:EVAN:MOBILE:AL
46013:DURHAM:JEFF:MOBILE:AL
46015:STEEN:BILL:MOBILE:AL
$
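
The regular expression above matches the digits anywhere in the line. As an alternative sketch, not taken from this post, the first field can be compared numerically so that the test is tied to the employee number itself; the four sample records are hypothetical excerpts.

```shell
# Hypothetical excerpt of the colon-separated employee file.
tmp=$(mktemp)
printf '46012:DULANEY\n46013:DURHAM\n46015:STEEN\n46017:FELDMAN\n' > "$tmp"

# Numeric comparison on field 1 instead of a regex over the whole line.
in_range=$(awk -F: '$1 >= 46012 && $1 <= 46015 {print $1}' "$tmp")

printf '%s\n' "$in_range"
rm -f "$tmp"
```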
To change the delimiter from spaces to a dot (.), the command could be:

$ awk '{print $1"."$2"."$3"."$4"."$5}' 1.txt
You can also add literal text to the output, such as:
$ awk -F: '{print "first item",$1,$2"."}' 2.txt | head -3
first item 46012 DULANEY.
first item 46013 DURHAM.
first item 46015 STEEN.

       Math Operations

In addition to the textual possibilities AWK provides, it also offers a full range of arithmetic operators, including the following:

+ adds numbers together
- subtracts
* multiplies
/ divides
^ performs exponentiation
% gives the modulo
++ adds one to the value of a variable
+= assigns the result of an addition operation to a variable
-- subtracts one from a variable
-= assigns the result of a subtraction operation to a variable
*= assigns the result of multiplication
/= assigns the result of division
%= assigns the result of a modulo operation
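
The operators listed above can be tried without any input file by putting the whole program in a BEGIN block. This is a sketch with arbitrary values; the variable names a, b and x are made up for illustration.

```shell
result=$(awk 'BEGIN {
    a = 7; b = 2
    print a + b        # addition
    print a - b        # subtraction
    print a * b        # multiplication
    print a / b        # division
    print a ^ b        # exponentiation
    print a % b        # modulo
    x = 10
    x++;     print x   # increment
    x += 4;  print x   # add and assign
    x--;     print x   # decrement
    x -= 4;  print x   # subtract and assign
    x *= 3;  print x   # multiply and assign
    x /= 5;  print x   # divide and assign
    x %= 4;  print x   # modulo and assign
}')

printf '%s\n' "$result"
```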
For example, assume the following file (2.txt) exists on your machine detailing the inventory in a hardware store:
$ cat 2.txt

hammers 5 7.99
drills 2 29.99
punches 7 3.59
drifts 2 4.09
bits 55 1.19
saws 123 14.99
nails 800 0.19
screws 80 0.29
brads 100 0.24
$
The first order of business is to compute the value of each item's inventory by multiplying the value of the second field (quantity) by the value of the third field (price):
$ awk '{print "name:",$1,"count:",$2,"price:",$3,"total:",$2*$3}' 2.txt
name: hammers count: 5 price: 7.99 total: 39.95
name: drills count: 2 price: 29.99 total: 59.98
name: punches count: 7 price: 3.59 total: 25.13
name: drifts count: 2 price: 4.09 total: 8.18
name: bits count: 55 price: 1.19 total: 65.45
name: saws count: 123 price: 14.99 total: 1843.77
name: nails count: 800 price: 0.19 total: 152
name: screws count: 80 price: 0.29 total: 23.2
name: brads count: 100 price: 0.24 total: 24

If the lines themselves are unimportant, and you want only to determine exactly how many items are in the store, you can assign a generic variable to increment by the number of items in each record:

$ awk '{x=x+$2} {print x}' 2.txt
5
7
14
16
71
194
994
1074
1174
$
The same process can be applied to determining the total value of the inventory on hand:

$ awk '{x=x+($2*$3)} {print x}' 2.txt
$ awk '{x=x+($2*$3)} {print $1,"QTY: "$2,"PRICE: "$3,"TOTAL: "$2*$3,"BAL: "x}' 2.txt

BEGIN and END

Actions can be specified to take place before processing starts or after it has completed by using BEGIN and END statements, respectively. BEGIN statements are most commonly used to initialize variables or display a header. END statements, on the other hand, run after the last input line has been read, so they can be used for processing that depends on the whole input, such as printing totals.

In an earlier example, a complete value of the inventory was generated with the routine

awk '{x=x+($2*$3)} {print x}' 2.txt

This routine provided a display for each line in the file as the running total accumulated. Written as a per-line action, the total could not be shown any other way: not printing it at each line would have resulted in it never printing. With an END statement, however, this problem can be circumvented:

$ awk '{x=x+($2*$3)} END {print "Total Value of Inventory: "x}' 2.txt
Total Value of Inventory: 2241.66
$
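
An END block can also combine the accumulated total with built-in counters. The sketch below uses only the first two inventory lines and reports the number of lines read through NR; the += shorthand and the printf format are choices made for this illustration, not commands from this post.

```shell
# First two inventory records, written to a temporary file.
tmp=$(mktemp)
printf 'hammers 5 7.99\ndrills 2 29.99\n' > "$tmp"

# Accumulate per line; print once after all input has been read.
summary=$(awk '{total += $2 * $3}
END {printf "%d lines, total value %.2f\n", NR, total}' "$tmp")

printf '%s\n' "$summary"
rm -f "$tmp"
```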
Remember: BEGIN and END must be written in uppercase.
The variable x is updated for each line; however, no display is generated until all processing has completed. While it's useful as a standalone routine, it can also be combined with the earlier listing to add even more information and produce a more complete report:

$ awk '{x=x+($2*$3)} {print $1,"QTY: "$2,"PRICE: "$3,"TOTAL: "$2*$3} END {print "Total Value of Inventory: " x}' 2.txt
The BEGIN command works in the same fashion as END, but it establishes items that need to be done before anything else is accomplished. The most common purpose of this procedure is to create headers for reports. The syntax for this routine would resemble

$ awk 'BEGIN {print "ITEM QUANTITY PRICE TOTAL"} {x=x+$2*$3} {print $1"\t",$2"\t",$3"\t",$2*$3} END {print "Total Value of Inventory: " x}' 2.txt
ITEM QUANTITY PRICE TOTAL
hammers 5 7.99 39.95
drills 2 29.99 59.98
punches 7 3.59 25.13
drifts 2 4.09 8.18
bits 55 1.19 65.45
saws 123 14.99 1843.77
nails 800 0.19 152
screws 80 0.29 23.2
brads 100 0.24 24
Total Value of Inventory: 2241.66
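
If the columns should line up under the header, printf with fixed widths can replace the tab characters. This is a sketch only, using the first two inventory lines; the column widths and the variable names value and sum are arbitrary choices for illustration.

```shell
report=$(printf 'hammers 5 7.99\ndrills 2 29.99\n' | awk '
BEGIN { printf "%-10s %8s %8s %10s\n", "ITEM", "QUANTITY", "PRICE", "TOTAL" }
      { value = $2 * $3; sum += value
        printf "%-10s %8d %8.2f %10.2f\n", $1, $2, $3, value }
END   { printf "Total Value of Inventory: %.2f\n", sum }')

printf '%s\n' "$report"
```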

Input, Output, and Source Files

The AWK tool can read its input from a file, as was done in all examples up to this point, or it can take input from the output of another command. For example:

$ sort emp_names | awk '{print $3,$2}'

The input of the awk command is the output from the sort operation. In addition to sort, any other Linux command can be used — for example, grep. This procedure allows you to perform other operations on the file before pulling out selected fields.
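
As a sketch of such a pipeline, the following filters with grep, sorts by last name, and only then lets awk pick the fields. The three records are taken from the employee list above; the sort key -k2 is an assumption made for this example.

```shell
names=$(printf '46024 WOOD WILLIAM MUNCIE IN\n46012 DULANEY EVAN MOBILE AL\n46026 FERGUS SARAH MUNCIE IN\n' |
    grep 'MUNCIE' |          # keep only the MUNCIE records
    sort -k2 |               # order by last name
    awk '{print $3, $2}')    # print first name, then last name

printf '%s\n' "$names"
```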

Like the shell, AWK uses the output-redirection operators > and >> to put its output into a file rather than to standard output. The symbols react like their counterparts in the shell, so > creates the file if it doesn't exist, and >> appends to the existing file. Examine the following example:

$ awk '{print NR, $1 > "/tmp/filez"}' emp_names
$ cat /tmp/filez
1 46012
2 46013
3 46015
4 46017
5 46018
6 46019
7 46021
8 46022
9 46024
10 46026
11 46027
12 46029
$
Note: the name of the redirection target file must be enclosed in double quotes.

Examining the syntax of the statement, you can see that the output redirection is done after the print statement is complete. You must enclose the file name in quotes, or else it is simply an uninitialized AWK variable, and the combination of instructions generates an error from AWK. (If you use the redirection symbols improperly, AWK cannot tell whether the symbol means "redirection" or is a relational operator.)
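
The behavior of > and >> inside awk can be checked with a scratch file. In this sketch the target name is passed in as an initialized awk variable f through -v, which is why it appears without quotes; the file contents are made up for illustration.

```shell
out=$(mktemp)
printf 'stale\n' > "$out"

# ">" truncates the file when awk first opens it, then keeps writing
# to it for the rest of the run, so all three lines are kept.
printf 'a\nb\nc\n' | awk -v f="$out" '{print NR, $1 > f}'
first_run=$(cat "$out")

# ">>" leaves the existing contents in place and adds to the end.
printf 'd\n' | awk -v f="$out" '{print $1 >> f}'
second_run=$(cat "$out")

printf '%s\n---\n%s\n' "$first_run" "$second_run"
rm -f "$out"
```

Note that, unlike in the shell, a single awk run does not truncate the file again on every print.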

Output into pipes in AWK also resembles the way the same action would be accomplished in a shell. To send the output of a print command into a pipe, follow the print command with a pipe symbol and the name of the command, as in the following:

$ awk '{print $2 | "sort -n"}' 2.txt
2
2
5
7
55
80
100
123
800
$
As was the case with output redirection, you must enclose the command in quotes, and the name of the pipe is the name of the command being executed.
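
Because the pipe is identified by its command string, the same string is what you pass to close() when the sorted output must appear before anything printed afterwards. This is a sketch with three hypothetical inventory lines; close() is a standard awk function that this post does not cover.

```shell
sorted=$(printf 'saws 123\nhammers 5\nbits 55\n' | awk '
    { print $2 | "sort -n" }      # every line goes to the same sort process
END { close("sort -n")            # wait for sort to finish and emit its output
      print "done" }')

printf '%s\n' "$sorted"
```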

You can also use the printf command, such as:
$ awk '{printf "%s %s\n",$1,$2}' 2.txt
