Category: LINUX

2009-07-11 23:23:58

Powerful awk "array"
In awk, arrays are very dynamic and behave more like hash tables. An array index can be a number or a string, which is why such an array is called an "associative array". In fact, a numeric index is converted to a string when it is used.
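
To make the number-to-string conversion concrete, here is a minimal sketch. The array name and the values stored in it are made up for illustration; the point is only that the subscripts 1 and "1" refer to the same element.

```shell
# Illustrative only: the values "one" and "uno" are arbitrary.
result=$(awk 'BEGIN {
    count[1] = "one"        # numeric subscript, stored under the string "1"
    count["1"] = "uno"      # string subscript "1", overwrites the same element
    n = 0
    for (k in count) n++    # count how many elements the array really holds
    print n, count[1]
}')
echo "$result"
```

This should print "1 uno": the array holds a single element, and the second assignment replaced the first.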

Let's look at an example. It counts the number of lines that contain "widget":

/widget/ {count["widget"]++}
END {print count["widget"]}

A special form of the for loop visits every member of the array (the order is not specified):

for (item in array)
    print item, array[item]    # process array[item]
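
As a runnable sketch of this loop, the following counts duplicate lines in a small made-up input. The array name seen and the sample words are assumptions for the demo, and the sort at the end is there only because awk does not promise any particular traversal order.

```shell
# Sample input is invented for the demo.
result=$(printf 'apple\nbanana\napple\n' | awk '
    { seen[$0]++ }                  # one element per distinct line
    END {
        for (item in seen)          # visits every index, in no particular order
            print item, seen[item]
    }' | sort)
echo "$result"
```

With this input it should print "apple 2" followed by "banana 1".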

We can also test whether an index exists in an array:

if (item in array)
    print item " exists"
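
One detail worth seeing in action: the in test does not create the element it asks about, while merely referencing an element does. The sketch below uses made-up keys ("widget" and "gadget") to show the difference.

```shell
# Keys and messages are invented for the demo.
result=$(awk 'BEGIN {
    count["widget"] = 3
    if ("widget" in count) print "widget: yes"
    if (!("gadget" in count)) print "gadget: no"
    # The membership test above did not create count["gadget"].
    # Merely referencing an element does create it, with an empty value:
    if (count["gadget"] == "") print "gadget referenced"
    if ("gadget" in count) print "gadget: now exists"
}')
echo "$result"
```

All four messages should be printed, which is why in is the safer way to check for a key before a for loop over the array.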

Here is another example that shows how powerful arrays can be.

The file to be parsed looks like this:
09:55:54: ERROR1 /tmp/error/log.3 50 times
09:56:09: ERROR1 /tmp/error/log.14 50 times
10:56:12: ERROR1 /tmp/error/log.14 100 times
10:56:23: ERROR2 /tmp/error/log.5 50 times
11:56:26: ERROR2 /tmp/error/log.1 50 times
11:56:27: ERROR2 /tmp/error/log.5 100 times
15:56:29: ERROR3 /tmp/error/log.1 100 times
15:56:32: ERROR3 /tmp/error/log.1 150 times
16:56:33: ERROR4 /tmp/error/log.6 50 times
16:56:36: ERROR4 /tmp/error/log.6 100 times
16:56:40: ERROR4 /tmp/error/log.12 50 times

We want to count how many errors occur in each hour. Without an array, the code looks like the following.

awk -F'[: ]+' 'BEGIN {timeframe=""; count=0}
{
   if($1 != timeframe) {
      # the hour changed: report the previous hour, then start a new count
      if(timeframe != "") {
          print count " errors take place at " timeframe
      }
      timeframe = $1
      count = 1
   }
   else
      count++
}
END {print count " errors take place at " timeframe}'

With an array, it becomes much simpler:
awk -F'[: ]+' '{count[$1]++}
END {for(i in count) print count[i] " errors take place at " i}'
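
Because for (i in count) returns the hours in no guaranteed order, the report may come out shuffled. A minimal sketch of one way around that, assuming an external sort is acceptable, is to pipe the result through sort on the hour field. The log variable below holds the first five sample lines from above.

```shell
log='09:55:54: ERROR1 /tmp/error/log.3 50 times
09:56:09: ERROR1 /tmp/error/log.14 50 times
10:56:12: ERROR1 /tmp/error/log.14 100 times
10:56:23: ERROR2 /tmp/error/log.5 50 times
11:56:26: ERROR2 /tmp/error/log.1 50 times'

# The hour is the 6th whitespace-separated field of each report line.
result=$(printf '%s\n' "$log" |
    awk -F'[: ]+' '{count[$1]++}
    END {for(i in count) print count[i] " errors take place at " i}' |
    sort -k6)
echo "$result"
```

For these five lines it should report 2 errors at 09, 2 at 10 and 1 at 11, in that order.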

In terms of performance, however, the first version should be faster and consume less memory, since it keeps only one counter at a time instead of one array element per hour.
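
There is a trade-off hiding here, though: the version without an array only works when all lines of the same hour are adjacent. The sketch below feeds both approaches a made-up input in which the lines for hour 09 are split up. The variable names streaming and grouped are just labels for this demo, and the output format is shortened to "hour count".

```shell
# Hypothetical input in which the lines for hour 09 are not adjacent.
log='09:55:54: ERROR1 /tmp/error/log.3 50 times
10:56:12: ERROR1 /tmp/error/log.14 100 times
09:56:09: ERROR1 /tmp/error/log.14 50 times'

# Without an array: one running counter, flushed whenever the hour changes.
streaming=$(printf '%s\n' "$log" | awk -F'[: ]+' 'BEGIN {timeframe=""; count=0}
{
    if ($1 != timeframe) {
        if (timeframe != "") print timeframe, count
        timeframe = $1
        count = 1
    }
    else
        count++
}
END {print timeframe, count}')

# With an array: one counter per hour, so the input order does not matter.
grouped=$(printf '%s\n' "$log" | awk -F'[: ]+' '{count[$1]++}
END {for (i in count) print i, count[i]}' | sort)

echo "without array:"; echo "$streaming"
echo "with array:";    echo "$grouped"
```

The first approach should report hour 09 twice with a count of 1 each time, while the array version reports it once with a count of 2. So the array costs some memory but does not depend on the log being in time order.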