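In awk, arrays are very dynamic and behave more like hash tables than like arrays in C: they grow as elements are added, and no size has to be declared. An array index (subscript) can be either a number or a string, which is why such arrays are called "associative arrays". In fact, awk converts a numeric index to a string before using it, so a[1] and a["1"] refer to the same element.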
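A quick way to see the number-to-string conversion is a BEGIN-only awk program. It is wrapped here in a shell function whose name, subscript_demo, is only for illustration. A value stored under the numeric subscript 1 is found again with the string "1", and the array ends up with just two elements.

```shell
# Hypothetical helper: shows that a[1] and a["1"] are the same element.
subscript_demo() {
  awk 'BEGIN {
    a[1] = "one"        # numeric subscript 1 is stored under the string "1"
    print a["1"]        # the string subscript "1" finds that same element
    a["x"] = 10         # any string works as a subscript
    n = 0
    for (k in a) n++    # count the elements: "1" and "x"
    print n
  }'
}

subscript_demo   # prints "one", then "2"
```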
Let's look at an example. The following program counts the number of lines that contain "widget":
/widget/ {count["widget"]++} END {print count["widget"]}
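To try the program, you can wrap it in a shell function (count_widgets is a made-up name) and feed it a few lines. The "+ 0" in the END block is a small addition: it prints 0 instead of an empty line when no line matches.

```shell
# Hypothetical wrapper around the widget-counting program.
count_widgets() {
  # "widget" in quotes is a string key; an unquoted widget would be an
  # uninitialized variable, i.e. the empty-string key.
  awk '/widget/ {count["widget"]++} END {print count["widget"] + 0}'
}

printf 'a widget\nno match here\nwidget widget\n' | count_widgets   # prints 2
```

The third input line contains "widget" twice but is counted once, because the pattern is tested once per line.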
A special form of the for loop visits every index of an array, in no particular order:
for (item in array) print item, array[item]
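For instance, a word-frequency count fills an array keyed by word and then walks it with for (item in array). Because the visiting order is unspecified, this sketch pipes the result through sort. The function name word_counts is only illustrative.

```shell
# Hypothetical helper: count words, then list each word with its count.
word_counts() {
  # freq gets one element per distinct word; END walks every index with for-in.
  # The for-in order is unspecified, so the output is sorted afterwards.
  awk '{ for (i = 1; i <= NF; i++) freq[$i]++ }
       END { for (w in freq) print w, freq[w] }' | sort
}

printf 'apple banana\napple cherry\n' | word_counts
# apple 2
# banana 1
# cherry 1
```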
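We can also test whether an index exists in an array. Unlike a plain reference such as array[item], this test does not create the element:
if (item in array) print item " is in the array"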
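The difference between the "in" test and a plain reference can be checked directly. In the sketch below (membership_demo is a hypothetical name), the "in" test leaves the array empty, while merely comparing arr["k"] with "" creates the element.

```shell
# Hypothetical helper contrasting the "in" test with a plain reference.
membership_demo() {
  awk 'BEGIN {
    if ("k" in arr) { print "found" } else { print "missing" }  # only tests
    n = 0; for (x in arr) n++; print n     # still no elements
    if (arr["k"] == "") { print "empty" }  # this reference creates arr["k"]
    n = 0; for (x in arr) n++; print n     # now one element
  }'
}

membership_demo   # prints missing, 0, empty, 1 on separate lines
```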
The next example shows how powerful arrays can be.
The file to be parsed looks like this:
09:55:54: ERROR1 /tmp/error/log.3 50 times
09:56:09: ERROR1 /tmp/error/log.14 50 times
10:56:12: ERROR1 /tmp/error/log.14 100 times
10:56:23: ERROR2 /tmp/error/log.5 50 times
11:56:26: ERROR2 /tmp/error/log.1 50 times
11:56:27: ERROR2 /tmp/error/log.5 100 times
15:56:29: ERROR3 /tmp/error/log.1 100 times
15:56:32: ERROR3 /tmp/error/log.1 150 times
16:56:33: ERROR4 /tmp/error/log.6 50 times
16:56:36: ERROR4 /tmp/error/log.6 100 times
16:56:40: ERROR4 /tmp/error/log.12 50 times
We want to count how many errors occur in each hour. The field separator [: ]+ splits each line at every run of colons and spaces, so $1 is the hour. Without an array, the code looks like this:
awk -F'[: ]+' 'BEGIN {timeframe = ""; count = 0} {if ($1 != timeframe) {if (timeframe != "") print count " errors take place at " timeframe; timeframe = $1; count = 1} else count++} END {print count " errors take place at " timeframe}'
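Here is the same non-array logic spread over several lines and run on the sample log above, which is fed in through a here-document. The function names sample_log and hourly_no_array are illustrative.

```shell
# The sample log from above, one entry per line.
sample_log() {
  cat <<'EOF'
09:55:54: ERROR1 /tmp/error/log.3 50 times
09:56:09: ERROR1 /tmp/error/log.14 50 times
10:56:12: ERROR1 /tmp/error/log.14 100 times
10:56:23: ERROR2 /tmp/error/log.5 50 times
11:56:26: ERROR2 /tmp/error/log.1 50 times
11:56:27: ERROR2 /tmp/error/log.5 100 times
15:56:29: ERROR3 /tmp/error/log.1 100 times
15:56:32: ERROR3 /tmp/error/log.1 150 times
16:56:33: ERROR4 /tmp/error/log.6 50 times
16:56:36: ERROR4 /tmp/error/log.6 100 times
16:56:40: ERROR4 /tmp/error/log.12 50 times
EOF
}

# Count lines per hour without an array; assumes each hour's lines are adjacent.
hourly_no_array() {
  awk -F'[: ]+' '
    BEGIN { timeframe = ""; count = 0 }
    {
      if ($1 != timeframe) {              # a new hour starts here
        if (timeframe != "")              # report the hour just finished
          print count " errors take place at " timeframe
        timeframe = $1
        count = 1
      } else
        count++                           # same hour as the previous line
    }
    END { print count " errors take place at " timeframe }'
}

sample_log | hourly_no_array
# 2 errors take place at 09
# 2 errors take place at 10
# 2 errors take place at 11
# 2 errors take place at 15
# 3 errors take place at 16
```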
With an array, the code becomes much simpler:
awk -F'[: ]+' '{count[$1]++} END {for (i in count) print count[i] " errors take place at " i}'
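The array version below runs on three lines taken from the sample log, deliberately out of time order, and still groups them correctly because each hour has its own counter. The final sort -k6,6 (sorting on the sixth field, the hour) is an addition. It is needed only because for (i in count) visits the hours in no particular order. The name hourly_with_array is illustrative.

```shell
# Count lines per hour with an array; the input order does not matter.
hourly_with_array() {
  awk -F'[: ]+' '{count[$1]++}
    END {for (i in count) print count[i] " errors take place at " i}' |
  sort -k6,6   # field 6 of each output line is the hour
}

printf '%s\n' \
  '09:55:54: ERROR1 /tmp/error/log.3 50 times' \
  '10:56:12: ERROR1 /tmp/error/log.14 100 times' \
  '09:56:09: ERROR1 /tmp/error/log.14 50 times' | hourly_with_array
# 2 errors take place at 09
# 1 errors take place at 10
```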
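In terms of performance, the first version may be slightly faster, and it uses less memory because it keeps a single counter instead of one array element per hour. The price is that it only gives correct results when all lines for the same hour are adjacent, as they are in a time-ordered log. The array version has no such requirement.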
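The non-array version relies on the lines for each hour being adjacent. This is easy to see by feeding it lines out of time order: hour 09 is then reported twice. Sorting the input first restores the grouping. A plain sort works here because every line starts with a zero-padded HH:MM:SS timestamp, but it adds an extra sorting pass. The sketch repeats the non-array program under the illustrative name hourly_no_array. The helper unsorted_lines is also hypothetical.

```shell
# Same one-counter logic as the non-array version.
hourly_no_array() {
  awk -F'[: ]+' '
    BEGIN { timeframe = ""; count = 0 }
    {
      if ($1 != timeframe) {
        if (timeframe != "") print count " errors take place at " timeframe
        timeframe = $1; count = 1
      } else count++
    }
    END { print count " errors take place at " timeframe }'
}

# Three lines from the sample log, deliberately out of time order.
unsorted_lines() {
  printf '%s\n' \
    '09:55:54: ERROR1 /tmp/error/log.3 50 times' \
    '10:56:12: ERROR1 /tmp/error/log.14 100 times' \
    '09:56:09: ERROR1 /tmp/error/log.14 50 times'
}

unsorted_lines | hourly_no_array          # hour 09 is reported twice
# 1 errors take place at 09
# 1 errors take place at 10
# 1 errors take place at 09

unsorted_lines | sort | hourly_no_array   # sorting first fixes the grouping
# 2 errors take place at 09
# 1 errors take place at 10
```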