2011-02-06 01:11:20

Source: 25 Best AWK Commands / Tricks

1)  List of commands you use most often

history | awk '{a[$2]++} END {for (i in a){print a[i]" "i} }' | sort -nr | head
2) Display a block of text with AWK

awk '/start_pattern/,/stop_pattern/' file.txt

I find this terribly useful for grepping through a file, looking for just a block of text. There’s “grep -A # pattern file.txt” to see a specific number of lines following your pattern, but what if you want to see the whole block? Say, the output of “dmidecode” (as root):
dmidecode | awk '/Battery/,/^$/'

This shows me everything from the line matching “Battery” up to the next empty line, that is, the whole battery block. Again, I find this extremely useful when I want to see whole blocks of text based on a pattern, and I don’t care to see the rest of the data in the output. This could be used against the ‘/etc/security/user’ file on AIX to find the block of a specific user. It could be used against VirtualHosts or Directories on Apache to find specific definitions. The scenarios go on for any text formatted in a block fashion. Very handy.

The same range syntax works in sed:

sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' file.txt
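
To try the range pattern without dmidecode, here is a minimal sketch. The sample text and the print_block helper are hypothetical names made up for illustration; the only awk feature used is the standard "start, stop" range pattern, with the start pattern passed in as a variable.

```shell
# Hypothetical sample input: three blocks separated by empty lines.
sample='Processor
  Version: x

Battery
  Name: BAT0
  Capacity: 50 Wh

Memory
  Size: 8 GB'

# print_block PATTERN
# Prints from the first line matching PATTERN through the next empty line.
print_block() {
    awk -v start="$1" '$0 ~ start, /^$/'
}

# Prints the "Battery" line, its two indented lines, and the closing empty line.
printf '%s\n' "$sample" | print_block Battery
```

The range is inclusive on both ends, so the empty line that closes the block is printed as well.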
3) Graph # of connections for each host.
netstat -an | grep ESTABLISHED | awk '{print $5}' | awk -F: '{print $1}' | sort | uniq -c | awk '{printf ("%s\t%s\t",$2,$1); for (i=0;i < $1;i++){printf "*"};print ""}'

Written for Linux, but the real point is how to produce ASCII text graphs based on a numeric value (anything where uniq -c is useful is a good candidate).
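
The graphing idea can be tried offline with made-up data. In this sketch the host list and the bar_graph helper are hypothetical; it uses printf for the stars so that each bar stays on a single line.

```shell
# Hypothetical sample: one remote host per line, roughly what the
# netstat/awk stages would hand to "sort | uniq -c".
hosts='10.0.0.1
10.0.0.2
10.0.0.1
10.0.0.3
10.0.0.1
10.0.0.2'

# bar_graph: count identical input lines, then print "value<TAB>count<TAB>bar".
bar_graph() {
    sort | uniq -c | awk '{
        printf "%s\t%s\t", $2, $1                 # value, then its count
        for (i = 0; i < $1; i++) printf "*"       # one star per occurrence
        print ""                                  # end the line
    }'
}

printf '%s\n' "$hosts" | bar_graph
```

Any pipeline that ends in uniq -c can be fed to the same final awk stage.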
4) Check your unread Gmail from the command line

curl -u username:password --silent "" | tr -d '\n' | awk -F '<entry>' '{for (i=2; i<=NF; i++) {print $i}}' | sed -n "s/<title>\(.*\)<\/title.*name>\(.*\)<\/name>.*/\2 - \1/p"

Checks the Gmail ATOM feed for your account, parses it and outputs a list of unread messages.

For some reason sed gets stuck on OS X, so here’s a Perl version for the Mac:

curl -u username:password --silent "" | tr -d '\n' | awk -F '<entry>' '{for (i=2; i<=NF; i++) {print $i}}' | perl -pe 's/^<title>(.*)<\/title>.*<name>(.*)<\/name>.*$/$2 - $1/'

If you want to see the name of the last person who added a message to the conversation, change the greediness of the operators like this:

curl -u username:password --silent "" | tr -d '\n' | awk -F '<entry>' '{for (i=2; i<=NF; i++) {print $i}}' | perl -pe 's/^<title>(.*)<\/title>.*?<name>(.*?)<\/name>.*$/$2 - $1/'
5) Remove duplicate entries in a file without sorting.

awk '!x[$0]++' <file>

This uses awk to find and remove duplicate lines without sorting the file. Sorting would reorder the contents; awk keeps the original order and prints only the first occurrence of each line, which you can then redirect into another file.
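
A small sketch of why this works, using made-up input and a hypothetical dedupe wrapper: the expression seen[$0]++ yields the count before incrementing, so it is 0 (false) the first time a line appears, and the leading ! turns that into a true pattern that prints the line.

```shell
# dedupe: print each distinct input line once, in order of first appearance.
dedupe() {
    awk '!seen[$0]++'
}

# Hypothetical input with repeats; prints b, a, c on separate lines.
printf '%s\n' b a b c a | dedupe
```

The array holds every distinct line in memory, so this trades memory for preserving order.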
6) Find the geographical location of an IP address

lynx -dump |grep address|egrep 'city|state|country'|awk '{print $3,$4,$5,$6,$7,$8}'|sed 's\ip address flag \\'|sed 's\My\\'

I save this to bin/iptrace and run "iptrace ipaddress" to get the Country, City and State of an ip address using the service.

I add the following to my script to get a tinyurl of the map as well:

URL=`lynx -dump details|awk '{print $2}'`

lynx -dump tinyurl|grep "19. http"|awk '{print $2}'

7) Block known dirty hosts from reaching your machine

wget -qO - |awk '!/#|[a-z]/&&/./{print "iptables -A INPUT -s "$1" -j DROP"}'

Blacklisted is a compiled list of all known dirty hosts (botnets, spammers, bruteforcers, etc.) which is updated on an hourly basis. This command will get the list and create the rules for you; if you want them automatically blocked, append |sh to the end of the command line. It’s a more practical solution to block all and allow in specifics; however, there are many who don’t or can’t do this, which is where this script will come in handy. For those using ipfw, a quick fix would be {print "add deny ip from "$1" to any"}. Posted in the sample output are the top two entries. Be advised the blacklisted file itself filters out RFC1918 addresses (10.x.x.x, 172.16-31.x.x, 192.168.x.x); however, it is advisable you check/parse the list before you implement the rules.

8) Display a list of committers sorted by the frequency of commits

svn log -q | grep "|" | awk "{print \$3}" | sort | uniq -c | sort -nr

Use this command to find out a list of committers sorted by the frequency of commits.

9) List the number and type of active network connections

netstat -ant | awk '{print $NF}' | grep -v '[a-z]' | sort | uniq -c

Shows the number of active connections in each state (TCP here, because of the -t flag).

10) View facebook friend list [hidden or not hidden]

lynx -useragent=Opera -dump '' | gawk -F'\"t\":\"' -v RS='\",' 'RT{print $NF}' | grep -v '\"n\":\"' | cut -d, -f2

There’s no need to be logged in to Facebook. I could do more JSON filtering but you get the idea…

Replace u=4 (Mark Zuckerberg, Facebook creator) with the desired uid.

Hidden or not hidden… Scary, isn’t it?

11) List recorded form fields of Firefox

cd ~/.mozilla/firefox/ && sqlite3 `cat profiles.ini | grep Path | awk -F= '{print $2}'`/formhistory.sqlite "select * from moz_formhistory" && cd - > /dev/null

When you fill in a form with Firefox, you see things you entered in previous forms with the same field names. This command lists everything Firefox has recorded. Using a "delete from", you can remove annoying Google queries, for example ;-)

12) Brute force discover

sudo zcat /var/log/auth.log.*.gz | awk '/Failed password/&&!/for invalid user/{a[$9]++}/Failed password for invalid user/{a["*" $11]++}END{for (i in a) printf "%6s\t%s\n", a[i], i|"sort -n"}'

Shows the number of failed login attempts per account. If the user does not exist, it is marked with *.

13) Show biggest files/directories, biggest first with ‘k,m,g’ eyecandy

du --max-depth=1 | sort -rn | awk '{split("k m g",v); s=1 ; while($1>1024){$1/=1024;s++} print int($1)" "v[s]"\t"$2}'

I use this on Debian testing. It works like the other sorted du variants, but I like small numbers and suffixes :)

Shows the directories that use the most space. du reports sizes in kilobytes; awk converts them to a suitable unit.

Compare with another awk+sed command of the same kind:

du $1 --max-depth=1 | sort -n|awk '{printf "%7.2fM ----> %s\n",$1/1024,$2}'|sed 's:/.*/\([^/]\{1,\}\)$:\1:g'

It can only report sizes in megabytes.

Source: item 50.

14) Analyse an Apache access log for the most common IP addresses

tail -10000 access_log | awk '{print $1}' | sort | uniq -c | sort -n | tail

This uses awk to grab the IP address from each request and then sorts and summarises the top 10.

15) Copy working directory and compress it on-the-fly while showing progress

tar -cf - . | pv -s $(du -sb . | awk '{print $1}') | gzip > out.tgz

What happens here is we tell tar to create (“-c”) an archive of all files in the current dir “.” (recursively) and output the data to stdout (“-f -”). Next we pass pv the total size of all files in the current dir with “-s”. The “du -sb . | awk '{print $1}'” part returns the number of bytes in the current dir, and that number is fed to pv as the “-s” parameter. Next we gzip the whole content and write the result to the out.tgz file. This way pv knows how much data is still left to be processed and can show an estimate of the remaining time, such as another 4 mins 49 secs to finish.

Credit: Peteris Krumins

16) List of commands you use most often

history | awk '{print $2}' | sort | uniq -c | sort -rn | head

17) Identify long lines in a file

awk 'length>72' file

This command displays the lines that are longer than 72 characters. I use it to identify those lines in my scripts and cut them short the way I like it.

It prints lines longer than 72 characters (spaces included). awk 'length>=72' file prints all lines of 72 characters or more.

18) Makes you look busy

alias busy='my_file=$(find /usr/include -type f | sort -R | head -n 1); my_len=$(wc -l $my_file | awk "{print \$1}"); let "r = $RANDOM % $my_len" 2>/dev/null; vim +$r $my_file'

This makes an alias for a command named ‘busy’. The ‘busy’ command opens a random file in /usr/include at a random line with vim. Drop this in your .bash_aliases and make sure that file is sourced from your .bashrc.

19) Show me a histogram of the busiest minutes in a log file:

sudo head -1 /var/log/secure | awk '{print substr($0,0,12)}' | uniq -c | sort -nr | awk '{printf("\n%s",$0); for (i=0; i<$1; i++){print (" *")};}'

Shows a bar chart of the busiest moments in the security log. substr($0,0,12) takes the first 12 characters of each line (spaces included). The first * is not aligned; the command needs improvement.

20) Analyze awk fields

awk '{print NR": "$0; for(i=1;i<=NF;++i)print "\t"i": "$i}'

Breaks down and numbers each line and its fields. This is really useful when you are going to parse something with awk but aren’t sure exactly where to start.

21) Browse system RAM in a human readable form

sudo cat /proc/kcore | strings | awk 'length > 20' | less

This command lets you see and scroll through all of the strings that are stored in RAM at any given time. Press the space bar to scroll through more pages (or use the arrow keys etc).

Sometimes, if you didn’t save a file you were working on, or want to get back something you closed, it can be found floating around in here!

The awk command only shows lines that are longer than 20 characters (to avoid seeing lots of junk that probably isn’t “human readable”).

If you want to dump the whole thing to a file, replace the final ‘| less’ with ‘> memorydump’. This is great for searching through many times (and with the added bonus that it doesn’t overwrite any memory…).

Here’s a neat example that shows conversations that were had in pidgin (it will probably work even after pidgin has been closed)…

sudo cat /proc/kcore | strings | grep '([0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\})'

(depending on sudo settings it might be best to run sudo su first to get to a # prompt)

22) Monitor open connections for httpd including listen, count and sort it per IP

watch "netstat -plan | grep :80 | awk '{print \$5}' | cut -d: -f 1 | sort | uniq -c | sort -nk 1"

It’s not my code, but I found it useful to know how many open connections per request I have on a machine, to debug connections without opening another http connection for it.

You can also decide to sort things differently from the way they appear here.

Monitors the connections on a given port (port 80, for example) in real time.

23) Purge configuration files of removed packages on debian based systems

sudo aptitude purge `dpkg --get-selections | grep deinstall | awk '{print $1}'`

Purge all configuration files of removed packages.

24) Quick glance at who’s been using your system recently

last | grep -v "^$" | awk '{print $1}' | sort -nr | uniq -c

This command takes the output of the ‘last’ command, removes empty lines, gets just the first field ($USERNAME), sorts the usernames in reverse order and then gives a summary count of unique matches.

25) Number of open connections per ip.

netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n

Here is a command line to run on your server if you think your server is under attack. It prints out a list of open connections to your server and sorts them by amount.

The two header lines of the netstat output are not filtered out. The command needs improvement.

BSD Version:

netstat -na |awk '{print $5}' |cut -d "." -f1,2,3,4 |sort |uniq -c |sort -nr

And there you have it: killer awk usages. Now I know you might be thinking these are NOT awk commands.