How do I use shell variables in awk scripts (ZZ) - wqfhenanxc - ChinaUnix Blog



2008-12-07 15:15:13

How do I use shell variables in awk scripts
(Reposted.)
Short answer = either of these, where "svar" is a shell variable
and "avar" is an awk variable:

      awk -v avar="$svar" '... avar ...' file
      awk 'BEGIN{avar=ARGV[1];ARGV[1]=""}... avar ...' "$svar" file

depending on your requirements for handling backslashes and
handling ARGV[] if it contains a null string (see below for details).

Long answer = There are several ways of passing the values of
shell variables to awk scripts depending on which version of awk
(and to a much lesser extent which OS) you're using. For this
discussion, we'll consider the following 4 awk versions:

oawk (old awk, /usr/bin/awk and /usr/bin/oawk on Solaris)
nawk (new awk, /usr/bin/nawk on Solaris)
sawk (non-standard name for /usr/xpg4/bin/awk on Solaris)
gawk (GNU awk, a separate download)

If you wanted to find all lines in a given file that match text
stored in a shell variable "svar" then you could use one of the
following:

a) awk -v avar="$svar" '$0 == avar' file
b) awk -vavar="$svar" '$0 == avar' file
c) awk '$0 == avar' avar="$svar" file
d) awk 'BEGIN{avar=ARGV[1];ARGV[1]=""}$0 == avar' "$svar" file
e) awk 'BEGIN{avar=ARGV[1];ARGC--}$0 == avar' "$svar" file
f) svar="$svar" awk 'BEGIN{avar=ENVIRON["svar"]}$0 == avar' file
g) awk '$0 == "'"$svar"'"' file

The following list shows which form is supported by which
awk on Solaris (which should also apply to most other OSs):

      oawk = c, g
      nawk = a, c, d, f, g
      sawk = a, c, d, f, g
      gawk = a, b, c, d, f, g
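
As a quick check of forms "a", "c", "d", and "f", the sketch below runs
each of them against a small sample file. The file contents, the
temporary directory, and the names out_a through out_f are made up for
illustration, and it assumes a POSIX shell with one of the newer awks
installed as "awk".

```shell
# Hypothetical sample data: three lines, one of which equals $svar.
tmpdir=$(mktemp -d)
printf '%s\n' 'first line' 'needle' 'last line' > "$tmpdir/file"
svar='needle'

# Form a: assignment with -v, processed before BEGIN runs.
out_a=$(awk -v avar="$svar" '$0 == avar' "$tmpdir/file")
# Form c: assignment given as an operand ahead of the file name.
out_c=$(awk '$0 == avar' avar="$svar" "$tmpdir/file")
# Form d: value passed as ARGV[1], then blanked so awk does not open it as a file.
out_d=$(awk 'BEGIN{avar=ARGV[1];ARGV[1]=""}$0 == avar' "$svar" "$tmpdir/file")
# Form f: value placed in the environment of this one awk command.
out_f=$(svar="$svar" awk 'BEGIN{avar=ENVIRON["svar"]}$0 == avar' "$tmpdir/file")

printf 'a=%s c=%s d=%s f=%s\n' "$out_a" "$out_c" "$out_d" "$out_f"
rm -rf "$tmpdir"
```

Each form should print the single matching line, so the four results
are expected to be identical for a value with no backslashes in it.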

Notes:

1) Old awk only works with forms "c" and "g", both of which have
   problems.

2) GNU awk is the only one that works with form "b" (no space
   between "-v" and "avar="). Since gawk also supports form "a",
   as do all the other new awks, you should avoid form "b" for
   portability between newer awks.

3) In form "c", ARGV[1] is still getting populated, but
   because it contains an equals sign (=), awk changes its normal
   behavior of assuming that arguments are file names and instead
   treats it as a variable assignment, so you don't need to clear
   ARGV[1] as in form "d".

4) In light of "3)" above, this raises the interesting question of
   how to pass awk a file name that contains an equals sign - the
   answer is to do one of the following:

   i) Specify a path, e.g. for a file named "abc=def" in the
      current directory, you'd use:

            awk '...' ./abc=def

      Note that this won't work with older versions of gawk or with
      sawk.

   ii) Redirect the input from a file so it's opened by the shell
      rather than awk having to parse the file name as an argument
      and then open it:

            awk '...' < abc=def

      Note that you will not have access to the file name in the
      FILENAME variable in this case.
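
Both workarounds can be tried with the sketch below. The temporary
directory and the two-line file are hypothetical, and option i) is
assumed to be run with an awk that accepts it (see the caveat about
older gawk and sawk above).

```shell
# Hypothetical file whose name contains an equals sign.
tmpdir=$(mktemp -d)
printf '%s\n' 'one' 'two' > "$tmpdir/abc=def"

# i) With a leading ./ the text before "=" is not a valid variable name,
#    so awk opens the operand as a file. The cd happens inside the
#    command substitution and does not affect the calling shell.
count_path=$(cd "$tmpdir" && awk 'END{print NR}' ./abc=def)

# ii) The shell opens the file; awk only ever sees standard input.
count_stdin=$(awk 'END{print NR}' < "$tmpdir/abc=def")

printf 'path=%s stdin=%s\n' "$count_path" "$count_stdin"
rm -rf "$tmpdir"
```

Running plain "awk '...' abc=def" is deliberately left out: awk would
take it as an assignment and then wait for input on stdin.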

5) An alternative to setting ARGV[1]="" in form "d" is to delete
   that array entry, e.g.:

      awk 'BEGIN{avar=ARGV[1];delete ARGV[1]}$0 == avar' "$svar" file

   This is slightly misleading, however, since although ARGV[1]
   does get deleted in the BEGIN section and remains deleted
   for any files that precede the deleted argument on the
   command line, the ARGV[] entry is recreated by awk when it
   gets to that argument during file processing, so in the case
   above when parsing "file", ARGV[1] would actually exist with
   a null string value just as if you'd done ARGV[1]="". Given
   that it's misleading and introduces inconsistency of ARGV[]
   settings between files based on command-line order, it is
   not recommended.

6) An alternative to setting svar="$svar" on the command line
   prior to invoking awk in form "f" is to export svar first,
   e.g.:

      export svar
      awk 'BEGIN{avar=ENVIRON["svar"]}$0 == avar' file

   Since this forces you to export variables that you wouldn't
   normally export and so risk interfering with the environment
   of other commands invoked from your shell, it is not recommended.
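
The difference between the two approaches can be seen in the sketch
below, which uses a hypothetical variable name "demo_svar" and assumes
it is not already exported in your environment.

```shell
# demo_svar is a hypothetical, non-exported shell variable.
demo_svar='needle'

# Form f: the assignment prefix puts demo_svar into the environment
# of this one awk command only.
seen_by_awk=$(demo_svar="$demo_svar" awk 'BEGIN{print ENVIRON["demo_svar"]}')

# A later command does not inherit it, because it was never exported.
seen_by_later=$(sh -c 'echo "${demo_svar-unset}"')

printf 'awk saw: %s, later command saw: %s\n' "$seen_by_awk" "$seen_by_later"
```

Had "export demo_svar" been used instead, the later command would have
seen the value as well, which is the side effect this note warns about.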

7) When you use form "d", you end up with a null string in
   ARGV[1], so if at the end of your program you want to print
   out all the file names then instead of doing:

      END{for (i in ARGV) print ARGV[i]}

   you need to check for a null string before printing, or
   store FILENAMEs in a different array during processing.
   Note that the above loop as written would also print the
   script name stored in ARGV[0].
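
One possible way to write that check is sketched below. The file names
f1 and f2 and the temporary directory are hypothetical; the loop runs
from 1 to ARGC-1 so that it skips ARGV[0] and keeps command-line order,
which "for (i in ARGV)" does not guarantee.

```shell
# Hypothetical input files.
tmpdir=$(mktemp -d)
printf 'x\n' > "$tmpdir/f1"
printf 'y\n' > "$tmpdir/f2"
svar='needle'

# Form d blanks ARGV[1]; the END loop skips any null entries.
names=$(cd "$tmpdir" && awk '
    BEGIN { avar = ARGV[1]; ARGV[1] = "" }
    END   { for (i = 1; i < ARGC; i++) if (ARGV[i] != "") print ARGV[i] }
' "$svar" f1 f2)

printf '%s\n' "$names"
rm -rf "$tmpdir"
```

This should list f1 and f2 only, without a blank line for the entry
that held the value of svar.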

8) When you use form "a", "b", or "c", the awk variable
   assignment gets processed during awk's lexical analysis
   stage (i.e. when the internal awk program gets built) and
   any backslashes present in the shell variable may get
   expanded so, for example, if svar contains "hi\there"
   then avar could contain "hi<tab>here", where <tab> is a
   literal tab character. This behavior depends on the awk
   version as follows:

   oawk: does not print a warning and sets avar="hi\there"
   sawk: does not print a warning and sets avar="hi<tab>here"
   nawk: does not print a warning and sets avar="hi<tab>here"
   gawk: does not print a warning and sets avar="hi<tab>here"

   If the backslash precedes a character that has no
   special meaning to awk then the backslash may be discarded
   with or without a warning, e.g. if svar contained "hi\john"
   then the backslash precedes "j" and "\j" has no special
   meaning, so the various awks behave differently, as
   follows:

   oawk: does not print a warning and sets avar="hi\john"
   sawk: does not print a warning and sets avar="hi\john"
   nawk: does not print a warning and sets avar="hijohn"
   gawk: prints a warning and sets avar="hijohn"
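
To see what your own awk does, a sketch along the following lines may
help. It compares form "a" with forms "d" and "f" for the "hi\there"
value; the names via_v, via_argv, and via_env are made up here, and the
result for form "a" assumes a newer awk that expands the escape.

```shell
# Value containing a backslash followed by "t".
svar='hi\there'

# Form a: the assignment is parsed like an awk string, so \t may become a tab.
via_v=$(awk -v avar="$svar" 'BEGIN{printf "%s", avar}')
# Form d: ARGV[] holds the argument exactly as the shell passed it.
via_argv=$(awk 'BEGIN{avar=ARGV[1]; printf "%s", avar}' "$svar")
# Form f: ENVIRON[] likewise holds the raw environment string.
via_env=$(svar="$svar" awk 'BEGIN{printf "%s", ENVIRON["svar"]}')

# od -c shows a tab as \t and a literal backslash as \\.
printf '%s' "$via_v"    | od -c
printf '%s' "$via_argv" | od -c
printf '%s' "$via_env"  | od -c
```

printf '%s' is used instead of echo because some shells' echo would
itself expand the backslash and hide the difference.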

9) None of the awk versions discussed here work with form "e" but
   it is included above as there are older (i.e. pre-POSIX) versions
   of awk that will treat form "d" as if it's intended to access a
   file named "" so you instead need to use form "e". If you find
   yourself with that or any other version of "old awk", you need
   to get a new awk to avoid future headaches and they will not be
   discussed further here.

So, the forms accepted by all 3 newer awks under discussion (nawk,
sawk, and gawk) are a, c, d, f, and g. The main differences between
these forms are as follows:

   |-------|-------|----------|-----------|-----------|--------|
   | BEGIN | files | requires |  accepts  |  expands  |  null  |
   | avail |  set  |  access  | backslash | backslash | ARGV[] |
   |-------|-------|----------|-----------|-----------|--------|
a) |   y   |  all  |    n     |     n     |     y     |   n    |
c) |   n   |  sub  |    n     |     n     |     y     |   n    |
d) |   y   |  all  |    n     |     n     |     n     |   y    |
f) |   y   |  all  |    y     |     n     |     n     |   n    |
g) |   y   |  all  |    n     |     y     |    n/a    |   n    |
   |-------|-------|----------|-----------|-----------|--------|

where the columns mean:

BEGIN avail = y: variable IS available in the BEGIN section
BEGIN avail = n: variable is NOT available in the BEGIN section

files set = all: variable is set for ALL files regardless of
            command-line order.
files set = sub: variable is ONLY set for those files subsequent
            to the definition of the variable on the command line

requires access = y: variable DOES need to be exported or set on
            the command line
requires access = n: shell variable does NOT need to be exported
            or set on the command line

accepts backslash = y: variable CAN contain a backslash without
            causing awk to fail with a syntax error
accepts backslash = n: variable can NOT contain a backslash without
            causing awk to fail with a syntax error

expands backslash = y: if the variable contains a backslash, it IS
            expanded before execution begins
expands backslash = n: if the variable contains a backslash, it is
            NOT expanded before execution begins

null ARGV[] = y: you DO end up with a null entry in the ARGV[]
      array
null ARGV[] = n: you do NOT end up with a null entry in the ARGV[]
      array
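
The "BEGIN avail" and "files set" columns can be observed directly with
a sketch like the one below, which contrasts forms "a" and "c". The
files f1 and f2, the value V, and the temporary directory are
hypothetical.

```shell
# Hypothetical input files, one line each.
tmpdir=$(mktemp -d)
printf 'x\n' > "$tmpdir/f1"
printf 'y\n' > "$tmpdir/f2"

# Form a: avar is set before BEGIN runs and for every file.
form_a=$(cd "$tmpdir" && awk -v avar=V \
    'BEGIN{print "BEGIN:[" avar "]"} {print FILENAME ":[" avar "]"}' f1 f2)

# Form c: the assignment sits between f1 and f2, so BEGIN and f1
# see an empty avar and only f2 sees the value.
form_c=$(cd "$tmpdir" && awk \
    'BEGIN{print "BEGIN:[" avar "]"} {print FILENAME ":[" avar "]"}' f1 avar=V f2)

printf 'form a:\n%s\nform c:\n%s\n' "$form_a" "$form_c"
rm -rf "$tmpdir"
```

Moving "avar=V" ahead of f1 in the second command would make it visible
for both files, but still not in the BEGIN section.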

For most applications, forms "a" and "d" provide the most intuitive
functionality. The only functional differences between the 2 are:

1) Whether or not backslashes get expanded on variable assignment.
2) Whether or not ARGV[] ends up containing a null string.

so which one you choose to use depends on your requirements for
these 2 situations.
