2008-12-07 15:15:13

How do I use shell variables in awk scripts
(Reposted from elsewhere; the original post does not name its source.)
   Short answer = either of these, where "svar" is a shell variable
   and "avar" is an awk variable:

        awk -v avar="$svar" '... avar ...' file
        awk 'BEGIN{avar=ARGV[1];ARGV[1]=""}... avar ...' "$svar" file

   depending on your requirements for handling backslashes and
   handling ARGV[] if it contains a null string (see below for details).

   Long answer = There are several ways of passing the values of
   shell variables to awk scripts depending on which version of awk
   (and to a much lesser extent which OS) you're using. For this
   discussion, we'll consider the following 4 awk versions:

   oawk (old awk, /usr/bin/awk and /usr/bin/oawk on Solaris)
   nawk (new awk, /usr/bin/nawk on Solaris)
   sawk (non-standard name for /usr/xpg4/bin/awk on Solaris)
   gawk (GNU awk, downloaded separately)

   If you wanted to find all lines in a given file that match text
   stored in a shell variable "svar" then you could use one of the
   following:

   a) awk -v avar="$svar" '$0 == avar' file
   b) awk -vavar="$svar" '$0 == avar' file
   c) awk '$0 == avar' avar="$svar" file
   d) awk 'BEGIN{avar=ARGV[1];ARGV[1]=""}$0 == avar' "$svar" file
   e) awk 'BEGIN{avar=ARGV[1];ARGC--}$0 == avar' "$svar" file
   f) svar="$svar" awk 'BEGIN{avar=ENVIRON["svar"]}$0 == avar' file
   g) awk '$0 == "'"$svar"'"' file

   The following list shows which forms are supported by which
   awk on Solaris (which should also apply to most other OSs):

        oawk = c, g
        nawk = a, c, d, f, g
        sawk = a, c, d, f, g
        gawk = a, b, c, d, f, g
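
   As a rough illustration of forms "a", "d", and "f" from the list
   above, the sketch below runs each one against the same input and
   keeps the matching line. The sample file contents and the names
   tmp, out_a, out_d, and out_f are made up for this example, and it
   assumes a POSIX-style shell with a new awk installed as "awk".

```shell
# Hypothetical sample data; the file contents are made up for this sketch.
tmp=$(mktemp)
printf '%s\n' 'alpha' 'hello world' 'omega' > "$tmp"

svar='hello world'

# Form "a": assign with -v before the program text.
out_a=$(awk -v avar="$svar" '$0 == avar' "$tmp")

# Form "d": pass the value as an operand, then blank ARGV[1] so awk
# does not try to open it as a file.
out_d=$(awk 'BEGIN{avar=ARGV[1];ARGV[1]=""}$0 == avar' "$svar" "$tmp")

# Form "f": pass the value through the environment of this one command.
out_f=$(svar="$svar" awk 'BEGIN{avar=ENVIRON["svar"]}$0 == avar' "$tmp")

printf 'a: %s\nd: %s\nf: %s\n' "$out_a" "$out_d" "$out_f"
rm -f "$tmp"
```

   All three variables should end up holding the single input line
   that equals the value of svar.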

   Notes:

   1) Old awk only works with forms "c" and "g", both of which have
      problems.

   2) GNU awk is the only one that works with form "b" (no space
      between "-v" and "avar="). Since gawk also supports form "a",
      as do all the other new awks, you should avoid form "b" for
      portability between newer awks.

   3) In form "c", ARGV[1] is still populated, but because it
      contains an equals sign (=), awk changes its normal behavior of
      assuming that arguments are file names and instead treats it as
      a variable assignment, so you don't need to clear ARGV[1] as in
      form "d".

   4) In light of "3)" above, this raises the interesting question of
      how to pass awk a file name that contains an equals sign - the
      answer is to do one of the following:

       i) Specify a path, e.g. for a file named "abc=def" in the
          current directory, you'd use:

                awk '...' ./abc=def

          Note that that won't work with older versions of gawk or with
          sawk.

      ii) Redirect the input from a file so it's opened by the shell
          rather than awk having to parse the file name as an argument
          and then open it:

                awk '...' < abc=def

          Note that you will not have access to the file name in the
          FILENAME variable in this case.

   5) An alternative to setting ARGV[1]="" in form "d" is to delete
      that array entry, e.g.:

        awk 'BEGIN{avar=ARGV[1];delete ARGV[1]}$0 == avar' "$svar" file

      This is slightly misleading, however, since although ARGV[1]
      does get deleted in the BEGIN section and remains deleted
      for any files that precede the deleted argument on the
      command line, the ARGV[] entry is recreated by awk when it gets
      to that argument during file processing. So in the case above,
      when parsing "file", ARGV[1] would actually exist with a null
      string value, just as if you'd done ARGV[1]="". Given that
      it's misleading and introduces inconsistency of ARGV[]
      settings between files based on command-line order, it is
      not recommended.

   6) An alternative to setting svar="$svar" on the command line
      prior to invoking awk in form "f" is to export svar first,
      e.g.:

        export svar
        awk 'BEGIN{avar=ENVIRON["svar"]}$0 == avar' file

      Since this forces you to export variables that you wouldn't
      normally export and so risk interfering with the environment
      of other commands invoked from your shell, it is not recommended.

   7) When you use form "d", you end up with a null string in
      ARGV[1], so if at the end of your program you want to print
      out all the file names then instead of doing:

        END{for (i in ARGV) print ARGV[i]}

      you need to check for a null string before printing, or
      store FILENAMEs in a different array during processing.
      Note that the above loop as written would also print the
      command name (typically "awk") stored in ARGV[0].

   8) When you use form "a", "b", or "c", the awk variable
      assignment gets processed during awk's lexical analysis
      stage (i.e. when the internal awk program gets built) and
      any backslashes present in the shell variable may get
      expanded. For example, if svar contains "hi\there" then
      avar could contain "hi<tab>here", where <tab> is a literal
      tab character. This behavior depends on the awk version as
      follows:

      oawk: does not print a warning and sets avar="hi\there"
      sawk: does not print a warning and sets avar="hi<tab>here"
      nawk: does not print a warning and sets avar="hi<tab>here"
      gawk: does not print a warning and sets avar="hi<tab>here"

      If the backslash precedes a character that has no
      special meaning to awk then the backslash may be discarded
      with or without a warning, e.g. if svar contained "hi\john"
      then the backslash precedes "j" and "\j" has no special
      meaning, so the various awks each behave differently
      as follows:

      oawk: does not print a warning and sets avar="hi\john"
      sawk: does not print a warning and sets avar="hi\john"
      nawk: does not print a warning and sets avar="hijohn"
      gawk: prints a warning and sets avar="hijohn"

   9) None of the awk versions discussed here work with form "e". It
      is included above because some older (i.e. pre-POSIX) versions
      of awk treat the null string left in ARGV[1] by form "d" as a
      file named "" and try to open it, so with those you need to use
      form "e" instead. If you find yourself with that or any other
      version of "old awk", you need to get a new awk to avoid future
      headaches; old awks will not be discussed further here.
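
   Note 4 can be tried out with a sketch along the following lines.
   The scratch directory and the names old_dir, dir, as_assignment,
   with_path, and with_redirect are invented for this example; it
   assumes a reasonably recent awk, since the note itself says the
   "./" form fails on sawk and on older gawk releases.

```shell
# Hypothetical scratch directory and file name, used only for this sketch.
old_dir=$PWD
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'first line' 'second line' > 'abc=def'

# Bare operand: awk takes "abc=def" as the assignment abc="def", finds no
# file operands, and reads standard input instead (empty here).
as_assignment=$(awk 'END{print NR}' abc=def < /dev/null)

# Note 4 i): a leading "./" keeps the operand from looking like var=value.
with_path=$(awk 'END{print NR}' ./abc=def)

# Note 4 ii): the shell opens the file, so awk never parses the name.
with_redirect=$(awk 'END{print NR}' < abc=def)

printf 'bare: %s\npath: %s\nredirect: %s\n' \
    "$as_assignment" "$with_path" "$with_redirect"

cd "$old_dir" || exit 1
rm -rf "$dir"
```

   The two workarounds should both report the two lines in the file,
   while the bare operand never causes the file to be read at all.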

   So, the forms accepted by all 3 newer awks under discussion (nawk,
   sawk, and gawk) are a, c, d, f, and g. The main differences between
   these forms are as follows:

      |-------|-------|----------|-----------|-----------|--------|
      | BEGIN | files | requires |  accepts  |  expands  |  null  |
      | avail |  set  |  access  | backslash | backslash | ARGV[] |
      |-------|-------|----------|-----------|-----------|--------|
   a) |   y   |  all  |     n    |     y     |     y     |   n    |
   c) |   n   |  sub  |     n    |     y     |     y     |   n    |
   d) |   y   |  all  |     n    |     y     |     n     |   y    |
   f) |   y   |  all  |     y    |     y     |     n     |   n    |
   g) |   y   |  all  |     n    |     n     |    n/a    |   n    |
      |-------|-------|----------|-----------|-----------|--------|

   where the columns mean:

   BEGIN avail = y: variable IS available in the BEGIN section
   BEGIN avail = n: variable is NOT available in the BEGIN section

   files set = all: variable is set for ALL files regardless of
                command-line order.
   files set = sub: variable is ONLY set for those files subsequent
                to the definition of the variable on the command line

   requires access = y: variable DOES need to be exported or set on
                the command line
   requires access = n: shell variable does NOT need to be exported
                or set on the command line

   accepts backslash = y: variable CAN contain a backslash without
                causing awk to fail with a syntax error
   accepts backslash = n: variable can NOT contain a backslash without
                causing awk to fail with a syntax error

   expands backslash = y: if the variable contains a backslash, it IS
                expanded before execution begins
   expands backslash = n: if the variable contains a backslash, it is
                NOT expanded before execution begins

   null ARGV[] = y: you DO end up with a null entry in the ARGV[]
        array
   null ARGV[] = n: you do NOT end up with a null entry in the ARGV[]
        array
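
   To make the "BEGIN avail" and "files set" columns concrete, the
   sketch below prints the awk variable in BEGIN and once per input
   line, first with form "a" and then with form "c" placed between
   two files. The scratch files "one" and "two", their contents, and
   the names dir, prog, form_a, and form_c are made up for this
   example.

```shell
# Hypothetical scratch files: "one" holds the line x, "two" holds y.
dir=$(mktemp -d)
printf 'x\n' > "$dir/one"
printf 'y\n' > "$dir/two"
svar='shell value'

# The same awk program is used for both forms: show avar in BEGIN and
# next to every input line.
prog='BEGIN { print "BEGIN: [" avar "]" }
      { print $0 ": [" avar "]" }'

# Form "a": avar is set before BEGIN runs and for every file.
form_a=$(awk -v avar="$svar" "$prog" "$dir/one" "$dir/two")

# Form "c": the assignment sits between the two files, so it only
# takes effect once awk reaches that operand.
form_c=$(awk "$prog" "$dir/one" avar="$svar" "$dir/two")

printf 'form a:\n%s\n\nform c:\n%s\n' "$form_a" "$form_c"
rm -rf "$dir"
```

   With form "a" every line shows the value. With form "c" the BEGIN
   line and the line from the first file show an empty value, and only
   the line from the second file shows it.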

   For most applications, forms "a" and "d" provide the most intuitive
   functionality. The only functional differences between the two are:

   1) Whether or not backslashes get expanded on variable assignment.
   2) Whether or not ARGV[] ends up containing a null string.

   so which one you choose to use depends on your requirements for
   these 2 situations.
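
   Both differences can be observed with a small sketch like the one
   below. The sample value, the scratch file, and the names file,
   len_a, len_d, null_a, and null_d are assumptions made for this
   example, and the backslash result assumes a new awk that expands
   escapes in -v assignments as described in note 8.

```shell
# Hypothetical sample value and scratch file, made up for this sketch.
svar='hi\there'
file=$(mktemp)
printf 'some line\n' > "$file"

# 1) Backslash handling: compare the length of avar under each form.
# Form "a": "\t" is expanded to one tab character, so 7 characters remain.
len_a=$(awk -v avar="$svar" 'BEGIN{print length(avar)}')
# Form "d": the value arrives untouched, backslash included (8 characters).
len_d=$(awk 'BEGIN{avar=ARGV[1];ARGV[1]="";print length(avar)}' "$svar")

# 2) Null ARGV[] entries: count empty operands after the input is read.
null_a=$(awk -v avar="$svar" '
    END{n=0;for(i=1;i<ARGC;i++)if(ARGV[i]=="")n++;print n}' "$file")
null_d=$(awk '
    BEGIN{avar=ARGV[1];ARGV[1]=""}
    END{n=0;for(i=1;i<ARGC;i++)if(ARGV[i]=="")n++;print n}' "$svar" "$file")

printf 'length: a=%s d=%s\nnull ARGV entries: a=%s d=%s\n' \
    "$len_a" "$len_d" "$null_a" "$null_d"
rm -f "$file"
```

   If the value must reach awk byte for byte, form "d" is the safer
   choice; if the program walks ARGV[] afterwards, form "a" avoids the
   extra null-string check.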