How do I use shell variables in awk scripts?
Short answer = either of these, where "svar" is a shell variable
and "avar" is an awk variable:
awk -v avar="$svar" '... avar ...' file
awk 'BEGIN{avar=ARGV[1];ARGV[1]=""}... avar ...' "$svar" file
depending on your requirements for handling backslashes and
handling ARGV[] if it contains a null string (see below for details).
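As a minimal sketch of the two short-answer forms, the following matches whole lines against a shell variable. The sample data, the temporary file, and the names out_v and out_argv are made up for illustration; it assumes a POSIX-style awk is first on your PATH.

```shell
# Sketch only: the sample lines and variable names are illustrative.
tmp=$(mktemp)
printf '%s\n' 'hay' 'needle' 'needles' > "$tmp"

svar='needle'

# Form 1: assign the awk variable with -v before the program runs.
out_v=$(awk -v avar="$svar" '$0 == avar' "$tmp")

# Form 2: pass the value as the first argument, copy it in BEGIN,
# then blank ARGV[1] so awk does not try to open it as a file.
out_argv=$(awk 'BEGIN{avar=ARGV[1];ARGV[1]=""}$0 == avar' "$svar" "$tmp")

printf 'form -v:   %s\nform ARGV: %s\n' "$out_v" "$out_argv"
rm -f "$tmp"
```

Both commands should print only the line that is exactly equal to the value of svar, not the longer line that merely contains it.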
Long answer = There are several ways of passing the values of
shell variables to awk scripts depending on which version of awk
(and to a much lesser extent which OS) you're using. For this
discussion, we'll consider the following 4 awk versions:
oawk (old awk, /usr/bin/awk and /usr/bin/oawk on Solaris)
nawk (new awk, /usr/bin/nawk on Solaris)
sawk (non-standard name for /usr/xpg4/bin/awk on Solaris)
gawk (GNU awk)
If you wanted to find all lines in a given file that match text
stored in a shell variable "svar" then you could use one of the
following:
a) awk -v avar="$svar" '$0 == avar' file
b) awk -vavar="$svar" '$0 == avar' file
c) awk '$0 == avar' avar="$svar" file
d) awk 'BEGIN{avar=ARGV[1];ARGV[1]=""}$0 == avar' "$svar" file
e) awk 'BEGIN{avar=ARGV[1];ARGC--}$0 == avar' "$svar" file
f) svar="$svar" awk 'BEGIN{avar=ENVIRON["svar"]}$0 == avar' file
g) awk '$0 == "'"$svar"'"' file
The following list shows which version is supported by which
awk on Solaris (which should also apply to most other OSs):
oawk = c, g
nawk = a, c, d, f, g
sawk = a, c, d, f, g
gawk = a, b, c, d, f, g
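If you are not on Solaris, one way to check the list above against your own awk is to probe each form. This is only a sketch: the helper function try, the sample data, and the variable supported are hypothetical names, and form "f" is run through env because a prefix assignment cannot be passed through "$@".

```shell
# Sketch only: probes whichever "awk" is first on PATH.
tmp=$(mktemp)
printf '%s\n' 'alpha' 'beta' > "$tmp"
svar='beta'
supported=''

# try LABEL COMMAND...: record LABEL if the command prints exactly "beta".
try() {
    label=$1
    shift
    if [ "$("$@" 2>/dev/null)" = "$svar" ]; then
        supported="$supported$label "
    fi
}

try a awk -v avar="$svar" '$0 == avar' "$tmp"
try b awk -vavar="$svar" '$0 == avar' "$tmp"
try c awk '$0 == avar' avar="$svar" "$tmp"
try d awk 'BEGIN{avar=ARGV[1];ARGV[1]=""}$0 == avar' "$svar" "$tmp"
# env stands in for the "svar=... awk" prefix assignment of form "f".
try f env svar="$svar" awk 'BEGIN{avar=ENVIRON["svar"]}$0 == avar' "$tmp"

echo "forms accepted by this awk: $supported"
rm -f "$tmp"
```

Forms "e" and "g" are left out of the probe: "e" would make awk try to open a file named after the value, and "g" depends on the value being safe to paste into the program text.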
Notes:
1) Old awk only works with forms "c" and "g", both of which have
problems.
2) GNU awk is the only one that works with form "b" (no space
between "-v" and "var="). Since gawk also supports form "a",
as do all the other new awks, you should avoid form "b" for
portability between newer awks.
3) In form "c", ARGV[1] is still getting populated, but
because it contains an equals sign (=), awk changes its normal
behavior of assuming that arguments are file names and now instead
assumes this is a variable assignment so you don't need to clear
ARGV[1] as in form "d".
4) In light of "3)" above, this raises the interesting question of
how to pass awk a file name that contains an equals sign - the
answer is to do one of the following:
i) Specify a path, e.g. for a file named "abc=def" in the
current directory, you'd use:
awk '...' ./abc=def
Note that that won't work with older versions of gawk or with
sawk.
ii) Redirect the input from a file so it's opened by the shell
rather than awk having to parse the file name as an argument
and then open it:
awk '...' < abc=def
Note that you will not have access to the file name in the
FILENAME variable in this case.
5) An alternative to setting ARGV[1]="" in form "d" is to delete
that array entry, e.g.:
awk 'BEGIN{avar=ARGV[1];delete ARGV[1]}$0 == avar' "$svar" file
This is slightly misleading, however, since although ARGV[1]
does get deleted in the BEGIN section and remains deleted
for any files that precede the deleted variable assignment,
the ARGV[] entry is recreated by awk when it gets to that
argument during file processing, so in the case above when
parsing "file", ARGV[1] would actually exist with a null
string value just like if you'd done ARGV[1]="". Given that
it's misleading and introduces inconsistency of ARGV[]
settings between files based on command-line order, it is
not recommended.
6) An alternative to setting svar="$svar" on the command line
prior to invoking awk in form "f" is to export svar first,
e.g.:
export svar
awk 'BEGIN{avar=ENVIRON["svar"]}$0 == avar' file
Since this forces you to export variables that you wouldn't
normally export and so risk interfering with the environment
of other commands invoked from your shell, it is not recommended.
7) When you use form "d", you end up with a null string in
ARGV[1], so if at the end of your program you want to print
out all the file names then instead of doing:
END{for (i in ARGV) print ARGV[i]}
you need to check for a null string before printing, or
store FILENAMEs in a different array during processing.
Note that the above loop as written would also print the
script name stored in ARGV[0].
8) When you use form "a", "b", or "c", the awk variable
assignment gets processed during awk's lexical analysis
stage (i.e. when the internal awk program gets built) and
any backslashes present in the shell variable may get
expanded so, for example, if svar contains "hi\there"
then avar could contain "hi<tab>here" with a literal tab
character. This behavior depends on the awk version as
follows:
oawk: does not print a warning and sets avar="hi\there"
sawk: does not print a warning and sets avar="hi<tab>here"
nawk: does not print a warning and sets avar="hi<tab>here"
gawk: does not print a warning and sets avar="hi<tab>here"
If the backslash precedes a character that has no
special meaning to awk then the backslash may be discarded
with or without a warning, e.g. if svar contained "hi\john"
then the backslash precedes "j" and "\j" has no special
meaning, so the various awks would each behave differently
as follows:
oawk: does not print a warning and sets avar="hi\john"
sawk: does not print a warning and sets avar="hi\john"
nawk: does not print a warning and sets avar="hijohn"
gawk: prints a warning and sets avar="hijohn"
9) None of the awk versions discussed here work with form "e" but
it is included above as there are older (i.e. pre-POSIX) versions
of awk that will treat form "d" as if it's intended to access a
file named "" so you instead need to use form "e". If you find
yourself with that or any other version of "old awk", you need
to get a new awk to avoid future headaches and they will not be
discussed further here.
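The two workarounds from note 4 can be sketched as follows. The directory, the file contents, and the variable names bare, with_path, and redirected are made up for the demo, and as noted above the "./" form may not work with older versions of gawk or with sawk.

```shell
# Sketch only: creates a file literally named "abc=def" in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'one' 'two' > "$dir/abc=def"

# Bare name: awk takes "abc=def" as the assignment abc="def", is left with
# no file operands, and so reads standard input (empty here).
bare=$(cd "$dir" && awk '{print}' abc=def < /dev/null)

# Workaround i: a leading "./" stops the operand looking like an assignment.
with_path=$(cd "$dir" && awk '{print}' ./abc=def)

# Workaround ii: let the shell open the file; awk never sees the name.
redirected=$(cd "$dir" && awk '{print}' < abc=def)

printf 'bare: [%s]\n' "$bare"
printf 'with ./: [%s]\n' "$with_path"
printf 'redirected: [%s]\n' "$redirected"
rm -rf "$dir"
```

With the bare name nothing is printed because the file is never opened, while both workarounds should print the two lines of the file.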
So, the forms accepted by all 3 newer awks under discussion (nawk,
sawk, and gawk) are a, c, d, f, and g. The main differences between
these forms are as follows:
   |-------|-------|----------|-----------|-----------|--------|
   | BEGIN | files | requires | accepts   | expands   | null   |
   | avail | set   | access   | backslash | backslash | ARGV[] |
   |-------|-------|----------|-----------|-----------|--------|
a) |   y   |  all  |    n     |     n     |     y     |   n    |
c) |   n   |  sub  |    n     |     n     |     y     |   n    |
d) |   y   |  all  |    n     |     n     |     n     |   y    |
f) |   y   |  all  |    y     |     n     |     n     |   n    |
g) |   y   |  all  |    n     |     y     |    n/a    |   n    |
   |-------|-------|----------|-----------|-----------|--------|
where the columns mean:
BEGIN avail = y: variable IS available in the BEGIN section
BEGIN avail = n: variable is NOT available in the BEGIN section
files set = all: variable is set for ALL files regardless of
command-line order.
files set = sub: variable is ONLY set for those files subsequent
to the definition of the variable on the command line
requires access = y: variable DOES need to be exported or set on
the command line
requires access = n: shell variable does NOT need to be exported
or set on the command line
accepts backslash = y: variable CAN contain a backslash without
causing awk to fail with a syntax error
accepts backslash = n: variable can NOT contain a backslash without
causing awk to fail with a syntax error
expands backslash = y: if the variable contains a backslash, it IS
expanded before execution begins
expands backslash = n: if the variable contains a backslash, it is
NOT expanded before execution begins
null ARGV[] = y: you DO end up with a null entry in the ARGV[]
array
null ARGV[] = n: you do NOT end up with a null entry in the ARGV[]
array
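To make the "BEGIN avail" and "files set" columns concrete, here is a small sketch comparing forms "a" and "c" over two files. The file contents and the names tmp1, tmp2, form_a, and form_c are illustrative only.

```shell
# Sketch only: two one-line scratch files stand in for real input.
tmp1=$(mktemp)
tmp2=$(mktemp)
echo 'first' > "$tmp1"
echo 'second' > "$tmp2"
svar='set'

# Form "a": avar is already set in BEGIN and for every file.
form_a=$(awk -v avar="$svar" \
    'BEGIN{print "BEGIN:" avar} {print $0 ":" avar}' "$tmp1" "$tmp2")

# Form "c": the assignment is processed only when awk reaches it in the
# argument list, so avar is empty in BEGIN and while reading the file
# named before it.
form_c=$(awk \
    'BEGIN{print "BEGIN:" avar} {print $0 ":" avar}' "$tmp1" avar="$svar" "$tmp2")

printf 'form a:\n%s\n\nform c:\n%s\n' "$form_a" "$form_c"
rm -f "$tmp1" "$tmp2"
```

Form "a" prints "BEGIN:set", "first:set", "second:set", while form "c" prints "BEGIN:", "first:", "second:set".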
For most applications, forms "a" and "d" provide the most intuitive
functionality. The only functional differences between the 2 are:
1) Whether or not backslashes get expanded on variable assignment.
2) Whether or not ARGV[] ends up containing a null string.
so which one you choose to use depends on your requirements for
these 2 situations.
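The two differences can be observed directly with a sketch like the one below. It assumes an awk that expands "\t" in -v assignments, as the new awks described above do; the variable names len_a, len_d, nulls_a, and nulls_d are made up, and /dev/null is used as a stand-in input file.

```shell
# Sketch only: compares forms "a" and "d" on the same value.
svar='hi\there'

# 1) Backslash expansion: form "a" turns \t into one tab character,
#    form "d" keeps the backslash and the "t" as two characters.
len_a=$(awk -v avar="$svar" 'BEGIN{print length(avar)}')
len_d=$(awk 'BEGIN{avar=ARGV[1];ARGV[1]="";print length(avar)}' "$svar")

# 2) Null ARGV[] entries: count the empty strings left in ARGV[1..ARGC-1].
count='END{n=0; for (i=1; i<ARGC; i++) if (ARGV[i] == "") n++; print n}'
nulls_a=$(awk -v avar="$svar" "$count" /dev/null)
nulls_d=$(awk 'BEGIN{avar=ARGV[1];ARGV[1]=""}'"$count" "$svar" /dev/null)

echo "length with form a: $len_a, with form d: $len_d"
echo "null ARGV entries with form a: $nulls_a, with form d: $nulls_d"
```

If the value must arrive byte for byte, form "d" is the safer choice here; if the program walks ARGV[] later, form "a" avoids the empty entry.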