Copyright (C) 2010, 2011 Free Software Foundation, Inc.
Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.
Changes from 3.1.8 to 4.0.0
---------------------------
1. The special files /dev/pid, /dev/ppid, /dev/pgrpid and /dev/user are
now completely gone. Use PROCINFO instead.
2. The POSIX 2008 behavior for `sub' and `gsub' is now the default.
THIS CHANGES BEHAVIOR!!!!
- echo '11122211' |awk '{sub(/1{3}/,"")}1'
- 22211
3. The \s and \S escape sequences are now recognized in regular expressions.
- echo '111 222 11' |awk '{gsub(/\s/,"")}1'
- 11122211
4. The split() function accepts an optional fourth argument which is an array
to hold the values of the separators.
- echo '111-222|33' |awk '{split($0,a,/[-|]/,seps);print "a[1] = "a[1] RS "a[2] = "a[2] RS "a[3] = "a[3] RS "seps[1] = "seps[1] RS "seps[2] = "seps[2]}'
- a[1] = 111
- a[2] = 222
- a[3] = 33
- seps[1] = -
- seps[2] = |
5. New -b / --characters-as-bytes option that means "hands off my data"; gawk
won't try to treat input as a multibyte string.
6. New --sandbox option; see the doc.
--sandbox
Disable the system() function, input redirections with getline, output redirections with print and printf, and dynamic extensions. This is particularly useful when you want to run awk scripts from questionable sources and need to make sure the scripts can't access your system (other than the specified input data file).
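A small sketch of the effect of --sandbox, assuming gawk 4.0 or later is installed as `gawk`. The one-line script stands in for a hypothetical untrusted program, and the variable `sandbox_result` is only for illustration.

```shell
# Assumption: "gawk" on PATH is GNU Awk 4.0 or later.
# A script that calls system() should be rejected in sandbox mode,
# so gawk is expected to exit with a non-zero status.
if gawk --sandbox 'BEGIN { system("echo should-not-run") }' 2>/dev/null; then
    sandbox_result="allowed"
else
    sandbox_result="blocked"
fi
echo "system() under --sandbox: $sandbox_result"
```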
7. Indirect function calls are now available.
--With indirect function calls, you tell gawk to use the value of a variable as the name of the function to call.
8. Interval expressions are now part of default regular expressions for
GNU Awk syntax.
9. --gen-po is now correctly named --gen-pot.
10. switch / case is now enabled by default. There's no longer a need
for a configure-time option.
--Control flow in the switch statement works as it does in C.
- seq 10 |awk '{switch ($0%2){
- case "0":
- print "even number: "$0;break
- default:
- print "odd number: "$0
- }
- }'
- odd number: 1
- even number: 2
- odd number: 3
- even number: 4
- odd number: 5
- even number: 6
- odd number: 7
- even number: 8
- odd number: 9
- even number: 10
11. Gawk now supports BEGINFILE and ENDFILE. See the doc for details.
--The body of the BEGINFILE rules is executed just before gawk reads the first record from a file. FILENAME is set to the name of the current file, and FNR is set to zero.
--The ENDFILE rule is called when gawk has finished processing the last record in an input file. For the last input file, it will be called before any END rules. (These two features are really handy, especially when processing multiple files, as shown below:)
- head f1 f2
- ==> f1 <==
- aaa
- bbb
- ccc
- ==> f2 <==
- aaa
- bbb
- ccc
- awk 'BEGIN{print"BEGIN: ---"}BEGINFILE{print "\nBEGINFILE: +++"}{print}ENDFILE{print"ENDFILE: +++\n"}END{print"END: ---"}' f1 f2
- BEGIN: ---
- BEGINFILE: +++
- aaa
- bbb
- ccc
- ENDFILE: +++
- BEGINFILE: +++
- aaa
- bbb
- ccc
- ENDFILE: +++
- END: ---
12. Directories named on the command line now produce a warning, not
a fatal error, unless --posix or --traditional is in effect.
13. The new FPAT variable allows you to specify a regexp that matches
the fields, instead of matching the field separator. The new patsplit()
function gives the same capability for splitting.
--The value of FPAT should be a string that provides a regular expression. This regular expression describes the contents of each field.
- echo '111-222|33' |awk -vFS="[-|]" '{print "$1 = "$1 RS "$2 = "$2 RS "$3 = "$3}'
- $1 = 111
- $2 = 222
- $3 = 33
- # What if we use FPAT instead?
- echo '111-222|33' |awk -vFPAT="[^-|]+" '{print "$1 = "$1 RS "$2 = "$2 RS "$3 = "$3}'
- $1 = 111
- $2 = 222
- $3 = 33
14. All long options now have short options, for use in `#!' scripts.
15. Support for IPv6 added via /inet6/... special file. /inet4/... forces
IPv4 and /inet chooses the system default (probably IPv4).
16. Added a warning for /[:space:]/ that should be /[[:space:]]/.
17. Merged with John Haque's byte code internals. Adds dgawk debugger and
possibly improved performance.
18. `break' and `continue' are no longer valid outside a loop, even with
--traditional.
19. POSIX character classes work with --traditional (BWK awk supports them).
20. Nuked redundant --compat, --copyleft, and --usage long options.
21. Arrays of arrays added. See the doc. (This one is even more powerful!)
- awk 'BEGIN{arr["a"]["b"]=1;arr["a"]["c"]=2;
- for( i in arr)
- for( j in arr[i])
- print i,j,arr[i][j]
- }'
- a b 1
- a c 2
22. Per the GNU Coding Standards, dynamic extensions must now define
a global symbol indicating that they are GPL-compatible. See
the documentation and example extensions.
THIS CHANGES BEHAVIOR!!!!
23. In POSIX mode, string comparisons use strcoll/wcscoll.
THIS CHANGES BEHAVIOR!!!!
24. The option for raw sockets was removed, since it was never implemented.
25. If not in POSIX mode, gawk turns ranges of the form [d-h] into
[defgh] before compiling a regexp. Maybe this will stop all the
questions about [a-z] matching uppercase letters.
THIS CHANGES BEHAVIOR!!!!
26. PROCINFO["strftime"] now holds the default format for strftime().
27. Updated to latest infrastructure: Autoconf 2.68, Automake 1.11.1,
Gettext 0.18.1, Bison 2.5.
28. Many code cleanups. Removed code for many old, unsupported systems:
- Atari
- Amiga
- BeOS
- Cray
- MIPS RiscOS
- MS-DOS with Microsoft Compiler
- MS-Windows with Microsoft Compiler
- NeXT
- SunOS 3.x, Sun 386 (Road Runner)
- Tandem (non-POSIX)
- Prestandard VAX C compiler for VAX/VMS
- Probably others that I've forgotten
29. If PROCINFO["sorted_in"] exists, for(iggy in foo) loops sort the
indices before looping over them. The value of this element
provides control over how the indices are sorted before the loop
traversal starts. See the manual.
30. A new isarray() function exists to distinguish if an item is an array
or not, to make it possible to traverse multidimensional arrays.
31. asort() and asorti() take a third argument specifying how to sort.
See the doc.