Category: LINUX

2011-07-10 15:31:06

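You can download it from the GNU FTP site and try it out. From a quick look at the release notes, the new version of gawk is considerably more powerful.
Below are some of the new features in gawk 4.0.0 (I tested a few of them):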




QUOTE:
   Copyright (C) 2010, 2011 Free Software Foundation, Inc.

   Copying and distribution of this file, with or without modification,
   are permitted in any medium without royalty provided the copyright
   notice and this notice are preserved.

Changes from 3.1.8 to 4.0.0
---------------------------

1. The special files /dev/pid, /dev/ppid, /dev/pgrpid and /dev/user are
   now completely gone. Use PROCINFO instead.

2. The POSIX 2008 behavior for `sub' and `gsub' are now the default.
   THIS CHANGES BEHAVIOR!!!!
   (Note: the test below mainly exercises the {3} interval expression from item 8.)
    $ echo '11122211' | awk '{sub(/1{3}/,"")}1'
    22211
3. The \s and \S escape sequences are now recognized in regular expressions.
    $ echo '111 222 11' | awk '{gsub(/\s/,"")}1'
    11122211
4. The split() function accepts an optional fourth argument which is an array
   to hold the values of the separators.
    $ echo '111-222|33' | awk '{split($0,a,/[-|]/,seps);print "a[1] = "a[1] RS "a[2] = "a[2] RS "a[3] = "a[3] RS "seps[1] = "seps[1] RS "seps[2] = "seps[2]}'
    a[1] = 111
    a[2] = 222
    a[3] = 33
    seps[1] = -
    seps[2] = |
5. New -b / --characters-as-bytes option that means "hands off my data"; gawk
   won't try to treat input as a multibyte string.

6. New --sandbox option; see the doc.

   --sandbox

   Disable the system() function, input redirections with getline, output redirections with print and printf, and dynamic extensions. This is particularly useful when you want to run awk scripts from questionable sources and need to make sure the scripts can't access your system (other than the specified input data file).
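
To see the effect, here is a minimal check, assuming gawk 4.0.0 or later is installed as `gawk`. The same system() call is run once normally and once under --sandbox. The variable names and messages are made up for the example.

```shell
# Without --sandbox, system() runs the command as usual.
normal=$(gawk 'BEGIN { system("echo allowed") }')
printf 'normal run: %s\n' "$normal"

# With --sandbox, gawk refuses to run system() and exits with an error.
if gawk --sandbox 'BEGIN { system("echo blocked") }' 2>/dev/null; then
    sandbox_status="not enforced"
else
    sandbox_status="enforced"
fi
printf 'sandbox: %s\n' "$sandbox_status"
```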

7. Indirect function calls are now available.

   --With indirect function calls, you tell gawk to use the value of a variable as the name of the function to call.

8. Interval expressions are now part of default regular expressions for
   GNU Awk syntax.

9. --gen-po is now correctly named --gen-pot.

10. switch / case is now enabled by default. There's no longer a need
    for a configure-time option.

    --Control flow in the switch statement works as it does in C.

    $ seq 10 | awk '{switch ($0%2){
    case "0":
    print "even number: "$0;break
    default:
    print "odd number: "$0
    }
    }'
    odd number: 1
    even number: 2
    odd number: 3
    even number: 4
    odd number: 5
    even number: 6
    odd number: 7
    even number: 8
    odd number: 9
    even number: 10

11. Gawk now supports BEGINFILE and ENDFILE. See the doc for details.

--The body of the BEGINFILE rules is executed just before gawk reads the first record from a file. FILENAME is set to the name of the current file, and FNR is set to zero.
--The ENDFILE rule is called when gawk has finished processing the last record in an input file. For the last input file, it will be called before any END rules. (These two features are really handy, especially when processing multiple files, as shown below:)

    $ head f1 f2
    ==> f1 <==
    aaa
    bbb
    ccc

    ==> f2 <==
    aaa
    bbb
    ccc
    $ awk 'BEGIN{print"BEGIN: ---"}BEGINFILE{print "\nBEGINFILE: +++"}{print}ENDFILE{print"ENDFILE: +++\n"}END{print"END: ---"}' f1 f2
    BEGIN: ---

    BEGINFILE: +++
    aaa
    bbb
    ccc
    ENDFILE: +++


    BEGINFILE: +++
    aaa
    bbb
    ccc
    ENDFILE: +++

    END: ---

12. Directories named on the command line now produce a warning, not
    a fatal error, unless --posix or --traditional.

13. The new FPAT variable allows you to specify a regexp that matches
    the fields, instead of matching the field separator. The new patsplit()
    function gives the same capability for splitting.

    --The value of FPAT should be a string that provides a regular expression. This regular expression describes the contents of each field.
    $ echo '111-222|33' | awk -vFS="[-|]" '{print "$1 = "$1 RS "$2 = "$2 RS "$3 = "$3}'
    $1 = 111
    $2 = 222
    $3 = 33
    # And what if we use FPAT instead?
    $ echo '111-222|33' | awk -vFPAT="[^-|]+" '{print "$1 = "$1 RS "$2 = "$2 RS "$3 = "$3}'
    $1 = 111
    $2 = 222
    $3 = 33
14. All long options now have short options, for use in `#!' scripts.

15. Support for IPv6 added via /inet6/... special file. /inet4/... forces
    IPv4 and /inet chooses the system default (probably IPv4).

16. Added a warning for /[:space:]/ that should be /[[:space:]]/.

17. Merged with John Haque's byte code internals. Adds dgawk debugger and
    possibly improved performance.

18. `break' and `continue' are no longer valid outside a loop, even with
    --traditional.

19. POSIX character classes work with --traditional (BWK awk supports them).

20. Nuked redundant --compat, --copyleft, and --usage long options.

21. Arrays of arrays added. See the doc. (This one is even more powerful!)
    $ awk 'BEGIN{arr["a"]["b"]=1;arr["a"]["c"]=2;
    for( i in arr)
    for( j in arr[i])
    print i,j,arr[i][j]
    }'
    a b 1
    a c 2
22. Per the GNU Coding Standards, dynamic extensions must now define
    a global symbol indicating that they are GPL-compatible. See
    the documentation and example extensions.
    THIS CHANGES BEHAVIOR!!!!

23. In POSIX mode, string comparisons use strcoll/wcscoll.
    THIS CHANGES BEHAVIOR!!!!

24. The option for raw sockets was removed, since it was never implemented.

25. If not in POSIX mode, gawk turns ranges of the form [d-h] into
    [defgh] before compiling a regexp.  Maybe this will stop all the
    questions about [a-z] matching uppercase letters.
    THIS CHANGES BEHAVIOR!!!!

26. PROCINFO["strftime"] now holds the default format for strftime().

27. Updated to latest infrastructure: Autoconf 2.68, Automake 1.11.1,
    Gettext 0.18.1, Bison 2.5.

28. Many code cleanups. Removed code for many old, unsupported systems:
        - Atari
        - Amiga
        - BeOS
        - Cray
        - MIPS RiscOS
        - MS-DOS with Microsoft Compiler
        - MS-Windows with Microsoft Compiler
        - NeXT
        - SunOS 3.x, Sun 386 (Road Runner)
        - Tandem (non-POSIX)
        - Prestandard VAX C compiler for VAX/VMS
        - Probably others that I've forgotten

29. If PROCINFO["sorted_in"] exists, for(iggy in foo) loops sort the
    indices before looping over them.  The value of this element
    provides control over how the indices are sorted before the loop
    traversal starts. See the manual.

30. A new isarray() function exists to distinguish if an item is an array
    or not, to make it possible to traverse multidimensional arrays.

31. asort() and asorti() take a third argument specifying how to sort.
    See the doc.

longber, 2011-07-18 00:18:00

I only noticed a few days ago that there are two variants, mawk and gawk.