Category: LINUX

2016-07-19 15:57:39

Original article: A Detailed Look at awk's gensub() Function (awk之gensub()函数详解), author: zooyo



gensub:

gensub(regexp, replacement, how [, target])
gensub is a general substitution function. Like sub and gsub, it searches the target string target for matches of the regular expression regexp. Unlike sub and gsub, the modified string is returned as the result of the function and the original target string is not changed. If how is a string beginning with ‘g’ or ‘G’, then it replaces all matches of regexp with replacement. Otherwise, how is treated as a number that indicates which match of regexp to replace. If no target is supplied, $0 is used.
gensub provides an additional feature that is not available in sub or gsub: the ability to specify components of a regexp in the replacement text. This is done by using parentheses in the regexp to mark the components and then specifying ‘\N’ in the replacement text, where N is a digit from 1 to 9. For example:

          $ gawk '
          > BEGIN {
          >      a = "abc def"
          >      b = gensub(/(.+) (.+)/, "\\2 \\1", "g", a)
          >      print b
          > }'
          -| def abc

As with sub, you must type two backslashes in order to get one into the string. In the replacement text, the sequence ‘\0’ represents the entire matched text, as does the character ‘&’.

The following example shows how you can use the third argument to control which match of the regexp should be changed:

          $ echo a b c a b c |
          > gawk '{ print gensub(/a/, "AA", 2) }'
          -| a b c AA b c

In this case, $0 is used as the default target string. gensub returns the new string as its result, which is passed directly to print for printing.

If the how argument is a string that does not begin with ‘g’ or ‘G’, or if it is a number that is less than or equal to zero, only one substitution is performed. If how is zero, gawk issues a warning message.

If regexp does not match target, gensub's return value is the original unchanged value of target.

gensub is a gawk extension; it is not available in compatibility mode (see Options).
 
------------------------------------------------------------------------------------------------
 
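    The passage above is quoted from the manual on the official GNU site. gensub really is different from sub and gsub, and it has strengths of its own. The examples below show how it is used in practice and what sets it apart.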
 
 
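          $ echo "11111" | awk 'BEGIN{FS=OFS=""}{$4="x";print}'
          -| 111x1
[Analysis]
  Faced with text that has no field separator, you may have felt there was nowhere to start. Setting FS and OFS to the empty string does get the job done, since gawk then treats each character as its own field, but it is cumbersome. Compare it with the next example.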
 
 
 
          $ echo "11111" | awk '{print $0=gensub("1","x",4)}'
          -| 111x1

          $ echo "11111" | awk '{print $0=gensub("1","x","g")}'
          -| xxxxx

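[Analysis]

  Much clearer, isn't it? This is the convenience of gensub: no field separator is needed, and a number is enough to pick which match to replace (here, the fourth "1"). g or G requests a global replacement and must be written as a quoted string. Note in particular that gensub does not modify the original record, so the result has to be assigned back to $0 if the record itself should change: "the modified string is returned as the result of the function and the original target string is not changed". Next, let us look at a more complex use.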

 

          $ echo "unix linux" | awk '{print gensub(/(.+) (.+)/,"\\2 \\1","g")}'
          -| linux unix

          $ echo "xaax xbx xxx:xaax xbx xxx" | awk -F: -vOFS=":" '{$2=gensub(/x([^x]+)x/,"\\1YY",2,$2)}1'
          -| xaax xbx xxx:xaax bYY xxx

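[Analysis]

  Does this look familiar? It should: it is the same style as substitution in sed, backreferences included, and the regular expressions carry over as well. Take special care to write a double backslash in the replacement text. With a little practice, gensub will become second nature.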
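
To make the resemblance to sed concrete, here is a hedged side-by-side sketch. It assumes gawk is installed as `gawk` and that the local sed accepts `-E` for extended regular expressions (GNU and BSD sed do); the variable names are illustrative only.

```shell
# gensub: third argument 2 selects the second match; \\1 is the first group.
via_gawk=$(echo "xaax xbx xxx" | gawk '{ print gensub(/x([^x]+)x/, "\\1YY", 2) }')
# sed: the trailing 2 flag selects the second match; \1 is the first group.
via_sed=$(echo "xaax xbx xxx" | sed -E 's/x([^x]+)x/\1YY/2')
echo "$via_gawk"    # prints xaax bYY xxx
echo "$via_sed"     # prints xaax bYY xxx
```

The one visible difference is the backslash count: sed reads `\1` directly, while the awk string literal needs `\\1` so that a single backslash survives into the replacement text.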
