Casting a Brick to Attract Jade: An Annotated Translation of sed1line - whan - ChinaUnix Blog

whanwhan.blog.chinaunix.net

Casting a Brick to Attract Jade: An Annotated Translation of sed1line

2010-08-10 17:10:40


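I translated  unix.net/forum/24/20040514/325187.html
   I'm a beginner, so the translation is rough and the annotations are sketchy; much of it is just my own understanding. I have only just started learning sed, so many of the notes are very basic and a bit long-winded. By the end my head was spinning. There are several commands I cannot explain clearly, so pointers from the experts would be much appreciated!

   Brickbats are welcome too!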

FILE SPACING:

# double space a file
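#(that is, insert a blank line after every line)
sed G
###The sed manual describes G as "append hold space to pattern space".
###That is, G appends a newline plus the contents of the hold space to the current line. Nothing has been stored yet, so the hold space is empty
###and only a blank line is added. Hence "double space a file".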
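A quick way to check the behavior described above is to feed sed a few known lines. This is only a small sketch; the sample text and the variable name `doubled` are made up for illustration.

```shell
# Feed three sample lines through G; each should come back followed by a blank line.
doubled=$(printf 'a\nb\nc\n' | sed G)
printf '%s\n' "$doubled"
```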

# double space a file which already has blank lines in it. Output file
# should contain no more than one blank line between lines of text.
sed '/^$/d;G'
###First /^$/d finds and deletes the existing blank lines; then G adds one blank line after each remaining line.

# triple space a file
#(that is, insert two blank lines after every line)
sed 'G;G'
###Nothing more to say: this just runs G twice.

# undo double-spacing (assumes even-numbered lines are always blank)
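sed 'n;d'
###The sed manual describes n as "Read the next line of input into the pattern space".
###As I understand it, n (after printing the current line) reads in the next line, and d then deletes that line. To delete one line after every two lines,
###use sed 'n;n;d'. And to delete one line after every 100 lines... what?! You want to write 100 n's?!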
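The two variants mentioned above can be tried on a few sample lines. A minimal sketch; the input and the variable names `single` and `thirds` are illustrative only.

```shell
# Undo double-spacing: n prints the current line and loads the next one, d deletes it.
single=$(printf 'a\n\nb\n\nc\n\n' | sed 'n;d')
printf '%s\n' "$single"

# Keep two lines, drop the third: two n commands followed by d.
thirds=$(printf '%s\n' 1 2 3 4 5 6 | sed 'n;n;d')
printf '%s\n' "$thirds"
```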

# insert a blank line above every line which matches "regex"
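sed '/regex/{x;p;x;}'
###The manual describes x as "Exchange the contents of the hold and pattern spaces",
###that is, swap what is in the hold space with what is in the pattern space.
###The p command is "Print the current pattern space".
###So: the hold space starts out empty, and after /regex/ matches, the pattern space holds
###the line containing regex. 1) x swaps the two, so the pattern space is now empty
###and the hold space contains the regex line. 2) p prints the pattern space (a blank
###line). 3) The second x swaps them back, so the regex line returns to the pattern space
###and is printed as usual at the end of the cycle. The result is one blank line inserted
###just above each line where regex appears.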
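The three steps above can be observed on a tiny input. This is just a sketch with made-up sample lines; `above` is a hypothetical variable name.

```shell
# Only the matching line gets a blank line printed in front of it.
above=$(printf 'one\nregex here\nthree\n' | sed '/regex/{x;p;x;}')
printf '%s\n' "$above"
```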

# insert a blank line below every line which matches "regex"
sed '/regex/G'
###Simple: find the line, then append a blank line after it.

# insert a blank line above and below every line which matches "regex"
sed '/regex/{x;p;x;G;}'
###The two siblings sed '/regex/G' and sed '/regex/{x;p;x;}' working together.

NUMBERING:

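# number each line of a file (simple left alignment). Using a tab (see
# note on '\t' at end of file) instead of space will preserve margins.
#(a tab is placed between the line number and the text)
sed = filename | sed 'N;s/\n/\t/'
###sed = filename does what the manual calls "Print the current line number".
###But it prints the number on a separate line before each line, not at the start of the line itself.
###The manual describes N as "Append the next line of input into the pattern space",
###that is, the following line is joined onto the current one, separated by a newline.
###The manual describes s/regexp/replacement/ as "Attempt to match regexp against the
###pattern space. If successful, replace that portion matched with
###replacement." In other words, search the pattern space for regexp and, if it is found,
###replace the matched part with replacement. Plain find and replace. \n is the newline and \t is a tab
###(GNU sed understands \t; with other seds, type a literal tab instead),
###so the meaning of the whole command falls out.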
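Here is a small sketch of the two-stage pipeline on sample input. To avoid depending on \t support, it passes a literal tab through a shell variable; the names `tab` and `numbered` are made up for this example.

```shell
# Stage 1 (sed =) prints each line number on its own line;
# stage 2 joins number and text with N and replaces the newline with a tab.
tab=$(printf '\t')
numbered=$(printf 'a\nb\n' | sed = | sed "N;s/\n/$tab/")
printf '%s\n' "$numbered"
```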

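# number each line of a file (number on left, right-aligned)
sed = filename | sed 'N; s/^/     /; s/ *\(.\{6,\}\)\n/\1  /'
###The first part needs no explanation, but the second...
###I did not really understand s/ *\(.\{6,\}\)\n/\1  / at first. As far as I can tell: s/^/     / puts five spaces
###in front of the number, then " *" swallows any surplus leading spaces while \(.\{6,\}\) keeps
###at least the last six characters before the newline, so short numbers end up right-aligned in a six-character column.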
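To see the padding at work, the command can be run on two sample lines. This sketch assumes the capture groups are written as \( and \), as in the original sed1line; `aligned` is an illustrative variable name.

```shell
# Each number is padded on the left so that it occupies a six-character column,
# followed by two spaces and the original text.
aligned=$(printf 'a\nb\n' | sed = | sed 'N; s/^/     /; s/ *\(.\{6,\}\)\n/\1  /')
printf '%s\n' "$aligned"
```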

# number each line of file, but only print numbers if line is not blank
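sed '/./=' filename | sed '/./N; s/\n/ /'
###sed '/./=' filename prints a line number only before non-blank lines; sed '/./N; s/\n/ /' then takes each
###non-blank line (the number), appends the following line to it, and replaces the newline \n with a space.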

# count lines (emulates "wc -l")
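sed -n '$='
###The -n option means "suppress automatic printing of pattern space", i.e. it turns off
###the automatic printing of the pattern space. In '$=', $ means "Match the last line", and =,
###as explained above, prints the line number. So this matches the last line and prints only its number, not its content:
###that is "wc -l".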

TEXT CONVERSION AND SUBSTITUTION:

# IN UNIX ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format

sed 's/.$//' # assumes that all lines end with CR/LF
###On Unix only the LF ends a line, so the CR of a DOS line ending is left over as the last
###character of each line. ".$" matches the last character of a line, so deleting it does the job.
###What are CR and LF? CR is the ASCII Carriage Return, LF is the ASCII Linefeed.

sed 's/^M$//' # in bash/tcsh, press Ctrl-V then Ctrl-M
###Just find and replace. Note that the "^M" in the command must be typed by pressing Ctrl-V and then Ctrl-M.
###If you type a caret (Shift-6) followed by a capital M instead, nothing gets converted.

sed 's/\x0D$//' # gsed 3.02.80, but top script is easier
###\x0D is the hexadecimal escape for CR (ASCII 13), which this version of GNU sed understands.
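A sketch of the same conversion that needs neither Ctrl-V typing nor \x escapes: the CR is generated with printf and spliced into the sed script by the shell. The variable names `cr` and `unixtext` are assumptions for this example.

```shell
# Build a literal carriage return, then delete it when it is the last character of a line.
cr=$(printf '\r')
unixtext=$(printf 'a\r\nb\r\n' | sed "s/$cr\$//")
printf '%s\n' "$unixtext"
```

Unlike `s/.$//`, this variant leaves lines without a trailing CR untouched.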

# IN UNIX ENVIRONMENT: convert Unix newlines (LF) to DOS format
sed "s/$/`echo -e \\\r`/" # command line under ksh
sed 's/$'"/`echo \\\r`/" # command line under bash
sed "s/$/`echo \\\r`/" # command line under zsh
sed 's/$/\r/' # gsed 3.02.80
###These four commands do the same thing under different shells and sed versions: append \r (the ASCII CR) at the end of each line.

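# IN DOS ENVIRONMENT: convert Unix newlines (LF) to DOS format
sed "s/$//" # method 1
sed -n p # method 2
###These puzzled me at first: $ already is the end of the line, so how does replacing it with nothing produce DOS format?
###The second one looks odd too: -n is "suppress automatic printing of pattern space" and
###p is "Print the current pattern space", so the two cancel out. Presumably the point is that a DOS build of sed
###writes every output line with a CR/LF ending, so any command that leaves the text unchanged does the conversion.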

# IN DOS ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format
# Cannot be done with DOS versions of sed. Use "tr" instead.
tr -d \r <infile >outfile # GNU tr version 1.22 or higher

# delete leading whitespace (spaces, tabs) from front of each line
# aligns all text flush left
sed 's/^[ \t]*//' # see note on '\t' at end of file
###Replacing with nothing again. ^[ \t]* matches any run of spaces and tabs at the start of a line.

# delete trailing whitespace (spaces, tabs) from end of each line
sed 's/[ \t]*$//' # see note on '\t' at end of file
###The mirror image of the command above.

# delete BOTH leading and trailing whitespace from each line
sed 's/^[ \t]*//;s/[ \t]*$//'
###Two steps in a row.

# insert 5 blank spaces at beginning of each line (make page offset)
sed 's/^/     /'
###Nothing to add.

# align all text flush right on a 79-column width
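sed -e :a -e 's/^.\{1,78\}$/ &/;ta' # set at 78 plus 1 space
###This one looks like a lot of trouble, but it is fun once you get it.
###Several new things appear: 1. ":"  2. "&"  3. "-e"  4. "t". In turn:
###1. ":"  Label for b and t commands (defines a label that b and t can jump to).
###2. "&"  stands for the whole text matched by the regular expression.
###3. "-e" add the script to the commands to be executed
### (adds the script to the list of commands to run).
###4. t label  If a s/// has done a successful substitution since the last
###input line was read and since the last t or T command, then branch to
###label; if label is omitted, branch to end of script.
###That is, if an s/// has succeeded since the last input line was read and since the last t or T command,
###jump to label; if no label is given after t, jump to the end of the script.
###Looking back, the whole command is a loop: every line is substituted (79 minus its current
###length) times, so on a large file, or one with many short lines, this takes some effort.
###Try it if you don't believe me.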
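The loop is easier to watch on a narrow column. The sketch below uses the same technique with a width of 5 instead of 79 (so the interval is \{1,4\}); the width and the variable name `right` are choices made only for this example.

```shell
# Lines shorter than 5 characters gain one leading space per pass until they are 5 wide;
# a line that is already 5 characters long never matches and is left alone.
right=$(printf 'ab\nabcde\n' | sed -e :a -e 's/^.\{1,4\}$/ &/;ta')
printf '%s\n' "$right"
```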

# center all text in the middle of 79-column width. In method 1,
# spaces at the beginning of the line are significant, and trailing
# spaces are appended at the end of the line. In method 2, spaces at
# the beginning of the line are discarded in centering the line, and
# no trailing spaces appear at the end of lines.
sed -e :a -e 's/^.\{1,77\}$/ & /;ta' # method 1
sed -e :a -e 's/^.\{1,77\}$/ &/;ta' -e 's/\( *\)\1/\1/' # method 2
###Much like the previous one. Adding a space on both sides per pass (method 1) takes fewer passes. In method 2, the final
###s/\( *\)\1/\1/ halves the leading spaces, which centers the right-aligned line.

# substitute (find and replace) "foo" with "bar" on each line
sed 's/foo/bar/' # replaces only 1st instance in a line
sed 's/foo/bar/4' # replaces only 4th instance in a line
sed 's/foo/bar/g' # replaces ALL instances in a line
###These three are simple, nothing more to say.

sed 's/\(.*\)foo\(.*foo\)/\1bar\2/'  # replace the next-to-last case
###On s///: "The replacement may contain the special character & to refer to that
###portion of the pattern space which matched, and the special escapes \1
###through \9 to refer to the corresponding matching sub-expressions in the regexp."
###Roughly: the replacement may use & for the part of the pattern space that matched,
###and \1 to \9 for the corresponding sub-expressions of the regexp. The regexp
###can be split into several sub-expressions, and the replacement refers to them as \1 to \9.
###That adds flexibility and makes sed commands easier to modify.
###With the backslashes removed, the regexp reads (.*)foo(.*foo), where (.*) is zero or more characters.
###Because .* is greedy, the second group ends up holding the last foo, so \1bar\2 changes the next-to-last foo and leaves the last one alone.

sed 's/\(.*\)foo/\1bar/' # replace only the last case
###Simpler than the previous one.
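Both greedy-match tricks can be checked on a line with three occurrences. A minimal sketch, assuming the groups are written as \( and \) as in the original sed1line; the sample text and variable names are illustrative.

```shell
# Replace only the next-to-last foo: group 2 is forced to contain the final foo.
next_to_last=$(echo 'foo foo foo' | sed 's/\(.*\)foo\(.*foo\)/\1bar\2/')
printf '%s\n' "$next_to_last"

# Replace only the last foo: the greedy .* swallows everything before it.
last=$(echo 'foo foo foo' | sed 's/\(.*\)foo/\1bar/')
printf '%s\n' "$last"
```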

# substitute "foo" with "bar" ONLY for lines which contain "baz"
sed '/baz/s/foo/bar/g'
###/baz/ selects the lines; the rest does the substitution.

# substitute "foo" with "bar" EXCEPT for lines which contain "baz"
sed '/baz/!s/foo/bar/g'
###The other way round.

# change "scarlet" or "ruby" or "puce" to "red"
sed 's/scarlet/red/g;s/ruby/red/g;s/puce/red/g' # most seds
###Three steps in a row.
gsed 's/scarlet\|ruby\|puce/red/g' # GNU sed only

# reverse order of lines (emulates "tac")
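# bug/feature in HHsed v1.5 causes blank lines to be deleted
sed '1!G;h;$!d' # method 1
###
###Start with the first command, 1!G. What does it mean? "!" applies the command that follows to every line
###that is NOT selected, and 1 selects the first line. G fetches the contents of the hold space (a buffer in memory)
###and appends them to the pattern space. h copies
###the pattern space into the hold space. So first, what does sed '1!G' do on its own?
###Run it. Suppose the file is
### $ cat  test.txt
###  1
###  2
###  3
###  4
###Then the result of sed '1!G' test.txt is
### $ sed '1!G' test.txt
###  1
###  2
###
###  3
###
###  4
###
###  $
### So every line except the first gets a blank line after it, because the hold space
###is empty by default. Now add h and see what happens:
### $ sed '1!G;h' test.txt
###  1
###  2
###  1
###  3
###  2
###  1
###  4
###  3
###  2
###  1
###  $
### The blank lines are gone. What happened? Here is how I understand it; please check.
###First, for each line sed stores the line being processed in a temporary buffer
###called the pattern space. Once sed has finished processing the line in the pattern space, the
###pattern space is sent to the screen and then emptied for the next line.
###
###On line 1, the address 1 matches, so "!G" does nothing. "h" then copies the pattern space,
###i.e. line 1, into the hold space (replacing its empty content), and the pattern space is printed: 1.
###On line 2, the address 1 does not match, so "G" runs: it appends the hold space (line 1)
###to the pattern space, which becomes 2 followed by 1. "h" copies that into the hold space and it is printed.
###On line 3 the hold space holds 2 and 1; G appends it after line 3, giving 3, 2, 1, which h
###again copies to the hold space, and so on, producing the output
###above.
###I am not sure this explanation is right, but when I replaced the 1 in the command with 2, 3 or 4, I got the
###results I expected. Corrections from the experts are welcome.
###Finally add "$!d": the pattern space is deleted, unprinted, on every line except the last, so only the complete reversed text is printed.
###That gives the effect of tac.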
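The complete one-liner can be checked against the four-line sample used above. A small sketch; `reversed` is a made-up variable name.

```shell
# The hold space accumulates the lines in reverse order; only the last cycle is printed.
reversed=$(printf '%s\n' 1 2 3 4 | sed '1!G;h;$!d')
printf '%s\n' "$reversed"
```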

sed -n '1!G;h;$p' # method 2
###Same idea as above, so I won't repeat myself!

# reverse each character on the line (emulates "rev")
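sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'
###This command really is something...
###I could not make full sense of it, so here is my own reading, which may well be wrong.
###'/\n/!G' checks whether the pattern space contains a newline and, if not, runs G.
###'s/\(.\)\(.*\n\)/&\2\1/' replaces the match with: the match itself, then everything from the second character up to the newline, then the first character.
###//D (the empty regex reuses the previous one) deletes the first line of the pattern space; if the pattern space is then not empty, the script restarts
###from the top without reading new input.
###s/.// deletes the first character.
###Suppose the line is 123
###Then the pattern space should go through the following states:
###  123\n
###  123\n23\n1
###  23\n1
###  23\n3\n21
###  3\n21
###  3\n\n321
###  \n321
###  321
### My question: why is s/.// not executed on the first pass? If it were, the result would be wrong!
### Help!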

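QUOTE:

Originally posted by "waker":
# reverse each character on the line (emulates "rev")
sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'
###Suppose the line is 123
###Then the pattern space goes through the following states
Running /\n/!G; gives
123\n
Then s/\(.\)\(.*\n\)/&\2\1/;
gives
123\n23\n1
Running //D
23\n1
Because it is the D command, the script restarts from the top
The pattern space contains \n
so the G in /\n/!G; is not run
s again...
23\n3\n21
D again
3\n21
Loop, G is not run
s again...
3\n\n321
D again
\n321
Loop
Neither G, s nor D applies
The final s/.// runs
321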
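The trace above can be confirmed by running the one-liner on short strings. This sketch wraps it in a shell function; the name `rev_line` and the sample inputs are made up for the example.

```shell
# Wrap the one-liner so it can be reused on several inputs.
rev_line() {
  sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'
}

r1=$(echo 123 | rev_line)
r2=$(echo hello | rev_line)
printf '%s\n' "$r1" "$r2"
```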

# join pairs of lines side-by-side (like "paste")
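sed '$!N;s/\n/ /'
###On GNU sed, sed 'N;s/\n/ /' gives the same result, so I wondered what the leading
###$! is for. It matters when the file has an odd number of lines: on the last line N has nothing to read, and most seds then quit without printing it; $! skips N there (GNU sed prints the line anyway).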
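An input with an odd number of lines shows what the `$!` guard is protecting. A minimal sketch with made-up sample lines; `pairs` is an illustrative name.

```shell
# Lines are joined in pairs; the unpaired last line is printed as it is
# because $! keeps N from running when there is nothing left to read.
pairs=$(printf 'a\nb\nc\n' | sed '$!N;s/\n/ /')
printf '%s\n' "$pairs"
```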

# if a line ends with a backslash, append the next line to it
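sed -e :a -e '/\\$/N; s/\\\n//; ta'
###A loop: when a line ends in a backslash, N appends the next line and s deletes the backslash and the newline; ta repeats for as long as the substitution succeeds.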

# if a line begins with an equal sign, append it to the previous line
# and replace the "=" with a single space
sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D'
###Much like the previous one. Note two new commands:
### P: "Print up to the first embedded newline of the current
###pattern space", i.e. print the first line of the pattern space.
### D: "Delete up to the first embedded newline in
### the pattern space. Start next cycle, but skip reading from
###the input if there is still data in the pattern space."
###That is, delete the first line of the pattern space and start a new cycle, but if the pattern space still
###holds data, skip reading input.
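How P and D cooperate with the loop can be seen on a short input with two continuation lines. This is only a sketch; the sample text and the name `joined` are assumptions.

```shell
# Lines starting with "=" are glued onto the previous line with a space;
# P prints the finished first line and D restarts the cycle with what is left.
joined=$(printf 'a\n=b\n=c\nd\n' | sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D')
printf '%s\n' "$joined"
```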

# add commas to numeric strings, changing "1234567" to "1,234,567"
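gsed ':a;s/\B[0-9]\{3\}\>/,&/;ta' # GNU sed
sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'  # other seds
###\(.*[0-9]\) is zero or more characters (possibly including digits) followed by one digit, and
###\([0-9]\{3\}\) is three digits. The substitution repeats until it no longer applies, that is, until there is
###no run of four or more consecutive digits left.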
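The repeated substitution can be followed on the example number from the comment. A small sketch of the "other seds" variant, assuming the groups are written as \( and \); `grouped` is a made-up name.

```shell
# Pass 1 gives 1234,567 and pass 2 gives 1,234,567; the third attempt fails, so the loop ends.
grouped=$(echo 1234567 | sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta')
printf '%s\n' "$grouped"
```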

# add commas to numbers with decimal points and minus signs (GNU sed)
gsed ':a;s/\(^\|[^0-9.]\)\([0-9]\+\)\([0-9]\{3\}\)/\1\2,\3/g;ta'
###I don't have gsed, so no explanation for this one.

# add a blank line every 5 lines (after lines 5, 10, 15, 20, etc.)
gsed '0~5G' # GNU sed only
sed 'n;n;n;n;G;' # other seds
###Covered much earlier.

SELECTIVE PRINTING OF CERTAIN LINES:

# print first 10 lines of file (emulates behavior of "head")
sed 10q

# print first line of file (emulates "head -1")
sed q
###The manual describes q as "Immediately quit the sed script without processing
###any more input, except that if auto-print is not disabled the
###current pattern space will be printed."
###So both commands are clear: quitting at line 10 prints the first 10 lines, and quitting at
###line 1 prints just the first line.
# print the last 10 lines of a file (emulates "tail")
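sed -e :a -e '$q;N;11,$D;ba'
###Command D deletes the pattern space up to the first newline \n.
###Command N appends the next line of input to the pattern space.
###b label: "Branch to label; if label is omitted, branch to end of script." That is, jump to the place in
###the script marked with the label; without a label, jump to the end of the script.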

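QUOTE:

Originally posted by "waker":
Let me try to annotate this; not sure it is right.

If we look only at sed -e :a -e '$q;N;ba'
the loop keeps reading in the next line until the end, so the whole text becomes one chain of lines separated by \n.

Now add 11,$D
sed -e :a -e '$q;N;11,$D;ba'
If the text has no more than 10 lines,
the pattern space keeps the whole text and prints it.
If the text has more than 10 lines,
then from line 11 on, each time the next line joins the chain, the first \n-delimited record in the pattern space is deleted. It looks as if the head of the chain is pushed out by the tail, so the chain always keeps 10 links. When the loop ends, the chain holds the last 10 lines of the file, which are then printed.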
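The "chain of ten links" can be checked with a twelve-line input, where lines 1 and 2 should be pushed out. A minimal sketch; the input and the name `last10` are illustrative.

```shell
# From line 11 on, every N is followed by a D that drops the oldest line,
# so the pattern space never holds more than ten lines.
last10=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 | sed -e :a -e '$q;N;11,$D;ba')
printf '%s\n' "$last10"
```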

# print the last 2 lines of a file (emulates "tail -2")
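sed '$!N;$!D'
### I could not follow this at first, so I copied a passage from the CU highlights:
###sed '$!N;$!D': for lines before the second-to-last, N puts the next line into the
###pattern space and D then deletes the contents of the pattern space; at the second-to-last line, the last line
###is appended below it, and D is not run on the last line, so the last two lines of the file are kept.
###Either that passage is a little vague or I misread it, but explaining D as
###"deletes the contents of the pattern space" seems confusing to me.
###Here is how I understand it; I may be wrong. First, D is "Delete up to the first
###embedded newline in the pattern space", so D deletes everything in the pattern space
###up to the first newline, i.e. the first line. The description of D has one more sentence, which I think
###matters a lot: "Start next cycle, but skip reading from the input if there
### is still data in the pattern space", i.e. start the next cycle, but if the pattern space still holds
###data, do not read from the input.
###How does it work in practice? Suppose the file is
### $ cat test.txt
### 1
### 2
### 3
### 4
### 5
### On line 1, $!N appends line 2 after line 1 in the pattern space, then $!D deletes line 1,
###leaving only line 2 in the pattern space. Note that because D starts the next cycle right away,
###the pattern space is not printed! (It took me a long while to arrive at this explanation, and it
###may well be wrong; brickbats welcome.) Since D skips reading the next line when the pattern space still holds data,
###the next cycle goes straight to $!N, which reads line 3 and appends it after line 2, and so on.
###Eventually line 5 is read and appended after line 4; $!D is then not run, so lines 4 and 5
###are both kept, the script ends, and the pattern space is printed.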
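Running the command on the same five-line sample confirms the walk-through. A small sketch; `last2` is a made-up variable name.

```shell
# The pattern space is a sliding two-line window; only its final state is printed.
last2=$(printf '%s\n' 1 2 3 4 5 | sed '$!N;$!D')
printf '%s\n' "$last2"
```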

# print the last line of a file (emulates "tail -1")
sed '$!d' # method 1
sed -n '$p' # method 2
###Ha, finally one I understood. You did too, right?

# print only lines which match regular expression (emulates "grep")
sed -n '/regexp/p' # method 1
sed '/regexp/!d' # method 2
###Once you know the -n option and the p and d commands, these two are clear.

# print only lines which do NOT match regexp (emulates "grep -v")
sed -n '/regexp/!p' # method 1, corresponds to above
sed '/regexp/d' # method 2, simpler syntax
###The opposite of the above, just as the comments say.

# print the line immediately before a regexp, but not the line
# containing the regexp
sed -n '/regexp/{g;1!p;};h'
###Every line that does not match is copied from the pattern space into the hold space by h, including the line
###just before the one containing "regexp". When the "regexp" line is reached, g fetches that saved line and p prints it.

# print the line immediately after a regexp, but not the line
# containing the regexp
sed -n '/regexp/{n;p;}'
###Similar to the above, and simpler.

# print 1 line of context before and after regexp, with line number
# indicating where the regexp occurred (similar to "grep -A1 -B1")
sed -n -e '/regexp/{=;x;1!p;g;$!N;p;D;}' -e h
###It looks complicated, but it is actually not hard to explain.
###Suppose the file looks like this
###$ cat test.txt
###  1 abc
###  2 cde
###  3 regexp
###  4 fgh
###  5 xyz
###Up to the line before regexp, the commands inside the braces do not run; only h runs, giving
###    command          pattern space        hold space        output
###    (line 2) "h"     2 cde                2 cde
###    (line 3) "="     3 regexp             2 cde             3
###    "x"              2 cde                3 regexp
###    "1!p"            2 cde                3 regexp          2 cde
###    "g"              3 regexp             3 regexp
###    "$!N"            3 regexp\n4 fgh      3 regexp
###    "p"              3 regexp\n4 fgh      3 regexp          3 regexp
###                                                            4 fgh
###    "D"              4 fgh                3 regexp
###    "h"              4 fgh                4 fgh
###
###  Look at the output column on the right: not bad!
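The output column of the table can be reproduced by running the command on the same five sample lines. A minimal sketch; `context` is an illustrative variable name.

```shell
# Prints the line number of the match, then the line before it, the match itself,
# and the line after it.
context=$(printf '%s\n' '1 abc' '2 cde' '3 regexp' '4 fgh' '5 xyz' |
  sed -n -e '/regexp/{=;x;1!p;g;$!N;p;D;}' -e h)
printf '%s\n' "$context"
```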

# grep for AAA and BBB and CCC (in any order)
sed '/AAA/!d; /BBB/!d; /CCC/!d'

# grep for AAA and BBB and CCC (in that order)
sed '/AAA.*BBB.*CCC/!d'

# grep for AAA or BBB or CCC (emulates "egrep")
sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d # most seds
gsed '/AAA\|BBB\|CCC/!d' # GNU sed only
###Nothing much to say about these three: they are just searches.

# print paragraph if it contains AAA (blank lines separate paragraphs)
# HHsed v1.5 must insert a 'G;' after 'x;' in the next 3 scripts below
sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;'
###The first part uses the hold space to collect the whole paragraph; the second part does the search.

# print paragraph if it contains AAA and BBB and CCC (in any order)
sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;/BBB/!d;/CCC/!d'
###Same as above.

# print paragraph if it contains AAA or BBB or CCC
sed -e '/./{H;$!d;}' -e 'x;/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d
gsed '/./{H;$!d;};x;/AAA\|BBB\|CCC/b;d' # GNU sed only
###Same as above.

# print only lines of 65 characters or longer
sed -n '/^.\{65\}/p'
###Nothing much to say here either: just a regular expression at work.

# print only lines of less than 65 characters
sed -n '/^.\{65\}/!p' # method 1, corresponds to above
sed '/^.\{65\}/d' # method 2, simpler syntax
###Nothing special again.

# print section of file from regular expression to end of file
sed -n '/regexp/,$p'
###Still simple. Note that "," selects a range of lines, here from the line containing regexp to the last line.

# print section of file based on line numbers (lines 8-12, inclusive)
sed -n '8,12p' # method 1
sed '8,12!d' # method 2

# print line number 52
sed -n '52p' # method 1
sed '52!d' # method 2
sed '52q;d' # method 3, efficient on large files
###Just note that the third method is the more efficient one: it quits at line 52 instead of reading the rest of the file.

# beginning at line 3, print every 7th line
gsed -n '3~7p' # GNU sed only
sed -n '3,${p;n;n;n;n;n;n;}' # other seds
###This should be easy to follow by now.

# print section of file between two regular expressions (inclusive)
sed -n '/Iowa/,/Montana/p' # case sensitive
###Simple now, isn't it?

SELECTIVE DELETION OF CERTAIN LINES:

# print all of file EXCEPT section between 2 regular expressions
sed '/Iowa/,/Montana/d'
###Just as simple as the previous one.

# delete duplicate, consecutive lines from a file (emulates "uniq")
# First line in a set of duplicate lines is kept, rest are deleted.
sed '$!N; /^\(.*\)\n\1$/!P; D'
###Unless this is the last line, append the next line to the pattern space, then test it:
###the pattern between "^" and "$" matches when the two lines are identical. If it does not match, P prints
###the first line. Then D deletes the first line.
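The two-line window can be tried on an input with runs of duplicates. A small sketch, assuming the group is written as \( and \) as in the original sed1line; `deduped` and the sample lines are made up.

```shell
# A line is printed only when it differs from the line that follows it,
# so each run of identical lines collapses to a single copy.
deduped=$(printf 'a\na\nb\nb\nb\nc\n' | sed '$!N; /^\(.*\)\n\1$/!P; D')
printf '%s\n' "$deduped"
```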

# delete duplicate, nonconsecutive lines from a file. Beware not to
# overflow the buffer size of the hold space, or else use GNU sed.
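sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
###This would not run in my Linux environment. The error was "sed: -e expression #1, char 34:
###Invalid range end". I wondered whether this was the hold-space overflow, but the message more likely points at the
###range [ -~], which depends on the locale; running sed with LC_ALL=C may avoid it.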
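The following sketch runs the command under the C locale, where the range [ -~] covers the printable ASCII characters. Whether the locale was really the cause of the error reported above is an assumption; `seen_once` and the sample lines are made up.

```shell
# The hold space keeps every line printed so far; a line that already
# appears there is deleted instead of being printed again.
seen_once=$(printf 'a\nb\na\nc\nb\n' |
  LC_ALL=C sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P')
printf '%s\n' "$seen_once"
```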

# delete the first 10 lines of a file
sed '1,10d'

# delete the last line of a file
sed '$d'
###Like the previous one, select and delete.

# delete the last 2 lines of a file
sed 'N;$!P;$!D;$d'
###If you understood how sed '$!N;$!D' works, this one is no trouble either!

# delete the last 10 lines of a file
sed -e :a -e '$d;N;2,10ba' -e 'P;D' # method 1
sed -n -e :a -e '1,10!{P;N;D;};N;ba' # method 2
###Similar to printing the last 10 lines. What? You did not follow that one either? Let's shake hands.

# delete every 8th line
gsed '0~8d' # GNU sed only
sed 'n;n;n;n;n;n;n;d;' # other seds
###Nothing to add!

# delete ALL blank lines from a file (same as "grep '.' ")
sed '/^$/d' # method 1
sed '/./!d' # method 2
###These two say: 1. delete lines with no content, 2. keep lines with content.

# delete all CONSECUTIVE blank lines from file except the first; also
# deletes all blank lines from top and end of file (emulates "cat -s")
sed '/./,/^$/!d' # method 1, allows 0 blanks at top, 1 at EOF
sed '/^$/N;/\n$/D' # method 2, allows 1 blank at top, 0 at EOF
###Both select first and then delete. Enough said.

# delete all CONSECUTIVE blank lines from file except the first 2:
sed '/^$/N;/\n$/N;//D'
###Like the command above, with one more step.

# delete all leading blank lines at top of file
sed '/./,$!d'
###Lines from the first one with any character through the last line are kept; the rest are deleted.

# delete all trailing blank lines at end of file
sed -e :a -e '/^\n*$/{$d;N;ba' -e '}' # works on all seds
sed -e :a -e '/^\n*$/N;/\n$/ba' # ditto, except for gsed 3.02*
###I give up, I'm done for. Let the experts explain; if I keep staring at this I will go mad!

# delete the last line of each paragraph
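sed -n '/^$/{p;h;};/./{x;/./p;}'
###This presumably assumes paragraphs are separated by blank lines.
###For a non-blank line, the pattern space and hold space are swapped, and if the pattern space
###is non-empty after the swap (it now holds the previous line), it is printed. For a blank line, the pattern space
###(the blank line) is printed and copied over the hold space, so the paragraph's last line is discarded. And so on.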
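The one-line delay can be seen on two short paragraphs separated by a blank line. A minimal sketch; the sample text and the name `trimmed` are assumptions.

```shell
# Each non-blank line is printed one cycle late, so the last line of every
# paragraph (c and e here) is still waiting in the hold space when it gets dropped.
trimmed=$(printf 'a\nb\nc\n\nd\ne\n' | sed -n '/^$/{p;h;};/./{x;/./p;}')
printf '%s\n' "$trimmed"
```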

###Finished at last. The special applications below are not annotated.

SPECIAL APPLICATIONS:

# remove nroff overstrikes (char, backspace) from man pages. The 'echo'
# command may need an -e switch if you use Unix System V or bash shell.
sed "s/.`echo \\\b`//g" # double quotes required for Unix environment
sed 's/.^H//g' # in bash/tcsh, press Ctrl-V and then Ctrl-H
sed 's/.\x08//g' # hex expression for sed v1.5

# get Usenet/e-mail message header
sed '/^$/q' # deletes everything after first blank line

# get Usenet/e-mail message body
sed '1,/^$/d' # deletes everything up to first blank line

# get Subject header, but remove initial "Subject: " portion
sed '/^Subject: */!d; s///;q'

# get return address header
sed '/^Reply-To:/q; /^From:/h; /./d;g;q'

# parse out the address proper. Pulls out the e-mail address by itself
# from the 1-line return address header (see preceding script)
sed 's/ *(.*)//; s/>.*//; s/.*[:<] *//'

# add a leading angle bracket and space to each line (quote a message)
sed 's/^/> /'

# delete leading angle bracket & space from each line (unquote a message)
sed 's/^> //'

# remove most HTML tags (accommodates multiple-line tags)
sed -e :a -e 's/<[^>]*>//g;/</N;//ba'

# extract multi-part uuencoded binaries, removing extraneous header
# info, so that only the uuencoded portion remains. Files passed to
# sed must be passed in the proper order. Version 1 can be entered
# from the command line; version 2 can be made into an executable
# Unix shell script. (Modified from a script by Rahul Dhesi.)
sed '/^end/,/^begin/d' file1 file2 ... fileX | uudecode # vers. 1
sed '/^end/,/^begin/d' "$@" | uudecode # vers. 2

# zip up each .TXT file individually, deleting the source file and
# setting the name of each .ZIP file to the basename of the .TXT file
# (under DOS: the "dir /b" switch returns bare filenames in all caps)
echo @echo off >zipup.bat
dir /b *.txt | sed "s/^\(.*\)\.TXT/pkzip -mo \1 \1.TXT/" >>zipup.bat
