Category: LINUX

2013-09-26 23:54:06

Sed, so called because it is a stream editor, is perfect for applying a series of edits to a number of files. Awk, named after its developers Aho, Weinberger, and Kernighan, is a programming language that permits easy manipulation of structured data and the generation of formatted reports.

Sed is a "non-interactive", stream-oriented editor. Typical tasks for sed:
1. To automate editing actions to be performed on one or more files.
2. To simplify the task of performing the same edits on multiple files.
3. To write conversion programs.
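
The second task in the list above, applying the same edits to several files, can be sketched as follows. This is only a minimal illustration: the file names, their contents, and the two substitutions are made up for the example, and each result is written to a new `.new` file so that the originals stay untouched.

```shell
#!/bin/sh
# Hypothetical sample files; names and contents are invented for this sketch.
workdir=$(mktemp -d)
printf 'colour one\nthe colour red\n' > "$workdir/a.txt"
printf 'no match here\ncolour colour\n' > "$workdir/b.txt"

# Apply the same two edits to every file. Each -e adds one editing command,
# and sed runs them in order on every input line.
for f in "$workdir"/*.txt; do
    sed -e 's/colour/color/g' -e 's/^the /The /' "$f" > "$f.new"
done

# Show the edited copies.
cat "$workdir/a.txt.new" "$workdir/b.txt.new"
```

Writing to a separate file rather than editing in place keeps the sketch portable, since in-place editing options differ between sed implementations.
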

The benefits of awk are best realized when the data has some kind of structure. Some of the things awk allows you to do are:
1. View a text file as a textual database made up of records and fields.
2. Use variables to manipulate the database.
3. Use arithmetic and string operators.
4. Use common programming constructs such as loops and conditionals.
5. Generate formatted reports.
6. Define functions.
7. Execute UNIX commands from a script.
8. Process the result of UNIX commands.
9. Process command-line arguments more gracefully.
10. Work more easily with multiple input streams.
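
As a rough illustration of the "records and fields" view, the sketch below treats each line of a small file as a record with three fields and prints a formatted per-customer total. The file `orders.txt`, its columns (name, quantity, unit price), and the `total` array are assumptions made up for this example.

```shell
#!/bin/sh
# Hypothetical data file: name, quantity, unit price.
workdir=$(mktemp -d)
cat > "$workdir/orders.txt" <<'EOF'
alice 3 2.50
bob 1 10.00
alice 2 4.00
EOF

report=$(awk '
    # Each line is a record; $1, $2 and $3 are its fields.
    { total[$1] += $2 * $3 }
    END {
        # A for-in loop visits keys in no fixed order, so sort afterwards.
        for (name in total)
            printf "%-8s %8.2f\n", name, total[name]
    }
' "$workdir/orders.txt" | sort)

printf '%s\n' "$report"
```

The example touches several items from the list at once: variables (an associative array), arithmetic, a loop, and a formatted report produced with `printf`.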

Regular Expression:
... ...
The POSIX standard formalizes the meaning of regular expression characters and operators. It defines two classes of regular expressions: Basic Regular Expressions (BREs), which are the kind used by grep and sed, and Extended Regular Expressions (EREs), which are the kind used by egrep (that is, grep -E) and awk.
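
One visible difference between the two classes is how `+` and interval expressions are written. The sketch below is a small, hedged comparison using `grep` (BRE) and `grep -E` (ERE); the sample lines and the variable names are invented for the illustration.

```shell
#!/bin/sh
# Hypothetical sample input, one candidate string per line.
sample=$(printf 'ab\naab\nb\na+b\n')

# BRE (plain grep): an unescaped + is an ordinary literal character.
bre_plus=$(printf '%s\n' "$sample" | grep 'a+b')

# ERE (grep -E): + means "one or more of the preceding item".
ere_plus=$(printf '%s\n' "$sample" | grep -E 'a+b')

# Interval expressions: braces are escaped in a BRE, bare in an ERE.
bre_interval=$(printf '%s\n' "$sample" | grep 'a\{2\}b')
ere_interval=$(printf '%s\n' "$sample" | grep -E 'a{2}b')

printf 'BRE a+b      : %s\n' "$bre_plus"
printf 'ERE a+b      : %s\n' "$(printf '%s' "$ere_plus" | tr '\n' ' ')"
printf 'BRE a\\{2\\}b  : %s\n' "$bre_interval"
printf 'ERE a{2}b    : %s\n' "$ere_interval"
```

With the BRE, only the line containing the literal text `a+b` is selected; with the ERE, the same pattern selects `ab` and `aab` instead.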

POSIX also changed what had been common terminology. What we've been calling a "character class" is called a "bracket expression" in the POSIX standard. Within bracket expressions, besides literal characters such as a, !, and so on, you can have additional components. These are:
    - Character classes. A POSIX character class consists of keywords bracketed by [: and :]. The keywords describe different classes of characters such as alphabetic characters, control characters, and so on.
    - Collating symbols. A collating symbol is a multicharacter sequence that should be treated as a unit. It consists of the characters bracketed by [. and .].
    - Equivalence classes. An equivalence class lists a set of characters that should be considered equivalent, such as e and è. It consists of a named element from the locale, bracketed by [= and =].

All three of these constructs must appear inside the square brackets of a bracket expression. For example:
    - [[:alpha:]!] matches any single alphabetic character or the exclamation point.
    - [[.ch.]] matches the collating element ch, but does not match just the letter c or the letter h.
    - In a French locale, [[=e=]] might match any of e, è, or é.
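
The first example can be tried directly from the shell. The sketch below is a minimal illustration: the input string is made up, `LC_ALL=C` is set as an assumption so that `[:alpha:]` means only the ASCII letters, and `grep -o` (print only the matched text) is a widely available extension rather than part of POSIX. Collating symbols and equivalence classes are not shown, because their behavior depends on which locales are installed.

```shell
#!/bin/sh
# [[:alpha:]!] matches one alphabetic character or an exclamation point.
# grep -o prints each match on its own line; tr joins them back together.
matches=$(printf 'Hi! 42?\n' | LC_ALL=C grep -o '[[:alpha:]!]' | tr -d '\n')
printf '%s\n' "$matches"

# The same bracket expression, negated with ^, used in sed to delete
# every character that is neither a letter nor an exclamation point.
kept=$(printf 'Hi! 42?\n' | LC_ALL=C sed 's/[^[:alpha:]!]//g')
printf '%s\n' "$kept"
```

Note that the character class sits inside the outer square brackets in both commands, as required for all three constructs.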

awk - System Variables:

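ARGC Number of command-line arguments

ARGV Array of command-line arguments

FILENAME Name of the current input file

FNR Record number within the current file

FS Input field separator; the default is a single space, which splits fields on runs of spaces and tabs

RS Input record separator

NF Number of fields in the current record

NR Number of records read so far

OFS Output field separator

ORS Output record separator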
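
A short sketch of several of these variables working together is given below. The two input files, their colon-separated contents, and the chosen OFS value are assumptions invented for the example.

```shell
#!/bin/sh
# Hypothetical input files with colon-separated fields.
workdir=$(mktemp -d)
printf 'root:x:0\ndaemon:x:1\n' > "$workdir/first.txt"
printf 'alice:x:1000\n' > "$workdir/second.txt"

out=$(cd "$workdir" && awk '
    # Set the input and output field separators before any input is read.
    BEGIN { FS = ":"; OFS = " | " }
    # NR counts records across all files; FNR restarts for each file.
    { print FILENAME, NR, FNR, NF, $1 }
    END { print "total records", NR }
' first.txt second.txt)

printf '%s\n' "$out"
```

For the third record, NR is 3 while FNR is 1, because it is the first record of the second file. The commas in each `print` statement are what cause OFS to be inserted between the output values.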



TBC ...