2010-11-21 23:36:22

sed, a stream editor



This file documents version 4.2.1 of GNU sed, a stream editor.

  • Introduction
  • Invocation
  • sed programs
  • Some sample scripts
  • Limitations and (non-)limitations of GNU sed
  • Other resources for learning about sed
  • Reporting bugs
  • egrep-style regular expressions
  • A menu with all the topics in this manual.
  • A menu with all sed commands and command-line options.

    1 Introduction

    sed is a stream editor. A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline). While in some ways similar to an editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient. But it is sed's ability to filter text in a pipeline which particularly distinguishes it from other types of editors.



    2 Invocation

    Normally sed is invoked like this:

         sed SCRIPT INPUTFILE...

    The full format for invoking sed is:

         sed OPTIONS... [SCRIPT] [INPUTFILE...]

    If you do not specify INPUTFILE, or if INPUTFILE is -, sed filters the contents of the standard input. The script is actually the first non-option parameter, which sed specially considers a script and not an input file if (and only if) none of the other options specifies a script to be executed, that is if neither of the -e and -f options is specified.
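The invocation forms above can be sketched as follows. This is only an illustration, assuming GNU sed and a POSIX shell; the sample text and the file name greeting.txt are made up for the example.

```shell
# Work in a scratch directory so the example leaves no files behind.
tmpdir=$(mktemp -d)
printf 'hello world\n' > "$tmpdir/greeting.txt"

# Script as the first non-option argument, input read from a named file.
from_file=$(sed 's/hello/goodbye/' "$tmpdir/greeting.txt")

# No INPUTFILE: sed filters standard input (here, a pipeline).
from_pipe=$(printf 'hello world\n' | sed 's/world/sed/')

# An INPUTFILE of - also means standard input.
from_dash=$(printf 'hello world\n' | sed 's/hello/hi/' -)

printf '%s\n' "$from_file" "$from_pipe" "$from_dash"
rm -rf "$tmpdir"
```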

    sed may be invoked with the following command-line options:

    --version
    Print out the version of sed that is being run and a copyright notice, then exit.
    --help
    Print a usage message briefly summarizing these command-line options and the bug-reporting address, then exit.
    -n
    --quiet
    --silent
    By default, sed prints out the pattern space at the end of each cycle through the script (see How sed works). These options disable this automatic printing, and sed only produces output when explicitly told to via the p command.
    -e script
    --expression=script
    Add the commands in script to the set of commands to be run while processing the input.
    -f script-file
    --file=script-file
    Add the commands contained in the file script-file to the set of commands to be run while processing the input.
    -i[SUFFIX]
    --in-place[=SUFFIX]
    This option specifies that files are to be edited in-place. GNU sed does this by creating a temporary file and sending output to this file rather than to the standard output.

    This option implies -s.

    When the end of the file is reached, the temporary file is renamed to the output file's original name. The extension, if supplied, is used to modify the name of the old file before renaming the temporary file, thereby making a backup copy.

    This rule is followed: if the extension doesn't contain a *, then it is appended to the end of the current filename as a suffix; if the extension does contain one or more * characters, then each asterisk is replaced with the current filename. This allows you to add a prefix to the backup file, instead of (or in addition to) a suffix, or even to place backup copies of the original files into another directory (provided the directory already exists).

    If no extension is supplied, the original file is overwritten without making a backup.

    -l N
    --line-length=N
    Specify the default line-wrap length for the l command. A length of 0 (zero) means to never wrap long lines. If not specified, it is taken to be 70.
    --posix
    GNU sed includes several extensions to POSIX sed. In order to simplify writing portable scripts, this option disables all the extensions that this manual documents, including additional commands. Most of the extensions accept sed programs that are outside the syntax mandated by POSIX, but some of them (such as the behavior of the N command described in Reporting Bugs) actually violate the standard. If you want to disable only the latter kind of extension, you can set the POSIXLY_CORRECT variable to a non-empty value.
    -b
    --binary
    This option is available on every platform, but is only effective where the operating system makes a distinction between text files and binary files. When such a distinction is made—as is the case for MS-DOS, Windows, Cygwin—text files are composed of lines separated by a carriage return and a line feed character, and sed does not see the ending CR. When this option is specified, sed will open input files in binary mode, thus not requesting this special processing and considering lines to end at a line feed.
    --follow-symlinks
    This option is available only on platforms that support symbolic links and has an effect only if option -i is specified. In this case, if the file that is specified on the command line is a symbolic link, sed will follow the link and edit the ultimate destination of the link. The default behavior is to break the symbolic link, so that the link destination will not be modified.
    -r
    --regexp-extended
    Use extended regular expressions rather than basic regular expressions. Extended regexps are those that egrep accepts; they can be clearer because they usually have fewer backslashes, but are a GNU extension and hence scripts that use them are not portable.
    -s
    --separate
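    By default, sed will consider the files specified on the command line as a single continuous long stream. This GNU sed extension allows the user to consider them as separate files: range addresses (such as /abc/,/def/) are not allowed to span several files, line numbers are relative to the start of each file, $ refers to the last line of each file, and files invoked from the R commands are rewound at the start of each file.
    -u
    --unbuffered
    Buffer both input and output as minimally as practical. (This is particularly useful if the input is coming from the likes of ‘tail -f’, and you wish to see the transformed output as soon as possible.)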

    If no -e, -f, --expression, or --file options are given on the command-line, then the first non-option argument on the command line is taken to be the script to be executed.

    If any command-line parameters remain after processing the above, these parameters are interpreted as the names of input files to be processed. A file name of ‘-’ refers to the standard input stream. The standard input will be processed if no file names are specified.
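A few of the options above, tried on throwaway files. This is a rough sketch assuming GNU sed; the file names a.txt and b.txt and their contents are invented for the example.

```shell
tmpdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmpdir/a.txt"
printf 'four\nfive\n' > "$tmpdir/b.txt"

# -n suppresses auto-print; only lines selected by p are written.
only_two=$(sed -n '/two/p' "$tmpdir/a.txt")

# Several -e options are combined, in order, into one script.
combined=$(sed -e 's/one/1/' -e 's/two/2/' "$tmpdir/a.txt")

# Without -s the files form one stream, so $ is the last line overall.
last_overall=$(sed -n '$p' "$tmpdir/a.txt" "$tmpdir/b.txt")

# With -s each file is treated separately, so $ matches once per file.
last_each=$(sed -s -n '$p' "$tmpdir/a.txt" "$tmpdir/b.txt")

# -i.bak edits a.txt in place and keeps the original as a.txt.bak.
sed -i.bak 's/three/3/' "$tmpdir/a.txt"
edited=$(cat "$tmpdir/a.txt")
backup=$(cat "$tmpdir/a.txt.bak")

printf '%s\n' "$only_two" "$combined" "$last_overall" "$last_each"
rm -rf "$tmpdir"
```

Running -i without a suffix would overwrite a.txt with no backup, so the suffixed form is the safer one to experiment with.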



    3 sed Programs

    A sed program consists of one or more sed commands, passed in by one or more of the -e, -f, --expression, and --file options, or the first non-option argument if zero of these options are used. This document will refer to “the” sed script; this is understood to mean the in-order catenation of all of the scripts and script-files passed in.

    Commands within a script or script-file can be separated by semicolons (;) or newlines (ASCII 10). Some commands, due to their syntax, cannot be followed by semicolons working as command separators and thus should be terminated with newlines or be placed at the end of a script or script-file. Commands can also be preceded with optional non-significant whitespace characters.

    Each sed command consists of an optional address or address range, followed by a one-character command name and any additional command-specific code.
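As a small illustration of the script structure just described (the input words are arbitrary, and GNU sed is assumed), the same two commands can be separated by a semicolon or by a newline, and an address can precede a command name:

```shell
input='alpha
beta
gamma'

# Two commands separated by a semicolon.
with_semicolon=$(printf '%s\n' "$input" | sed 's/alpha/A/; s/beta/B/')

# The same two commands separated by a newline inside the script.
with_newline=$(printf '%s\n' "$input" | sed 's/alpha/A/
s/beta/B/')

# An address (line 2) followed by a one-character command name (d).
addressed=$(printf '%s\n' "$input" | sed '2d')

printf '%s\n' "$with_semicolon" "$addressed"
```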



    3.1 How sed Works

    sed maintains two data buffers: the active pattern space, and the auxiliary hold space. Both are initially empty.

    sed operates by performing the following cycle on each line of input: first, sed reads one line from the input stream, removes any trailing newline, and places it in the pattern space. Then commands are executed; each command can have an address associated to it: addresses are a kind of condition code, and a command is only executed if the condition is verified before the command is to be executed.

    When the end of the script is reached, unless the -n option is in use, the contents of pattern space are printed out to the output stream, adding back the trailing newline if it was removed. Then the next cycle starts for the next input line.

    Unless special commands (like ‘D’) are used, the pattern space is deleted between two cycles. The hold space, on the other hand, keeps its data between cycles (see commands ‘h’, ‘H’, ‘x’, ‘g’, ‘G’ to move data between both buffers).
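The interplay of the two buffers can be seen in a couple of one-liners. This is a sketch assuming GNU sed (the braces and semicolons on one line are handled most reliably there); the input letters are arbitrary.

```shell
# G appends a newline plus the hold space to the pattern space. The hold
# space starts out empty, so every input line gains a blank line after it.
double_spaced=$(printf 'a\nb\n' | sed 'G')

# Collect every line in the hold space (h on line 1, H afterwards). On the
# last line, copy the hold space back with g, turn the embedded newlines
# into commas, and print the result explicitly because -n is in effect.
joined=$(printf 'a\nb\nc\n' | sed -n '1h; 1!H; ${g; s/\n/,/g; p}')

printf '%s\n' "$double_spaced" "$joined"
```

The second script works only because the hold space survives from one cycle to the next while the pattern space is replaced by each new input line.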



    3.2 Selecting lines with sed

    Addresses in a sed script can be in any of the following forms:

    number
    Specifying a line number will match only that line in the input. (Note that sed counts lines continuously across all input files unless -i or -s options are specified.)
    first~step
    This GNU extension matches every stepth line starting with line first. In particular, lines will be selected when there exists a non-negative n such that the current line-number equals first + (n * step). Thus, to select the odd-numbered lines, one would use 1~2; to pick every third line starting with the second, ‘2~3’ would be used; to pick every fifth line starting with the tenth, use ‘10~5’; and ‘50~0’ is just an obscure way of saying 50.
    $
    This address matches the last line of the last file of input, or the last line of each file when the -i or -s options are specified.
    /regexp/
    This will select any line which matches the regular expression regexp. If regexp itself includes any / characters, each must be escaped by a backslash (\).

    The empty regular expression ‘//’ repeats the last regular expression match (the same holds if the empty regular expression is passed to the s command). Note that modifiers to regular expressions are evaluated when the regular expression is compiled, thus it is invalid to specify them together with the empty regular expression.

    \%regexp%
    (The % may be replaced by any other single character.)

    This also matches the regular expression regexp, but allows one to use a different delimiter than /. This is particularly useful if the regexp itself contains a lot of slashes, since it avoids the tedious escaping of every /. If regexp itself includes any delimiter characters, each must be escaped by a backslash (\).

    /regexp/I
    \%regexp%I
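    The I modifier to regular-expression matching is a GNU extension which causes the regexp to be matched in a case-insensitive manner.
    /regexp/M
    \%regexp%M
    The M modifier to regular-expression matching is a GNU sed extension which causes ^ and $ to match respectively (in addition to the normal behavior) the empty string after a newline, and the empty string before a newline. There are special character sequences (\` and \') which always match the beginning or the end of the buffer. M stands for multi-line.

    If no addresses are given, then all lines are matched; if one address is given, then only lines matching that address are matched.

    An address range is specified by giving two addresses separated by a comma (,). An address range matches lines starting from where the first address matches, and continues until the second address matches (inclusively).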

    If the second address is a regexp, then checking for the ending match will start with the line following the line which matched the first address: a range will always span at least two lines (except of course if the input stream ends).

    If the second address is a number less than (or equal to) the line matching the first address, then only the one line is matched.

    GNU sed also supports some special two-address forms; all these are GNU extensions:

    0,/regexp/
    A line number of 0 can be used in an address specification like 0,/regexp/ so that sed will try to match regexp in the first input line too. In other words, 0,/regexp/ is similar to 1,/regexp/, except that if addr2 matches the very first line of input the 0,/regexp/ form will consider it to end the range, whereas the 1,/regexp/ form will match the beginning of its range and hence make the range span up to the second occurrence of the regular expression.

    Note that this is the only place where the 0 address makes sense; there is no 0-th line and commands which are given the 0 address in any other way will give an error.

    addr1,+N
    Matches addr1 and the N lines following addr1.
    addr1,~N
    Matches addr1 and the lines following addr1 until the next line whose input line number is a multiple of N.

    Appending the ! character to the end of an address specification negates the sense of the match. That is, if the ! character follows an address range, then only lines which do not match the address range will be selected. This also works for singleton addresses, and, perhaps perversely, for the null address.
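The address forms above can be tried against the numbers 1 to 10. The helper sel below is a hypothetical convenience defined only for this sketch; several of the forms are GNU extensions, so GNU sed is assumed.

```shell
# sel SCRIPT: run SCRIPT over the lines 1..10 with auto-print disabled and
# join whatever is printed with single spaces.
sel() { seq 1 10 | sed -n "$1" | paste -s -d ' ' -; }

single=$(sel '3p')            # one line number
stepped=$(sel '1~2p')         # first~step: odd-numbered lines
last=$(sel '$p')              # last line of input
by_regexp=$(sel '/^[24]$/p')  # lines matching a regexp
alt_delim=$(sel '\%^1%p')     # same idea with % as the delimiter
range=$(sel '2,4p')           # inclusive range
plus_n=$(sel '/3/,+2p')       # addr1 and the 2 lines after it
to_multiple=$(sel '2,~4p')    # addr1 up to the next multiple of 4
negated=$(sel '2,4!p')        # everything outside the range
backwards=$(sel '5,2p')       # second number <= first: one line only

# 0,/x/ lets the range end on line 1; 1,/x/ starts looking on line 2.
zero_form=$(printf 'x\ny\nx\nz\n' | sed -n '0,/x/p' | paste -s -d ' ' -)
one_form=$(printf 'x\ny\nx\nz\n' | sed -n '1,/x/p' | paste -s -d ' ' -)

printf '%s\n' "$stepped" "$negated" "$zero_form" "$one_form"
```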



    3.3 Overview of Regular Expression Syntax

    To know how to use sed, people should understand regular expressions (regexp for short). A regular expression is a pattern that is matched against a subject string from left to right. Most characters are ordinary: they stand for themselves in a pattern, and match the corresponding characters in the subject. As a trivial example, the pattern

         The quick brown fox

    matches a portion of a subject string that is identical to itself. The power of regular expressions comes from the ability to include alternatives and repetitions in the pattern. These are encoded in the pattern by the use of special characters, which do not stand for themselves but instead are interpreted in some special way. Here is a brief description of regular expression syntax as used in sed.

    char
    A single ordinary character matches itself.
    *
    Matches a sequence of zero or more instances of matches for the preceding regular expression, which must be an ordinary character, a special character preceded by \, a ., a grouped regexp (see below), or a bracket expression. As a GNU extension, a postfixed regular expression can also be followed by *; for example, a** is equivalent to a*.
    \+
    As *, but matches one or more. It is a GNU extension.
    \?
    As *, but only matches zero or one. It is a GNU extension.
    \(regexp\)
    Groups the inner regexp as a whole. This is used to:

      • Apply postfix operators, like \(abcd\)*: this will search for zero or more whole sequences of ‘abcd’, while abcd* would search for ‘abc’ followed by zero or more occurrences of ‘d’. Note that support for \(abcd\)* is required by POSIX 1003.1-2001, but many non-GNU implementations do not support it and hence it is not universally portable.
      • Use back references (see below).

    .
    Matches any character, including newline.
    ^
    Matches the null string at beginning of the pattern space, i.e. what appears after the circumflex must appear at the beginning of the pattern space.

    In most scripts, pattern space is initialized to the content of each line (see How sed works). So, it is a useful simplification to think of ^#include as matching only lines where ‘#include’ is the first thing on line—if there are spaces before, for example, the match fails. This simplification is valid as long as the original content of pattern space is not modified, for example with an s command.

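    ^ acts as a special character only at the beginning of the regular expression or subexpression (that is, after \( or \|). Portable scripts should avoid ^ at the beginning of a subexpression, though, as POSIX allows implementations that treat ^ as an ordinary character in that context.

    $
    It is the same as ^, but refers to the end of pattern space. $ also acts as a special character only at the end of the regular expression or subexpression (that is, before \) or \|), and its use at the end of a subexpression is not portable.
    [list]
    [^list]
    Matches any single character in list: for example, [aeiou] matches all vowels. A list may include sequences like char1-char2, which matches any character between (inclusive) char1 and char2.

    A leading ^ reverses the meaning of list, so that it matches any single character not in list. To include ] in the list, make it the first character (after the ^ if needed); to include - in the list, make it the first or last; to include ^, put it after the first character.

    The characters $, *, ., [, and \ are normally not special within list. For example, [\*] matches either ‘\’ or ‘*’, because the \ is not special here. However, strings like [.ch.], [=a=], and [:space:] are special within list and represent collating symbols, equivalence classes, and character classes, respectively, and [ is therefore special within list when it is followed by ., =, or :. Also, when not in POSIXLY_CORRECT mode, special escapes like \n and \t are recognized within list.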

    regexp1\|regexp2
    Matches either regexp1 or regexp2. Use parentheses to use complex alternative regular expressions. The matching process tries each alternative in turn, from left to right, and the first one that succeeds is used. It is a GNU extension.
    regexp1regexp2
    Matches the concatenation of regexp1 and regexp2. Concatenation binds more tightly than \|, ^, and $, but less tightly than the other regular expression operators.
    \digit
    Matches the digit-th \(...\) parenthesized subexpression in the regular expression. This is called a back reference. Subexpressions are implicitly numbered by counting occurrences of \( left-to-right.
    \n
    Matches the newline character.
    \char
    Matches char, where char is one of $, *, ., [, \, or ^. Note that the only C-like backslash sequences that you can portably assume to be interpreted are \n and \\; in particular \t is not portable, and matches a ‘t’ under most implementations of sed, rather than a tab character.

    Note that the regular expression matcher is greedy, i.e., matches are attempted from left to right and, if two or more matches are possible starting at the same character, it selects the longest.
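The leftmost-longest rule is easy to trip over, so here is a short sketch of its effect, using a made-up input line and GNU sed:

```shell
line='a<b>c<d>e'

# .* is greedy: starting at the first <, it extends to the LAST > on the line.
greedy=$(printf '%s\n' "$line" | sed 's/<.*>/X/')

# A bracket expression that excludes > makes the match stop at the first >.
bounded=$(printf '%s\n' "$line" | sed 's/<[^>]*>/X/')

# Leftmost wins over longest: a* matches the empty string at the very start
# of the line, so the two a characters further right are never reached.
leftmost=$(printf 'xaay\n' | sed 's/a*/-/')

printf '%s\n' "$greedy" "$bounded" "$leftmost"
```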

    Examples:

    abcdef
    Matches ‘abcdef’.
    a*b
    Matches zero or more ‘a’s followed by a single ‘b’. For example, ‘b’ or ‘aaaaab’.
    a\?b
    Matches ‘b’ or ‘ab’.
    a\+b\+
    Matches one or more ‘a’s followed by one or more ‘b’s: ‘ab’ is the shortest possible match, but other examples are ‘aaaab’ or ‘abbbbb’ or ‘aaaaaabbbbbbb’.
    .*
    .\+
    These two both match all the characters in a string; however, the first matches every string (including the empty string), while the second matches only strings containing at least one character.
    ^main.*(.*)
    This matches a string starting with ‘main’, followed by an opening and closing parenthesis. The ‘n’, ‘(’ and ‘)’ need not be adjacent.
    ^#
    This matches a string beginning with ‘#’.
    \\$
    This matches a string ending with a single backslash. The regexp contains two backslashes for escaping.
    \$
    Instead, this matches a string consisting of a single dollar sign, because it is escaped.
    [a-zA-Z0-9]
    In the C locale, this matches any ASCII letters or digits.
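Some of the example patterns above can be checked mechanically. The helper match is hypothetical and exists only for this sketch; it feeds each candidate word to sed on its own line and reports which ones the regexp selects. GNU sed is assumed because \? and \+ are GNU extensions.

```shell
# match REGEXP WORD...: print the WORDs that REGEXP matches, joined by
# spaces. REGEXP must not contain a / because it is placed inside /.../.
match() {
    re=$1
    shift
    printf '%s\n' "$@" | LC_ALL=C sed -n "/$re/p" | paste -s -d ' ' -
}

star=$(match 'a*b' b aaaaab ac)
optional=$(match '^a\?b$' b ab aab)
plus=$(match '^a\+b\+$' ab aaaab abbbbb ba a)
main_call=$(match '^main.*(.*)' 'main()' 'domain()' 'main(argc)')
hash=$(match '^#' '#include' ' #include')
backref=$(match '^\(ab\)\1$' abab abcd)
alnum=$(match '^[a-zA-Z0-9]$' a 7 -)

printf '%s\n' "$star" "$optional" "$plus" "$main_call" "$hash" "$backref" "$alnum"
```

The patterns are anchored with ^ and $ where the point is to test a whole word; without the anchors a pattern such as a\?b would also match inside ‘aab’.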

    3.4 Often-Used Commands

    If you use sed at all, you will quite likely want to know these commands.

    #
    [No addresses allowed.]

    The # character begins a comment; the comment continues until the next newline.

    If you are concerned about portability, be aware that some implementations of sed (which are not POSIX-conformant) may only support a single one-line comment, and then only when the very first character of the script is a #.

    Warning: if the first two characters of the sed script are #n, then the -n (no-autoprint) option is forced. If you want to put a comment in the first line of your script and that comment begins with the letter ‘n’ and you do not want this behavior, then be sure to either use a capital ‘N’, or place at least one space before the ‘n’.

    q [exit-code]
    This command only accepts a single address.

    Exit sed without processing any more commands or input. Note that the current pattern space is printed if auto-print is not disabled with the -n option. The ability to return an exit code from the sed script is a GNU sed extension.

    d
    Delete the pattern space; immediately start next cycle.

    p
    Print out the pattern space (to the standard output). This command is usually only used in conjunction with the -n command-line option.
    n
    If auto-print is not disabled, print the pattern space, then, regardless, replace the pattern space with the next line of input. If there is no more input then sed exits without processing any more commands.
    { commands }
    A group of commands may be enclosed between { and } characters. This is particularly useful when you want a group of commands to be triggered by a single address (or address-range) match.
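A quick tour of these commands on numbered input, as a sketch assuming GNU sed (the exit code after q and the one-line { } grouping rely on GNU behaviour):

```shell
# q: stop after line 3; the current pattern space is still auto-printed.
first_three=$(seq 1 10 | sed '3q')

# d: delete lines 2 through 4 and start the next cycle.
without_middle=$(seq 1 5 | sed '2,4d')

# n then p under -n: skip one line, print the next, and repeat.
even_lines=$(seq 1 6 | sed -n 'n;p')

# { }: run a group of commands for a single address.
doubled=$(seq 1 3 | sed -n '2{p;p}')

# #: a comment runs to the end of its line (it must not start with #n here).
commented=$(seq 1 3 | sed -n '# select the second line only
2p')

# q with an exit code (GNU extension): sed exits with status 5 at line 2.
status=0
printf '1\n2\n3\n' | sed '2q5' > /dev/null || status=$?

printf '%s\n' "$first_three" "$without_middle" "$even_lines" "$doubled" "$status"
```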


    3.5 The s Command

    The syntax of the s (as in substitute) command is ‘s/regexp/replacement/flags’. The / characters may be uniformly replaced by any other single character within any given s command. The / character (or whatever other character is used in its stead) can appear in the regexp or replacement only if it is preceded by a \ character.

    The s command is probably the most important in sed and has a lot of different options. Its basic concept is simple: the s command attempts to match the pattern space against the supplied regexp; if the match is successful, then that portion of the pattern space which was matched is replaced with replacement.

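    The replacement can contain \n (n being a number from 1 to 9, inclusive) references, which refer to the portion of the match which is contained between the nth \( and its matching \). Also, the replacement can contain unescaped & characters which reference the whole matched portion of the pattern space. Finally, as a GNU sed extension, you can include a special sequence made of a backslash and one of the letters L, l, U, u, or E. The meaning is as follows:

    \L
    Turn the replacement to lowercase until a \U or \E is found.
    \l
    Turn the next character to lowercase.
    \U
    Turn the replacement to uppercase until a \L or \E is found.
    \u
    Turn the next character to uppercase.
    \E
    Stop case conversion started by \L or \U.

    To include a literal \, &, or newline in the final replacement, be sure to precede the desired \, &, or newline in the replacement with a \.

    The s command can be followed by zero or more of the following flags: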

    g
    Apply the replacement to all matches to the regexp, not just the first.
    number
    Only replace the numberth match of the regexp.

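    Note: the POSIX standard does not specify what should happen when you mix the g and number modifiers, and currently there is no widely agreed upon meaning across sed implementations. For GNU sed, the interaction is defined to be: ignore matches before the numberth, and then match and replace all matches from the numberth on.

    p
    If the substitution was made, then print the new pattern space.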

    Note: when both the p and e options are specified, the relative ordering of the two produces very different results. In general, ep (evaluate then print) is what you want, but operating the other way round can be useful for debugging. For this reason, the current version of GNU sed interprets specially the presence of p options both before and after e, printing the pattern space before and after evaluation, while in general flags for the s command show their effect just once. This behavior, although documented, might change in future versions.

    w file-name
    If the substitution was made, then write out the result to the named file. As a GNU sed extension, two special values of file-name are supported: /dev/stderr, which writes the result to the standard error, and /dev/stdout, which writes to the standard output.
    e
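    This command allows one to pipe input from a shell command into pattern space. If a substitution was made, the command that is found in pattern space is executed and pattern space is replaced with its output. A trailing newline is suppressed; results are undefined if the command to be executed contains a NUL character. This is a GNU sed extension.
    I
    i
    The I modifier to regular-expression matching is a GNU extension which makes sed match regexp in a case-insensitive manner.
    M
    m
    The M modifier to regular-expression matching is a GNU sed extension which causes ^ and $ to match respectively (in addition to the normal behavior) the empty string after a newline, and the empty string before a newline. There are special character sequences (\` and \') which always match the beginning or the end of the buffer. M stands for multi-line.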
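The main features of the s command, sketched on invented sample strings. GNU sed is assumed; the I flag in the last line is a GNU extension.

```shell
# g: replace every match, not just the first.
all=$(printf 'hello world\n' | sed 's/o/0/g')

# number: replace only the second match.
second=$(printf 'hello world\n' | sed 's/o/0/2')

# &: the whole matched text, reused in the replacement.
whole=$(printf 'abc\n' | sed 's/b/[&]/')

# \1 and \2: the text matched by the first and second \( ... \) groups.
swapped=$(printf 'john smith\n' | sed 's/\(.*\) \(.*\)/\2 \1/')

# Another delimiter avoids escaping the slashes in a path.
path=$(printf '/usr/bin\n' | sed 's#/usr#/opt#')

# p with -n: print only the lines where a substitution was made.
printed=$(printf 'cat\ndog\n' | sed -n 's/dog/wolf/p')

# I: match the regexp case-insensitively.
nocase=$(printf 'Hello\n' | sed 's/hello/bye/I')

printf '%s\n' "$all" "$second" "$whole" "$swapped" "$path" "$printed" "$nocase"
```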

    3.6 Less Frequently-Used Commands

    Though perhaps less frequently used than those in the previous section, some very small yet useful sed scripts can be built with these commands.

    y/source-chars/dest-chars/
    (The / characters may be uniformly replaced by any other single character within any given y command.)

    Transliterate any characters in the pattern space which match any of the source-chars with the corresponding character in dest-chars.

    Instances of the / (or whatever other character is used in its stead), \, or newlines can appear in the source-chars or dest-chars lists, provided that each instance is escaped by a \. The source-chars and dest-chars lists must contain the same number of characters (after de-escaping).

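    a\
    text
    As a GNU extension, this command can take two addresses.

    Queue the lines of text which follow this command (each but the last ending with a \, which are removed from the output) to be output at the end of the current cycle, or when the next input line is read.

    Escape sequences in text are processed, so you should use \\ in text to print a single backslash.

    As a GNU extension, if between the a and the newline there is other than a whitespace-\ sequence, then the text of this line, starting at the first non-whitespace character after the a, is taken as the first line of the text block. (This enables a simplification in scripting a one-line add.) This extension also works with the i and c commands.

    i\
    text
    As a GNU extension, this command can take two addresses.

    Immediately output the lines of text which follow this command (each but the last ending with a \, which are removed from the output).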

    c\
    text
    Delete the lines matching the address or address-range, and output the lines of text which follow this command (each but the last ending with a \, which are removed from the output) in place of the last line (or in place of each line, if no addresses were specified). A new cycle is started after this command is done, since the pattern space will have been deleted.
    =
    As a GNU extension, this command can take two addresses.

    Print out the current input line number (with a trailing newline).

    l n
    Print the pattern space in an unambiguous form: non-printable characters (and the \ character) are printed in C-style escaped form; long lines are split, with a trailing \ character to indicate the split; the end of each line is marked with a $.

    n specifies the desired line-wrap length; a length of 0 (zero) means to never wrap long lines. If omitted, the default as specified on the command line is used. The n parameter is a GNU sed extension.

    r filename
    As a GNU extension, this command can take two addresses.

    Queue the contents of filename to be read and inserted into the output stream at the end of the current cycle, or when the next input line is read. Note that if filename cannot be read, it is treated as if it were an empty file, without any error indication.

    As a GNU sed extension, the special value /dev/stdin is supported for the file name, which reads the contents of the standard input.

    w filename
    Write the pattern space to filename. As a GNU sed extension, two special values of file-name are supported: /dev/stderr, which writes the result to the standard error, and /dev/stdout, which writes to the standard output.

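    The file will be created (or truncated) before the first input line is read; all w commands (including instances of the w flag on successful s commands) which refer to the same filename are output without closing and reopening the file.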
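The commands in this section can be exercised as below. This is a sketch assuming GNU sed; the sample text and the file name kept.txt are made up, and the w example assumes the temporary directory path contains no spaces or newlines.

```shell
# y: transliterate a->x, b->y, c->z, character by character.
translit=$(printf 'aabbcc\n' | sed 'y/abc/xyz/')

# a\ queues text for output after line 1.
appended=$(printf 'one\ntwo\n' | sed '1a\
after one')

# i\ outputs text immediately, before line 2 is printed.
inserted=$(printf 'one\ntwo\n' | sed '2i\
before two')

# c\ replaces line 2 with the given text.
changed=$(printf 'one\ntwo\n' | sed '2c\
TWO')

# = on the last line prints its number, which is the line count.
count=$(printf 'a\nb\n' | sed -n '$=')

# l shows the tab as \t and marks the end of the line with $.
visible=$(printf 'a\tb\n' | sed -n 'l')

# w writes each matching pattern space to a file.
tmpdir=$(mktemp -d)
printf 'keep\ndrop\nkeep\n' | sed -n "/keep/w $tmpdir/kept.txt"
kept=$(cat "$tmpdir/kept.txt")
rm -rf "$tmpdir"

printf '%s\n' "$translit" "$appended" "$inserted" "$changed" "$count" "$visible" "$kept"
```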
