2006-03-13 22:04:27
Grep this |
The grep utility, which allows files to be searched for strings of words, uses a syntax similar to the regular expression syntax of the vi, ex, ed, and sed editors. grep comes in three flavors, grep, fgrep, and egrep, all of which I'll cover in this article.
The name grep is derived from the editor command g/re/p, which literally translates to "globally search for a regular expression and print what you find." Regular expressions are at the core of grep, and I'll cover them after a brief description of some of the utility's command options.
The simplest grep command is grep (search pattern) (files list), as in:
$ grep hello *
story.txt: so I said hello and she smiled back
intro.txt: use the hello.c program as an example of C programming
$
The search is case-sensitive by default. The -i option ignores case, so the same search also finds "HELLO" and "Hello":
$ grep -i hello *
story.txt: so I said hello and she smiled back
story.txt: I could hear my echo, "HELLO."
intro.txt: use the hello.c program as an example of C programming
hello.c: printf("Hello, world. \n");
$
The output of grep varies depending on whether you're searching one or several files. If only one file is named on the command line, the output doesn't include the file name, as in the following example:
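$ grep -i hello hello.c
printf("Hello, world. \n");
$
The file name is also omitted when a wild card expands to a single file:
$ grep -i hello *.c
printf("Hello, world. \n");
$
The -l option lists only the names of the files that contain a match, not the matching lines:
$ grep -il hello *.c
hello.c
$
$ grep -il hello *
hello.c
intro.txt
story.txt
$
The -n option adds the line number of each match to the output:
$ grep -in hello *
hello.c:7: printf("Hello, world. \n");
intro.txt:44: use the hello.c program as an example of C programming
story.txt:110: so I said hello and she smiled back
story.txt:187: I could hear my echo, "HELLO."
$
The -v option inverts the search and prints only the lines that do not match:
$ grep -iv hello intro.txt
You will be able to get more practice if you at its simplest
$
The -c option prints a count of matching lines for each file instead of the lines themselves:
$ grep -ic hello *
data.txt:0
hello.c:1
intro.txt:1
intro2.txt:0
story.txt:2
$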
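The options above can be tried out safely in a scratch directory. The following is a minimal sketch rather than a transcript of the examples: the file contents are hypothetical stand-ins, and it assumes a system that provides mktemp -d.

```shell
# Scratch directory with hypothetical sample files (not the article's own files).
old=$PWD
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'so I said hello and she smiled back\nI could hear my echo, "HELLO."\n' > story.txt
printf 'int main(void) {\n    printf("Hello, world.\\n");\n}\n' > hello.c
printf 'nothing to see here\n' > data.txt

# -l: list only the names of files with at least one match
files=$(grep -il hello *)
echo "$files"

# -c: count the matching lines in each file
counts=$(grep -ic hello *)
echo "$counts"

# -n: prefix each match with its line number (one file, so no file name)
numbered=$(grep -in hello hello.c)
echo "$numbered"

# -v: print only the lines that do NOT match
inverted=$(grep -iv hello story.txt data.txt)
echo "$inverted"

# Clean up the scratch directory.
cd "$old" || exit 1
rm -rf "$dir"
```

Capturing each result in a variable is only a convenience for inspecting the output afterward; the same commands can be typed directly at the prompt.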
Going wild with grep |
I will begin with some of the simpler characters in a regular expression. A ^ (caret) character means the start of a line and a $ (dollar) character means the end of one.
The wild cards used by grep frequently clash with the special symbols that the shell uses, so the usual practice is to enclose complex search strings within single quotes. The two following examples match "hello" at the start and at the end of a line, respectively. Add the -i option to match any case version of the word.
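$ grep '^hello' *
$ grep 'hello$' *
A . (dot) matches any single character, so the following matches "hello", "cello", "jello", or any other character followed by "ello":
$ grep '.ello' *
Square brackets enclose a list of options for a single character. The following matches only "hello", "cello", or "jello":
$ grep '[hcj]ello' *
A hyphen inside the brackets specifies a range. The following matches "bay", "cay", or "day":
$ grep '[b-d]ay' *
A caret as the first character inside the brackets negates the list. The following matches "ay" preceded by any character other than b, c, or d:
$ grep '[^b-d]ay' *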
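To see how the anchors and bracket lists behave, the patterns can be run against a short word list piped in on standard input. This is a sketch with a hypothetical word list; setting LC_ALL=C is an assumption made here so that the range b-d follows plain ASCII order regardless of the local language settings.

```shell
# Hypothetical input, one entry per line.
words=$(printf '%s\n' 'hello there' 'say hello' 'cello' 'jello' 'bay' 'day' 'say' 'may')

# ^ anchors the match to the start of the line, $ to the end.
starts=$(printf '%s\n' "$words" | grep '^hello')
ends=$(printf '%s\n' "$words" | grep 'hello$')

# [hcj] matches exactly one of h, c, or j.
any_of=$(printf '%s\n' "$words" | grep '[hcj]ello')

# [b-d] is a range; [^b-d] matches any character NOT in that range.
in_range=$(printf '%s\n' "$words" | LC_ALL=C grep '[b-d]ay')
not_in_range=$(printf '%s\n' "$words" | LC_ALL=C grep '[^b-d]ay')

echo "$starts"
echo "$ends"
echo "$any_of"
echo "$in_range"
echo "$not_in_range"
```

Note that the negated range still requires a character before "ay", which is why "say hello" is reported by the last search.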
Any single character match (including a single character matched by an option/range specification) can be repeated by using the asterisk character (*). An asterisk following a single character means "zero or more occurrences" of the preceding match. The following search requests any line containing "hello" followed by "dolly" where the words are separated by zero or more spaces. Note that the asterisk follows the space after "hello" and therefore applies to the space character.
$ grep 'hello *dolly' *
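hellodolly
hello dolly
hello    dolly
The asterisk can also follow an option list, in which case it applies to the whole list. The following finds a "c" and a "t" separated by zero or more vowels:
$ grep 'c[aeiou]*t' somewords.txt
cat
coat
coot
cot
cout
cut
ct
$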
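Because runs of spaces are hard to see in printed output, a quick count makes the "zero or more" behavior easier to confirm. The sketch below feeds four hypothetical lines to grep: the two words separated by zero, one, and three spaces, plus one line separated by a comma that should not match.

```shell
# The asterisk applies to the space before it, so any number of spaces
# (including none) may separate the two words.
matches=$(printf '%s\n' 'hellodolly' 'hello dolly' 'hello   dolly' 'hello, dolly' |
    grep -c 'hello *dolly')
echo "$matches"    # prints 3
```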
fgrep |
At this point you might be wondering how fgrep fits in with the others. fgrep is essentially grep (or egrep) with no special characters. If you want to search for a simple string without wild cards, use fgrep. The fgrep version of grep is optimized to search for strings as they appear on the command line, so it doesn't treat any characters as special. You could use fgrep in the above examples to more efficiently search for the plain string "hello," and also to search for strings that contain special characters used in their usual sense. For example, if you wanted to search for "hello" at the end of a sentence, you would want to search for "hello." (hello followed by a period). The dot or period is a special character in grep or egrep, but fgrep simply treats a period as a period and not as a special character.
$ fgrep 'hello.' *
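The difference is easiest to see by running the same string through both kinds of search. This sketch uses grep -F, the POSIX spelling of fgrep, on three hypothetical lines of input.

```shell
# Hypothetical input lines.
lines=$(printf '%s\n' 'she said hello.' 'hello world' 'hellos all round')

# As a regular expression, the dot matches any character after "hello".
regex_count=$(printf '%s\n' "$lines" | grep -c 'hello.')

# As a fixed string, the dot is only a dot.
fixed_count=$(printf '%s\n' "$lines" | grep -Fc 'hello.')

echo "regular expression: $regex_count, fixed string: $fixed_count"
```

The regular expression search reports all three lines, while the fixed-string search reports only the line that really ends the sentence with a period.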
I have two final notes about searching for multiple strings. Multiple search patterns can be placed on a single command line by using the -e option. The following example will search for "cat" or "dog":
$ fgrep -e 'cat' -e 'dog' *
You can also list search patterns in a file and name the file on the command line with the -f option. The example below is a file named searchfor.txt that contains a list of search patterns for the singular or plural of various animals. The question mark at the end of each animal name applies to the preceding "s" and means zero or one occurrence of that letter.
dogs?
cats?
ducks?
snakes?
To use this file to search another list of files, name it on the command line instead of a search pattern. The egrep utility will search for all the possible strings listed in searchfor.txt:
$ egrep -nf searchfor.txt *
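The -e and -f forms should give the same result when they carry the same patterns. The following sketch checks that with a shortened pattern file and a hypothetical animals.txt, both created in a scratch directory (assuming mktemp -d is available). It uses grep -E, the POSIX spelling of egrep.

```shell
# Scratch directory holding a pattern file and a hypothetical data file.
dir=$(mktemp -d)
printf '%s\n' 'dogs?' 'cats?' > "$dir/searchfor.txt"
printf '%s\n' 'one dog' 'two cats' 'three ducks' 'a catalog' > "$dir/animals.txt"

# Patterns read from a file, one per line.
from_file=$(grep -E -f "$dir/searchfor.txt" "$dir/animals.txt")

# The same patterns given directly on the command line.
from_args=$(grep -E -e 'dogs?' -e 'cats?' "$dir/animals.txt")

echo "$from_file"
rm -rf "$dir"
```

The line "a catalog" is reported too, because nothing in these patterns requires "cat" to be a whole word.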
Extending grep |
The egrep utility uses extended regular expressions, with a useful one being the plus (+) character, which works like the asterisk (*) but means "one or more" rather than "zero or more." Using egrep in the above example with a + instead of an * would cause the search to exclude "ct" because it doesn't contain one or more vowels.
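$ egrep 'c[aeiou]+t' somewords.txt
cat
coat
coot
cot
cout
cut
$
Plain grep does not treat the plus character as special, but the same search can be written by repeating the option list, once as is and once followed by an asterisk:
$ grep 'c[aeiou][aeiou]*t' somewords.txt
cat
coat
coot
cot
cout
cut
$
The question mark completes the set of repetition characters in egrep:
* = zero or more occurrences
+ = one or more occurrences
? = zero or one occurrence
The | (vertical bar) character creates an "or" between two expressions, so that a line matching either one is printed:
$ egrep 'c[aeiou]+t|p[aeiou]+l' somewords.txt
cat
coat
coot
cot
cut
cet
cit
pal
paella
paul
paula
peal
peel
pool
$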
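The three repetition characters and the "or" branch can be compared side by side on one small word list. This is a sketch with a hypothetical list, using grep -E (the POSIX spelling of egrep).

```shell
# Hypothetical word list, one word per line.
words=$(printf '%s\n' ct cat coat pl pal pool colt)

star=$(printf '%s\n' "$words" | grep -E 'c[aeiou]*t')   # zero or more vowels
plus=$(printf '%s\n' "$words" | grep -E 'c[aeiou]+t')   # one or more vowels
opt=$(printf '%s\n' "$words" | grep -E 'c[aeiou]?t')    # zero or one vowel
either=$(printf '%s\n' "$words" | grep -E 'c[aeiou]+t|p[aeiou]+l')

echo "star:   $(echo $star)"
echo "plus:   $(echo $plus)"
echo "opt:    $(echo $opt)"
echo "either: $(echo $either)"
```

Only the asterisk and the question mark accept "ct", and only the asterisk and the plus accept "coat". The word "colt" is never reported because an "l" sits between the vowel and the "t".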
In the following example, the first part of the command is entered on one line, and then Enter is pressed while the single quotes are still open. The shell prompts for additional input and continues to accept lines until the closing quote appears. Each individual line represents a separate search string to grep. This trick is useful with any version of grep.
$ grep 'c[aeiou][aeiou]*t
> p[aeiou][aeiou]*l' somewords.txt
cat
coat
coot
cot
cut
cet
cit
pal
paella
paul
paula
peal
peel
pool
$
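The same trick works in a script, where the newline-separated patterns can be kept in a variable. The variable name and the input words in this sketch are hypothetical.

```shell
# Two patterns separated by a newline; grep treats each line as its own pattern.
patterns='c[aeiou][aeiou]*t
p[aeiou][aeiou]*l'

# Quote the variable so the shell passes the newline through unchanged.
result=$(printf '%s\n' ct cat pl peel dog | grep "$patterns")
echo "$result"
```

Leaving the double quotes off "$patterns" would let the shell split the value into two separate arguments, and grep would then treat the second pattern as a file name.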
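Parentheses group expressions in egrep, so an "or" branch can be limited to one part of the search pattern:
$ egrep '([Ss]ome|[Aa]ny)one' somewords.txt
someone
Someone
anyone
Anyone
$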
egrep         grep            meaning
[a-z]{2,4}    [a-z]\{2,4\}    Two through four characters
[a-z]{4}      [a-z]\{4\}      Exactly four characters
[a-z]{4,}     [a-z]\{4,\}     Four or more characters
[a-z]{,4}     [a-z]\{,4\}     Zero through four characters
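A repeat count is easiest to check when the whole line has to match, because otherwise a four-letter pattern is also found inside a longer word. The sketch below adds the -x option, which this article does not otherwise use and which restricts matches to complete lines. The word list is hypothetical, and LC_ALL=C is an assumption made so that the range a-z means plain ASCII lowercase letters.

```shell
# Hypothetical words of one, two, four, and six letters.
words=$(printf '%s\n' a ab abcd abcdef)

# -E uses the egrep syntax, -x requires the entire line to match.
exactly4=$(printf '%s\n' "$words" | LC_ALL=C grep -Ex '[a-z]{4}')
two_to_four=$(printf '%s\n' "$words" | LC_ALL=C grep -Ex '[a-z]{2,4}')
four_plus=$(printf '%s\n' "$words" | LC_ALL=C grep -Ex '[a-z]{4,}')

# The plain grep form of the same range, with escaped braces.
bre=$(printf '%s\n' "$words" | LC_ALL=C grep -x '[a-z]\{2,4\}')

echo "$exactly4"
echo "$two_to_four"
echo "$four_plus"
```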
character    matches
.            Any character
\.           A period
$            End of line
\$           A dollar sign
*            Zero or more occurrences of the preceding expression
\*           An asterisk
\            Nothing -- is an escape character
\\           A backslash
|            Create an "or" branch between two expressions
\|           A vertical bar
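As a short illustration of escaping, the following sketch searches for a price and for a literal asterisk. The input lines are hypothetical, and the patterns sit in single quotes so that the shell leaves the dollar sign and backslashes alone.

```shell
# Hypothetical input lines.
lines=$(printf '%s\n' 'cost: $5.00' 'cost: $5100' 'a*b' 'aab')

# \$ is a literal dollar sign and \. is a literal period.
price=$(printf '%s\n' "$lines" | grep '\$5\.00')

# \* is a literal asterisk...
literal_star=$(printf '%s\n' "$lines" | grep 'a\*b')

# ...while an unescaped * means "zero or more of the preceding a".
unescaped=$(printf '%s\n' "$lines" | grep -c 'a*b')

echo "$price"
echo "$literal_star"
echo "$unescaped"    # prints 2
```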
It can be hard to remember all of the grep and egrep characters that have a special meaning, and regular expressions are unfortunately far from regular. You have already seen that curly braces can be escaped in grep and, when escaped, acquire a special meaning. The same is true for parentheses and angle brackets. The following characters have special meanings in grep or egrep:
In egrep: | ^ $ . * + ? ( ) [ { } \
In grep: ^ $ . * \( \) [ \{ \} \
^ $ . * \( \) [ \ \< \>
The last collection of grep or egrep search pattern options is in fact a simple shorthand for describing a class of characters.
[:alpha:]    Any alphabetic character
[:lower:]    Any lowercase character
[:upper:]    Any uppercase character
[:digit:]    Any digit
[:alnum:]    Any alphanumeric character (alphabetic or digit)
[:space:]    Any white space character (space, tab, vertical tab)
[:graph:]    Any printable character, except space
[:print:]    Any printable character, including the space
[:punct:]    Any punctuation (i.e., a printable character that ...
[:cntrl:]    Any nonprintable character
A class name is used inside an option list, so it appears within a second pair of square brackets. The following finds any line containing ten consecutive digits:
$ egrep '[[:digit:]]{10}' somenumbers.txt
1234554321
$
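The classes can be tried on a few lines of input to see which ones each class selects. This is a sketch with hypothetical input, using grep -E (the POSIX spelling of egrep) where the curly-brace count is needed.

```shell
# Hypothetical input lines.
lines=$(printf '%s\n' '1234554321' '12345' 'abc DEF' 'no-digits-here')

# Ten digits in a row.
ten_digits=$(printf '%s\n' "$lines" | grep -E '[[:digit:]]{10}')

# Any line containing at least one uppercase letter.
has_upper=$(printf '%s\n' "$lines" | grep '[[:upper:]]')

# Count the lines containing punctuation (here, only the hyphens qualify).
punct_count=$(printf '%s\n' "$lines" | grep -c '[[:punct:]]')

echo "$ten_digits"
echo "$has_upper"
echo "$punct_count"
```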
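Pattern 1 searches for phone numbers written as a three-digit area code in parentheses, followed by three digits, a hyphen, and four digits.
Pattern 2 searches for zip codes -- five digits followed by zero or one hyphen, followed by zero to four digits -- either with or without the following hyphen and four-digit extension.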
Pattern 3 searches for lines containing P.O. Box number addresses by using a case-independent search for "p," followed by zero or one period, then zero or more spaces, an "o," zero or one period and one or more spaces, and finally "box" or "drop." This should match most of the styles of data entry for a P.O. Box, including "PO Box," "PO BOX," "P.O. Box," "P O Box," "P. O. Drop," and so on.
Pattern 4 matches the word "cat" by searching for it where it's preceded by a beginning of line, or one or more spaces and followed by one or more spaces, or an end of line. This search will not match "concatenate."
1. egrep -n '\([0-9]{3}\)[0-9]{3}\-[0-9]{4}' somenumbers.txt
2. egrep -n '[0-9]{5}\-?[0-9]{0,4}' somenumbers.txt
3. egrep -in 'p\.? *o\.? +(box|drop)' someaddresses.txt
4. egrep -n '(^| +)cat( +|$)' sometext.txt
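Patterns like these are worth testing against a few known cases before they are turned loose on real data. The sketch below does that with hypothetical sample lines and grep -E (the POSIX spelling of egrep). Two details are assumptions of this sketch: the hyphens are written without a backslash, which is equivalent outside an option list, and the P.O. Box pattern makes the period after the "o" optional so that an unpunctuated "PO Box" matches as the description of pattern 3 requires.

```shell
# The four patterns, held in hypothetical variable names.
phone='\([0-9]{3}\)[0-9]{3}-[0-9]{4}'
zip='[0-9]{5}-?[0-9]{0,4}'
pobox='p\.? *o\.? +(box|drop)'
word='(^| +)cat( +|$)'

# Each count is the number of sample lines that the pattern accepts.
phones=$(printf '%s\n' '(212)555-1234' '212-555-1234' | grep -Ec "$phone")
zips=$(printf '%s\n' '10001' '10001-1234' '1234' | grep -Ec "$zip")
boxes=$(printf '%s\n' 'PO Box 12' 'P.O. Box 12' 'P O Box 12' 'P. O. Drop 7' '12 Main St' |
    grep -Eic "$pobox")
cats=$(printf '%s\n' 'cat' 'the cat sat' 'concatenate' 'cats' | grep -Ec "$word")

echo "phones=$phones zips=$zips boxes=$boxes cats=$cats"
# prints: phones=1 zips=2 boxes=4 cats=2
```

The phone pattern rejects the number written without parentheses, and the last pattern rejects both "concatenate" and "cats" because neither has a space or line boundary on both sides of "cat."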