Category: C/C++
2011-04-02 11:50:35
Vim is an improved (in many ways) version of vi, a ubiquitous text editor found on any UNIX system. Vim was created by Bram Moolenaar with the help of other people. It's free, but if you like it you can make a charitable contribution to orphans in Uganda. Vim has its own web site and several mailing lists, with a wealth of information on every aspect of Vim. Vim has been ported to nearly all existing operating systems. It is the default editor in many Linux distributions (e.g. RedHat). Vim has all the features of a modern programmer's editor (macro language, syntax highlighting, customizable user interface, easy integration with various IDEs) plus a set of features that make Vim so attractive to its users: crash recovery, automatic commands, session management. Vim has a very broad and loyal user base. Over 10 million people have it installed (counting only Linux users). An estimated half a million people use Vim as their main editor, and this number is growing.
I started this tutorial for one simple reason - I like regular expressions. Nothing compares to the satisfaction from a well-crafted regexp which does exactly what you wanted it to do :-). I hope it's passable as a foreword. Speaking more seriously, regular expressions (or regexps for short) are tools used to manipulate text and data. They don't exist as a standalone product but are usually a part of some program or utility. The best known example is UNIX grep, a program that searches files for lines matching a certain pattern. The search pattern is described in terms of regular expressions. You can think of regexps as a specialized pattern language. Regexps are quite useful and can greatly reduce the time it takes to do some tedious text editing. (Regexp terminology is largely borrowed from Jeffrey Friedl's "Mastering Regular Expressions".)
Many thanks (in no particular order): Benji Fisher, Zdenek Sekera, Preben "Peppe" Guldberg, Steve Kirkendall, Shaul Karl and all others who helped me with their comments. Feel free to send me (volontir at yahoo dot com) your comments, suggestions, examples...
So, what can you do with regular expressions? The most common task is to make replacements in a text following certain rules. For this tutorial you need to know the Vim search and replace command (S&R): :range s[ubstitute]/pattern/string/flags
Part of the command word enclosed in the "[" & "]" can be omitted. Before I begin with a pattern description let's talk about line addresses in Vim. Some Vim commands can accept a line range in front of them. By specifying the line range you restrict the command execution to this particular part of the text only. A line range consists of one or more line specifiers, separated with a comma or semicolon. You can also mark your current position in the text by typing m followed by a letter (for example, ma) and later use that mark as a line address ('a).
If no line range is specified the command will operate on the current line only. Here are a few examples:
10,20 - lines 10 through 20. Each line specifier may be followed (several times) by "+" or "-" and an optional number. This number is added to or subtracted from the preceding line number. If the number is omitted, 1 is used.
/Section 1/+,/Section 2/- - all lines between Section 1 and Section 2, non-inclusively, i.e. the lines containing Section 1 and Section 2 will not be affected.
- first find Section 1, then the first line with Subsection, step one line down (beginning of the range) and find the next line with Subsection, step one line up (end of the range). The next example shows how you can reuse your search pattern:
- this will search for the Section line and yank (copy) the line after it into memory.
- and that will search for the next Section line and put (paste) the saved text on the next line.
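The ranges described above can be sketched as Ex commands. This is an illustration, not the author's original examples: the substitution s/vi/VIM/g is only a placeholder command, and the semicolon chain in the third command is one way to express the "find, step down, find again, step up" description, assuming Vim's rule that ";" moves the cursor to the preceding address before the next one is evaluated.

```vim
" Run a (placeholder) substitution on lines 10 through 20 only.
:10,20 s/vi/VIM/g

" Lines strictly between the next 'Section 1' line and the next 'Section 2' line.
" With a comma, both searches start from the cursor line.
:/Section 1/+,/Section 2/- s/vi/VIM/g

" With semicolons, each search starts where the previous address ended:
" Section 1, then the line after the first Subsection, then the line
" before the following Subsection.
:/Section 1/;/Subsection/+;/Subsection/- s/vi/VIM/g

" Yank the line after the next Section line.
:/Section/+ y

" An empty pattern reuses the last search pattern; put the yanked line
" below the line it finds (searching from the cursor position).
:// normal p
```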
Tip 1: frequently you need to do S&R in a text which contains UNIX file paths - text strings with slashes ("/") inside. Because the S&R command uses slashes to separate the pattern from the replacement, you have to escape every slash in your pattern, i.e. use "\/" for every "/" in your pattern.
To avoid this so-called "backslashitis" you can use a different separator in S&R (I prefer ":").
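For illustration, here is the same substitution written both ways. The paths are made up for the example.

```vim
" Default "/" separator: every slash in the paths has to be escaped.
:s/\/dir1\/dir2\/dir3\/file/\/dir4\/dir5\/file2/g

" ":" as the separator: the paths can be written as they are.
:s:/dir1/dir2/dir3/file:/dir4/dir5/file2:g
```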
Tip 2: You may find these mappings useful (put them in your .vimrc file)
These mappings save you some keystrokes and put the cursor where you start typing your search pattern. After typing it you move to the replacement part, type it and hit return. The second version adds the confirmation flag.
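Mappings along these lines would behave as described. The key sequences ;; and ;' are assumptions chosen for the sketch; any unused keys will do.

```vim
" Type :%s:::g and move the cursor back to just after the first colon.
noremap ;; :%s:::g<Left><Left><Left>

" The same with the "c" (confirm) flag; one more <Left> is needed.
noremap ;' :%s:::cg<Left><Left><Left><Left>
```

Comments are kept on their own lines because anything that follows a mapping on the same line becomes part of the mapping.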
Suppose you want to replace all occurrences of vi with VIM. This can be easily done with :%s/vi/VIM/g
If you've tried this example then you, no doubt, noticed that Vim replaced all occurrences of vi even where it is part of a word (e.g. navigator). If we want to be more specific and replace only the whole word vi then we need to correct our pattern. We may rewrite it by putting spaces around vi: :%s/ vi / VIM /g
But it will still miss vi followed by punctuation or at the end of the line/file. The right way is to put the special word boundary symbols "\<" and "\>" around vi: :%s/\<vi\>/VIM/g
The beginning and the end of the line have their own special anchors - "^" and "$", respectively.
To match the lines where vi is the only word: :%s/^vi$/VIM/
Now suppose you want to replace not only all vi but also Vi and VI. There are several ways to do this:
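A few possible ways, sketched here as alternatives rather than as the author's original list. The second and third also match the spelling "vI".

```vim
" 1. One substitution per spelling.
:%s/\<vi\>/VIM/g
:%s/\<Vi\>/VIM/g
:%s/\<VI\>/VIM/g

" 2. A character range for each letter.
:%s/\<[Vv][Ii]\>/VIM/g

" 3. The "i" flag makes the pattern case-insensitive.
:%s/\<vi\>/VIM/gi
```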
So far our pattern strings were constructed from normal or literal text characters. The power of regexps is in the use of metacharacters. These are types of characters which have special meaning inside the search pattern. With a few exceptions these metacharacters are distinguished by a "magic" backslash in front of them. The table below lists some common VIM metacharacters.
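So, to match a date like 09/01/2000 you can use (assuming you don't use "/" as a separator in the S&R) \d\d/\d\d/\d\d\d\d, where \d matches a digit.
To match a 6-letter word starting with a capital letter: \u\w\w\w\w\w (\u matches an uppercase letter, \w a word character).
Obviously, it is not very convenient to write "\w" once for every character in the pattern, and it does not work at all if you don't know how many letters the word has. Quantifiers solve this problem.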
Using quantifiers you can set how many times a certain part of your pattern should repeat by putting the following after your pattern:
Now it's much easier to define a pattern that matches a word of any length: \w\+ (one or more word characters). These quantifiers are greedy, that is, your pattern will try to match as much text as possible. Sometimes this presents a problem. Let's consider a typical example: define a pattern to match delimited text, i.e. text enclosed in quotes, brackets, etc. Since we don't know what kind of text is inside the quotes we'll use ".*" (an opening quote, any characters, and a closing quote).
But this pattern will match everything between the first " and the last " in the following line:
This problem can be resolved by using non-greedy quantifiers:
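The difference can be seen on a small example. The sample line is made up, and each command is meant to be tried on a fresh copy of it; "\{-}" is Vim's "as few as possible" counterpart of "*".

```vim
" Sample line:  see "first" and "second" here

" Greedy: the match runs from the first quote to the last one.
:s:".*":X:
" -> see X here

" Non-greedy: the match stops at the nearest closing quote.
:s:".\{-}":X:
" -> see X and "second" here
```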
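Let's use "\{-}" in place of "*" in our pattern, so that ".\{-}" (with the surrounding quotes) stops at the nearest closing quote. The bare pattern .\{-} is worth a closer look. Applying s:.\{-}:_:g to a line does not replace any of its characters; instead, a "_" is inserted at every position where the empty match occurs.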
"As few as possible" applied here means zero character replacements. However match does occur between characters! To explain this behavior I quote Bram himself: Matching zero characters is still a match. Thus it will replace zero characters with a "_". And then go on to the next position, where it will match again. It's true that using "\{-}" is mostly useless. It works this way to be consistent with "*", which also matches zero characters. There are more useless ones: "x\{-1,}" always matches one x. You could just use "x". More useful is something like "x\{70}". The others are just consistent behavior: ..., "x\{-3,}", "x\{-2,}", "x\{-1,}. - Bram But what if we want to match only the second occurrence of quoted text? Or we want to replace only a part of the quoted text keeping the rest untouched? We will need grouping and backreferences. But before let's talk more about character ranges. | ||||||||||||||||||||||||||||||||||||||||||||||||||
Typical character ranges:
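Note that a range represents just one character in the search pattern; that is, [0123] matches any single one of those digits, whereas 0123 matches only the four-character sequence. Likewise, the order of the characters inside the brackets does not matter, while the order of the characters in a literal sequence does.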
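A short illustration of a range versus a literal sequence. The sample line is made up, and each command is meant to be tried on a fresh copy of it.

```vim
" Sample line:  High 65 to 70

" A range: every single "6" or "5" is replaced.
:s:[65]:Dig:g
" -> High DigDig to 70

" A literal sequence: only the two-character string "65" is replaced.
:s:65:Dig:g
" -> High Dig to 70
```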
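Sometimes it's easier to define the characters you don't want to match. This is done by putting a negation sign "^" (caret) as the first character of the range. [^A-Z] - will match any character except capital letters. We can now rewrite our pattern for quoted text using "[^"]*" (an opening quote, any run of non-quote characters, and a closing quote).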
Note: inside the [ ] all metacharacters behave like ordinary characters. If you want to include "-" (dash) in your range put it first:
[-0-9] - will match all digits and "-". "^" will lose its special meaning if it's not the first character in the range. Now, let's take a real-life example. Suppose you want to run a grammar check on your file and find all places where a new sentence does not start with a capital letter. The pattern that will catch this:
\.\s\+[a-z] - a period followed by one or more blanks and a lowercase word. We know how to find an error; now let's see how we can correct it. To do this we need some way to remember our matched pattern and recall it later. That is exactly what backreferences are for.
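You can group parts of the pattern expression by enclosing them in "\(" and "\)" and refer to them inside the replacement by their numbers \1, \2, ... \9. A typical example is swapping the first two words of a line:
s:\(\w\+\)\(\s\+\)\(\w\+\):\3\2\1:
where \1 holds the first word, \2 the whitespace in between and \3 the second word.
Replacement Part of :substitute
The replacement part of the S&R has its own special characters, which we are going to use to fix grammar: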
Now the full S&R to correct non-capital words at the beginning of the sentences looks like
We have corrected our grammar and, as a bonus, replaced the variable number of spaces between the punctuation and the first letter of the next sentence with exactly two spaces.
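A command of the following shape does what is described here. It is a sketch: it assumes a sentence may end in ".", "!" or "?", and it relies on "\u" in the replacement, which uppercases the next character. Sentences that start on a new line are not covered.

```vim
" \1 = the closing punctuation, \2 = the lowercase letter that follows.
" The replacement puts exactly two spaces between them and capitalizes \2.
:%s:\([.!?]\)\s\+\([a-z]\):\1  \u\2:g

" Sample line:  it works. now try it!   then stop
" ->            it works.  Now try it!  Then stop
```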
Using "\|" you can combine several expressions into one that matches any of its components.
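As a sketch of alternation with "\|", the commands below look for lines that begin with one of three mail header names. The header names are only an illustration, and "&" in the replacement stands for the whole matched text.

```vim
" Search for a line starting with any one of the three header names.
/^\(Date\|Subject\|From\):

" The same pattern in a substitution: prefix such lines with "> ".
:%s/^\(Date\|Subject\|From\):/> &/
```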
Tip 3: Quick mapping to put \(\) in your pattern string
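One possible mapping of this kind; the trigger ;\ is an assumption chosen for the sketch.

```vim
" On the command line, typing ;\ inserts \(\) and leaves the cursor
" between the two halves, ready for the group's contents.
cnoremap ;\ \(\)<Left><Left>
```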
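As in arithmetic expressions, regular expressions are evaluated in a certain order of precedence. Here is the order of precedence, from highest to lowest: grouping with \( \); quantifiers (\=, \+, *, \{n}); sequences of characters and metacharacters; alternation (\|).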
5.1 Global search and execution
I want to introduce another quite useful and powerful Vim command which we're going to use later: :[range]g[lobal]/pattern/[cmd]
The global command works by first scanning through the [range] of lines and marking each line where a match occurs. In a second scan the [cmd] is executed for each marked line with its line number prepended. If a line is changed or deleted its mark disappears. The default for the [range] is the whole file. Note: Ex commands are all the commands you enter on the Vim command line, like :s[ubstitute], :co[py], :d[elete], :w[rite], etc. Normal mode commands can also be executed via the :norm[al] mechanism. Some examples of :g usage:
:g/^$/d - delete all empty lines in a file
:g/^$/,/./-j - reduce multiple blank lines to a single blank
- reverse the order of the lines starting from the line 10 up to the line 20. Here is a modified example from :
- in the text block marked by
You can give multiple commands after :g, using "|" as a separator. Another example:
- will copy all Error lines to the end of the file and then make a substitution in the copied line. Without a line address, :s operates on the current line, which is the newly copied line.
- here the order is reversed: first modify the string, then copy it to the end.
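The examples described above could look like the commands below. This is a sketch under stated assumptions: the marks a and b, the file name errors.txt and the replacement text are made up, and the reversal moves lines to just after line 9 so that line 10 itself takes part.

```vim
" Reverse lines 10 through 20: each line in turn is moved to just after line 9.
:10,20g/^/m9

" Append every line starting with "Error" between marks a and b to errors.txt.
" The "." restricts :w to the current line; without it the whole file would be
" written for each match. Use "w!" instead if errors.txt does not exist yet.
:'a,'b g/^Error/ .w >> errors.txt

" Two commands separated by "|": copy the line to the end, then edit the copy.
:g/^Error:/ copy $ | s/Error/copy of the error/

" Reversed order: edit the line first, then copy it to the end.
:g/^Error:/ s/Error/copy of the error/ | copy $
```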
A collection of some useful S&R tips:
(1) sent by Antonio Colombo: "a simple regexp I use quite often to clean up a text: it drops the blanks at the end of the line:" s:\s*$::
or (to avoid acting on all lines): s:\s\+$::
For this example you need to know a bit of HTML. We want to make a table of contents out of the headings.
(1) First let's make named anchors in all headings, i.e. put an <a name="..."> anchor around the text of each heading.
Explanation: the first pair of
(2) Now let's copy all headings to one place:
This command searches our file for the lines starting with a heading tag and copies them to one place.
First, we want to convert all the named anchors into links to those anchors:
Second, we want our
Now our entries look like:
We no longer need the opening heading tags
and replace closing tags with line breaks
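The whole procedure can be sketched as follows. The original commands are not preserved in this text, so everything here is an assumption made for illustration: the headings are taken to be h1 and h2 elements that each sit on one line, the last word of a heading becomes the anchor name, and mark t is used to restrict the later steps to the copied lines.

```vim
" Sample heading line:  <h2>3.1 Anchors</h2>

" (1) Wrap the heading text in a named anchor. \1 is the opening tag, \2 the
"     heading text, \3 its last word, \4 the closing tag. Headings that
"     consist of a single word are not matched.
:%s:\(<h[12]>\)\(.*\s\+\([-a-zA-Z]\+\)\)\s*\(</h[12]>\):\1<a name="\3">\2</a>\4:
"     -> <h2><a name="Anchors">3.1 Anchors</a></h2>

" (2) Remember the current last line, then copy every heading line to the
"     end of the file.
:$mark t
:g/^<h[12]>/t$

" (3) In the copied lines only: turn anchor names into links, drop the
"     opening heading tags, and replace the closing tags with <br>.
:'t+1,$s:name=":href="#:
:'t+1,$s:<h[12]>::
:'t+1,$s:</h[12]>:<br>:
"     -> <a href="#Anchors">3.1 Anchors</a><br>
```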
Quite often you have to work with a text organized in tables/columns. Consider, for example, the following text
Suppose we want to change all "Europe" cells in the third column to "Asia":
To swap the first and the last columns:
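Both operations can be sketched on a small made-up table whose columns are single words separated by whitespace, with no leading or trailing blanks.

```vim
" Sample table:
"   Asia    America  Africa  Europe
"   Africa  Europe   Europe  Africa
"   Europe  Asia     Europe  Europe

" Change "Europe" to "Asia" in the third column only: from the start of the
" line, skip exactly two words (each followed by whitespace) and keep them as \1.
:%s:^\(\(\w\+\s\+\)\{2}\)Europe:\1Asia:

" Swap the first and the last columns: \1 is the first word, \3 the last one,
" \2 everything in between.
:%s:^\(\w\+\)\(.*\s\+\)\(\w\+\)$:\3\2\1:
```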
To be continued...
Here I would like to compare Vim's regexp implementation with others, in particular, Perl's. You can't talk about regular expressions without mentioning Perl. (with a help from ) The main differences between Perl and Vim are:
Read VIM documentation about pattern and searching. To get this type ":help pattern" in VIM normal mode. There are currently two books on the market that deal with VIM regular expressions:
The definitive reference on regular expressions is Jeffrey Friedl's "Mastering Regular Expressions", published by O'Reilly & Associates, but it mostly deals with Perl regular expressions. O'Reilly has one of the book chapters available online.