A collection of symbols is known as an alphabet. We use an upper case letter such as X and Y to denote alphabets.
Though alphabets can be infinite, we are concerned only with finite alphabets. For example, X={0, 1} is an alphabet consisting of two symbols 0 and 1. Another alphabet is Y={dog, cat, horse, lion}that consists of four symbols ``dog", ``cat", ``horse", and ``lion".
A string over an alphabet X is any sequence of zero or more symbols that belong to X. For example, 0110 is a string over the alphabet {0, 1}. Also, dog cat dog dog lion is a string over the alphabet {dog, cat, horse, lion}.
We will use lower case letters such as p, q, r to denote strings. The length of a string is the number of symbols in that string. Given a string s, we denote its length by |s|. Thus |1011|=4 and |dog cat dog|=3. A string of length 0, also known as an empty string, is denoted by e.
Let s1 and s2 be two strings over alphabet X. We write s1.s2 to denote the concatenation of strings s1 and s2.
A set L of strings over an alphabet X is known as a language. A language can be finite or infinite.
{00, 11, 0101}: A language containing three strings
Given a finite alphabet X, the following are regular expressions over X:
If a belongs to X, then a is a regular expression that denotes the set {a}.
Let r1 and r2 be two regular expressions over the alphabet X that denote, respectively, sets L1 and L2. Then r1.r2 is a regular expression that denotes the set L1.L2.
If r is a regular expression that denotes the set L then r+ is a regular expression that denotes the set obtained by concatenating L with itself one or more times also written as L+ Also, r* known as the Kleene closure of r, is a regular expression. If r denotes the set L then r* denotes the set {} L+.
If r1 and r2 are regular expressions that denote, respectively, sets L1 and L2, then r1r2 is also a regular expression that denotes the set L1 L2.
阅读(1116) | 评论(0) | 转发(0) |