Lecture 34: regular expressions

Regular expressions

Regular expressions are patterns that match certain strings. They give a way to define a language: the language of a regular expression is the set of all strings that match the pattern.

There are six ways to construct regular expressions. Formally, the set of regular expressions is formed by the following grammar:


r ∈ RE ::= ∅  ∣  ε  ∣  a  ∣  r1r2  ∣  (r1|r2)  ∣  r*

∅ matches no strings. L(∅) = ∅.

ε matches only the empty string. L(ε) = {ε}.

a (where a is any single character of the alphabet) matches only the one-character string "a". L(a) = {a}.

r1r2 (the concatenation of r1 and r2) matches any string that can be broken into two parts x and y, with x matching r1 and y matching r2. L(r1r2) = {xy  ∣  x ∈ L(r1), y ∈ L(r2)}.

(r1|r2) (the alternation of r1 and r2, sometimes written r1 + r2 or r1 ∪ r2) matches any string that matches r1 or r2 (or both). Formally, L(r1|r2) = L(r1) ∪ L(r2).

r* (the Kleene star or Kleene closure of r) matches the concatenation of any number (including 0) of strings, each of which matches r. Formally, L(r*) = {x1x2…xn  ∣  n ≥ 0 and each xi ∈ L(r)}. Taking n = 0 gives the empty concatenation, so ε ∈ L(r*) for every r.
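
The six cases above translate almost directly into a recursive matcher. The following is a minimal Python sketch, not part of the lecture: the class names (Empty, Eps, Char, Concat, Alt, Star) and the function matches are hypothetical names chosen for illustration. It decides whether s ∈ L(r) by trying every way of splitting s, which mirrors the definitions but takes exponential time in the worst case.

```python
from dataclasses import dataclass


class Regex:
    """Base class for the six kinds of regular expression (illustrative names)."""


@dataclass(frozen=True)
class Empty(Regex):   # ∅
    pass


@dataclass(frozen=True)
class Eps(Regex):     # ε
    pass


@dataclass(frozen=True)
class Char(Regex):    # a single character a
    c: str


@dataclass(frozen=True)
class Concat(Regex):  # r1r2
    r1: Regex
    r2: Regex


@dataclass(frozen=True)
class Alt(Regex):     # (r1|r2)
    r1: Regex
    r2: Regex


@dataclass(frozen=True)
class Star(Regex):    # r*
    r: Regex


def matches(r: Regex, s: str) -> bool:
    """Return True exactly when s is in L(r), following the definitions case by case."""
    if isinstance(r, Empty):
        return False                      # L(∅) = ∅
    if isinstance(r, Eps):
        return s == ""                    # L(ε) = {ε}
    if isinstance(r, Char):
        return s == r.c                   # L(a) = {a}
    if isinstance(r, Concat):
        # s = xy with x in L(r1) and y in L(r2): try every split point.
        return any(matches(r.r1, s[:i]) and matches(r.r2, s[i:])
                   for i in range(len(s) + 1))
    if isinstance(r, Alt):
        return matches(r.r1, s) or matches(r.r2, s)
    if isinstance(r, Star):
        if s == "":
            return True                   # zero repetitions
        # Peel off a nonempty first piece so the recursion on r* always shrinks s.
        return any(matches(r.r, s[:i]) and matches(r, s[i:])
                   for i in range(1, len(s) + 1))
    raise TypeError(f"not a regular expression: {r!r}")


# (a|b)*c : any string of a's and b's followed by a single c
example = Concat(Star(Alt(Char("a"), Char("b"))), Char("c"))
print(matches(example, "abbac"))  # True
print(matches(example, "abca"))   # False
```

In the Star case, only nonempty first pieces are tried. This loses nothing, because empty pieces contribute nothing to a concatenation, and it guarantees that the recursion terminates.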