Lecture 34: regular expressions

Regular expressions
- L(r)
ε-NFA (discussed in notes for next lecture)

Regular expressions

Regular expressions are patterns that match certain strings. They give a way to define a language: the language of a regular expression is the set of all strings that match the pattern.
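To make "the language of a pattern" concrete, the following Python sketch lists the finite slice of a language made up of the strings up to a given length. It leans on Python's standard `re` module, whose syntax is richer than the six constructions defined here; only `|`, concatenation, and `*` are used. It calls `re.fullmatch` because a string belongs to the language only if the whole string matches the pattern, not just some substring. The helper name `language_up_to` is hypothetical.

```python
import itertools
import re


def language_up_to(pattern: str, alphabet: str, max_len: int) -> list[str]:
    """Return every string over `alphabet` of length <= max_len that the
    whole pattern matches, i.e. a finite slice of L(pattern)."""
    compiled = re.compile(pattern)
    result = []
    for n in range(max_len + 1):
        # enumerate all strings of length n over the alphabet
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            # fullmatch: the entire string must match, not just a substring
            if compiled.fullmatch(s):
                result.append(s)
    return result


print(language_up_to("(a|b)*b", "ab", 2))  # ['b', 'ab', 'bb']
```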

There are six ways to construct regular expressions. Formally, the set RE of regular expressions is defined by the following grammar, where a stands for any single symbol of the alphabet:

r ∈ RE ::= ∅ ∣ ε ∣ a ∣ r₁r₂ ∣ (r₁∣r₂) ∣ r*
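One way to read the grammar is as a datatype with one case per production. The sketch below is a hypothetical Python encoding (the class names and the `show` function are illustrative, not part of these notes). `show` prints an expression back in the notation above, adding parentheses around a starred concatenation, since ab* and (ab)* would otherwise print identically.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:        # ∅
    pass


@dataclass(frozen=True)
class Epsilon:      # ε
    pass


@dataclass(frozen=True)
class Sym:          # a single alphabet symbol a
    char: str


@dataclass(frozen=True)
class Concat:       # r₁r₂
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Alt:          # (r₁∣r₂), printed with an ASCII "|"
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Star:         # r*
    inner: "Regex"


Regex = Union[Empty, Epsilon, Sym, Concat, Alt, Star]


def show(r: Regex) -> str:
    """Print a regex in the notation of the grammar."""
    if isinstance(r, Empty):
        return "∅"
    if isinstance(r, Epsilon):
        return "ε"
    if isinstance(r, Sym):
        return r.char
    if isinstance(r, Concat):
        return show(r.left) + show(r.right)
    if isinstance(r, Alt):
        return "(" + show(r.left) + "|" + show(r.right) + ")"
    if isinstance(r, Star):
        inner = show(r.inner)
        # parenthesize a starred concatenation so (ab)* is not printed as ab*
        if isinstance(r.inner, Concat):
            inner = "(" + inner + ")"
        return inner + "*"
    raise TypeError(f"not a regex: {r!r}")


print(show(Concat(Star(Alt(Sym("a"), Sym("b"))), Sym("b"))))  # (a|b)*b
```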

∅ matches no strings. L(∅) = ∅.

ε matches only the empty string. L(ε) = {ε}.

a matches only the one-symbol string "a". L(a) = {a}.

r₁r₂ (the concatenation of r₁ and r₂) matches any string that can be broken into two parts x and y, with x matching r₁ and y matching r₂. L(r₁r₂) = {xy ∣ x ∈ L(r₁), y ∈ L(r₂)}.
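The definition of L(r₁r₂) amounts to trying every way of splitting a string. A minimal Python sketch, assuming each sub-language is given either as a membership test or, for finite languages, as an explicit set (both function names are hypothetical):

```python
from typing import Callable


def in_concat(in_r1: Callable[[str], bool],
              in_r2: Callable[[str], bool],
              s: str) -> bool:
    """s ∈ L(r₁r₂) iff some split s = xy has x ∈ L(r₁) and y ∈ L(r₂)."""
    # try every split point, including an empty prefix and an empty suffix
    return any(in_r1(s[:i]) and in_r2(s[i:]) for i in range(len(s) + 1))


def concat_lang(l1: set[str], l2: set[str]) -> set[str]:
    """For finite languages, {xy | x ∈ l1, y ∈ l2} written out directly."""
    return {x + y for x in l1 for y in l2}


L1 = {"a", "ab"}
L2 = {"b", "bc"}
print(sorted(concat_lang(L1, L2)))                          # ['ab', 'abb', 'abbc', 'abc']
print(in_concat(L1.__contains__, L2.__contains__, "abc"))   # True
```

All len(s) + 1 split points are tried because a string may split in several ways and only one of them needs to work.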

(r₁∣r₂) (the alternation of r₁ and r₂, sometimes written r₁ + r₂ or r₁ ∪ r₂) matches any string that matches either r₁ or r₂. Formally, L(r₁∣r₂) = L(r₁) ∪ L(r₂).
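Without the star, each construction so far builds a finite language out of finite ones, so L(r) can be computed outright by recursion on r. The sketch below assumes a hypothetical nested-tuple encoding of expressions (`("empty",)`, `("eps",)`, `("sym", "a")`, `("cat", r1, r2)`, `("alt", r1, r2)`) and mirrors each defining equation line by line:

```python
def lang(r: tuple) -> frozenset[str]:
    """Compute L(r) for a star-free regex encoded as nested tuples."""
    tag = r[0]
    if tag == "empty":          # L(∅) = ∅
        return frozenset()
    if tag == "eps":            # L(ε) = {ε}
        return frozenset({""})
    if tag == "sym":            # L(a) = {a}
        return frozenset({r[1]})
    if tag == "cat":            # L(r₁r₂) = {xy | x ∈ L(r₁), y ∈ L(r₂)}
        return frozenset(x + y for x in lang(r[1]) for y in lang(r[2]))
    if tag == "alt":            # L(r₁∣r₂) = L(r₁) ∪ L(r₂)
        return lang(r[1]) | lang(r[2])
    raise ValueError(f"unsupported (or starred) expression: {r!r}")


# (a∣b)c
r = ("cat", ("alt", ("sym", "a"), ("sym", "b")), ("sym", "c"))
print(sorted(lang(r)))  # ['ac', 'bc']
```

Concatenating anything with ∅ gives ∅, since there is no x ∈ L(∅) to pair with.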

r* (the Kleene star or Kleene closure of r) matches the concatenation of any number (including 0) of strings, each of which matches r. Formally, L(r*) = {x₁x₂…x_n ∣ n ≥ 0 and x_i ∈ L(r) for each i}. The case n = 0 gives the empty string, so ε ∈ L(r*) for every r.
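Once star is allowed L(r) can be infinite, so instead of listing it we can decide membership. The sketch below uses a hypothetical nested-tuple encoding (as `("star", r1)` plus the five other cases) and checks s ∈ L(r) directly from the six equations. For r*, it either accepts the empty string (zero pieces) or peels off a nonempty first piece in L(r); requiring that piece to be nonempty keeps the recursion from looping and loses nothing, because empty pieces do not change the concatenation. Memoizing on (subexpression, start, end) keeps the work polynomial, but this is only a direct reading of the definition, not an efficient matching algorithm.

```python
from functools import lru_cache


def matches(r: tuple, s: str) -> bool:
    """Decide s ∈ L(r) directly from the six defining equations."""

    @lru_cache(maxsize=None)
    def m(r: tuple, i: int, j: int) -> bool:
        # does the substring s[i:j] belong to L(r)?
        tag = r[0]
        if tag == "empty":                       # L(∅) = ∅
            return False
        if tag == "eps":                         # L(ε) = {ε}
            return i == j
        if tag == "sym":                         # L(a) = {a}
            return j == i + 1 and s[i] == r[1]
        if tag == "cat":                         # try every split point k
            return any(m(r[1], i, k) and m(r[2], k, j) for k in range(i, j + 1))
        if tag == "alt":                         # union of the two languages
            return m(r[1], i, j) or m(r[2], i, j)
        if tag == "star":
            if i == j:                           # zero pieces: the empty string
                return True
            # nonempty first piece s[i:k] ∈ L(r₁), rest s[k:j] ∈ L(r₁*)
            return any(m(r[1], i, k) and m(r, k, j) for k in range(i + 1, j + 1))
        raise ValueError(f"not a regex: {r!r}")

    return m(r, 0, len(s))


r = ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))), ("sym", "b"))  # (a∣b)*b
print(matches(r, "aab"))                  # True
print(matches(r, "aba"))                  # False
print(matches(("star", ("empty",)), ""))  # True, since L(∅*) = {ε}
```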