Lecture 27: Kleene's theorem

Reading: Pass and Tseng, Section 8.3, optional Kleene's theorem section
Kleene's theorem: The set of regular languages, the set of NFA-recognizable languages, and the set of DFA-recognizable languages are all the same
translating RE to NFA
translating NFA to RE
Review exercises:
- Use Kleene's theorem to prove that the intersection, union, and complement of regular languages are regular.
- Use Kleene's theorem to show that there is no regular expression that matches strings of balanced parentheses.
- Draw a variety of NFAs, DFAs, and REs, and use the constructions here and in previous lectures to convert each one into the other two forms.

Overview of Kleene's theorem

Kleene's theorem: The set of regular languages, the set of NFA-recognizable languages, and the set of DFA-recognizable languages are all the same.

Proof: It suffices to show that we can translate between NFAs, DFAs, and regular expressions without changing the language that is recognized. We have covered the following algorithms to do these translations:

Figure: translations overview: DFA to NFA, NFA to regular expression (via generalized NFA), regular expression to DFA (via ε-NFA)

Conversions between DFA and NFA were covered in a previous lecture.

Converting regular expressions to DFAs

To convert a regular expression to a DFA, we first convert it to an ε-NFA, then convert that to a DFA.

An ε-NFA is like an NFA, except that we are allowed to include "epsilon transitions". In a normal NFA or DFA, every character in the string causes a single transition in the machine, and each transition in the machine "consumes" one character. Epsilon transitions allow the machine to transition without consuming a character. They make it more convenient to build machines.

We can convert an ε-NFA to a DFA in exactly the same way as we converted an NFA to a DFA: the states of the DFA represent sets of states of the ε-NFA; we interpret the state \(\{q_1,q_2,\dots,q_k\}\) of the DFA as meaning that the ε-NFA could be in any of the states \(q_1, \dots, q_k\). When computing the transition function, we just need to take the "epsilon closure" of the state we transition to; that is, we add in all of the states that we could get to by following epsilon transitions. For the same reason, the start state of the DFA is the epsilon closure of the start state of the ε-NFA.
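The epsilon closure and the set-of-states construction described above can be sketched in Python. This is a minimal illustration, not the lecture's own code: the dictionary encoding of the transition function, the use of the empty string as the epsilon label, and the names `epsilon_closure`, `to_dfa`, and `dfa_accepts` are all assumptions made for the example.

```python
from collections import deque

EPS = ""  # assumed encoding: the empty string labels an epsilon transition

def epsilon_closure(states, delta):
    """Every state reachable from `states` using only epsilon transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)

def to_dfa(start, accepting, delta, alphabet):
    """Subset construction; each DFA state is a frozenset of eps-NFA states."""
    dfa_start = epsilon_closure({start}, delta)
    dfa_delta = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        S = queue.popleft()
        for a in alphabet:
            moved = set()
            for q in S:
                moved |= delta.get((q, a), set())
            T = epsilon_closure(moved, delta)  # add states reachable by eps
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    # A DFA state accepts if it contains at least one accepting eps-NFA state.
    dfa_accepting = {S for S in seen if S & accepting}
    return dfa_start, dfa_accepting, dfa_delta

def dfa_accepts(dfa, word):
    start, accepting, delta = dfa
    S = start
    for ch in word:
        S = delta[(S, ch)]
    return S in accepting

# Example eps-NFA for a*b*: state 0 loops on a, moves to 1 by eps, 1 loops on b.
delta = {(0, "a"): {0}, (0, EPS): {1}, (1, "b"): {1}}
dfa = to_dfa(0, {1}, delta, "ab")
```

In this example the DFA start state is the set containing both 0 and 1, because state 1 is reachable from state 0 by an epsilon transition; the empty set plays the role of the dead state.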

To convert a regular expression \(r\) to an ε-NFA, we induct on the structure of \(r\). For each kind of regular expression, we build a machine that recognizes the same language as the expression. To make the construction easier, we produce machines that have only a single accept state.

Care must be taken while combining machines to account for the fact that you can have transitions out of the final state. Here are the constructions for the various cases:

Figure: RE to NFA table
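One standard way to carry out this induction is sketched below in Python. It may differ in detail from the constructions in the table: as an assumption, every case here creates a fresh start state and a fresh accept state, so a sub-machine's accept state never has outgoing transitions before it is combined, which sidesteps the care mentioned above. The tuple encoding of regular expressions and the names `build`, `closure`, and `accepts` are hypothetical.

```python
import itertools

EPS = ""                    # assumed encoding of the epsilon label
_fresh = itertools.count()  # supplies unused state names 0, 1, 2, ...

def build(r, delta):
    """Add an eps-NFA fragment for r to delta; return (start, accept).

    r is a nested tuple: ("empty",), ("eps",), ("chr", c),
    ("cat", r1, r2), ("alt", r1, r2), or ("star", r1).
    """
    s, f = next(_fresh), next(_fresh)

    def edge(p, label, q):
        delta.setdefault((p, label), set()).add(q)

    kind = r[0]
    if kind == "empty":
        pass                      # no path from s to f
    elif kind == "eps":
        edge(s, EPS, f)
    elif kind == "chr":
        edge(s, r[1], f)
    elif kind == "cat":
        s1, f1 = build(r[1], delta)
        s2, f2 = build(r[2], delta)
        edge(s, EPS, s1); edge(f1, EPS, s2); edge(f2, EPS, f)
    elif kind == "alt":
        s1, f1 = build(r[1], delta)
        s2, f2 = build(r[2], delta)
        edge(s, EPS, s1); edge(s, EPS, s2)
        edge(f1, EPS, f); edge(f2, EPS, f)
    elif kind == "star":
        s1, f1 = build(r[1], delta)
        edge(s, EPS, f)           # zero repetitions
        edge(s, EPS, s1)
        edge(f1, EPS, s1)         # repeat the body
        edge(f1, EPS, f)
    else:
        raise ValueError(f"unknown regular expression node: {kind!r}")
    return s, f

def closure(states, delta):
    """States reachable from `states` by epsilon transitions only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for nxt in delta.get((q, EPS), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def accepts(r, word):
    """Build the machine for r and simulate it on word."""
    delta = {}
    s, f = build(r, delta)
    current = closure({s}, delta)
    for ch in word:
        moved = set()
        for q in current:
            moved |= delta.get((q, ch), set())
        current = closure(moved, delta)
    return f in current

# (a + b)* c
example = ("cat", ("star", ("alt", ("chr", "a"), ("chr", "b"))), ("chr", "c"))
```

The extra epsilon transitions make the machines larger than necessary, but each case can be checked independently, which mirrors the inductive proof.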

Converting NFA to RE

To convert an NFA to a regular expression, we introduced the concept of a "generalized NFA". A generalized NFA is allowed to have transitions that are labeled by a regular expression (instead of just a single character). A string \(x\) is accepted by a generalized NFA if there is a path from the start state to a final state labeled by regular expressions \(r_0, r_1, \dots, r_n\) such that the regular expression \(r_0r_1\dots{}r_n\) matches \(x\). Put another way, while processing a string, you can follow a transition from \(q\) to \(q'\) labeled \(r\) by consuming characters of \(x\) that match \(r\).

To convert an NFA to a regular expression, we first think of the NFA as a generalized NFA. We then transform it so that it has a single final state: we add a new final state, add an epsilon transition to it from each of the old final states, and make the old final states non-final (we can do this, because \(ε\) is a regular expression).

We then repeatedly remove non-final non-start states and replace them with regular expression transitions that capture paths through the removed state. For example, we might remove state B from the automaton on the left by producing the transitions on the right:

Figure: removing a state from an NFA
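A single removal step can be sketched in Python as follows. This is an illustration under stated assumptions: the generalized NFA is encoded as a dictionary from state pairs to labels, and the labels are written in Python's `re` syntax (so `|` plays the role of the `+` used in these notes) so that the result can be checked with `re.fullmatch`. The function name `eliminate` and the example machine are hypothetical and are not the machine in the figure.

```python
import re

def eliminate(edges, b):
    """Remove state b from a generalized NFA and return the new edges.

    edges maps (p, q) to a regular expression in Python `re` syntax;
    a missing key means there is no transition from p to q.
    """
    loop = edges.get((b, b))
    loop_part = f"(?:{loop})*" if loop is not None else ""
    incoming = [(p, r) for (p, q), r in edges.items() if q == b and p != b]
    outgoing = [(q, r) for (p, q), r in edges.items() if p == b and q != b]
    # Keep every transition that does not touch b.
    result = {k: r for k, r in edges.items() if b not in k}
    for p, r_in in incoming:
        for q, r_out in outgoing:
            # A path p -> b -> ... -> b -> q becomes one transition p -> q.
            through = f"(?:{r_in}){loop_part}(?:{r_out})"
            if (p, q) in result:
                result[(p, q)] = f"{result[(p, q)]}|{through}"
            else:
                result[(p, q)] = through
    return result

# Hypothetical machine: A -a-> B, B -b-> B, B -c-> C, and A -d-> C.
edges = {("A", "B"): "a", ("B", "B"): "b", ("B", "C"): "c", ("A", "C"): "d"}
reduced = eliminate(edges, "B")
```

Here the only remaining transition goes from A to C, and its label matches `d` as well as any string of the form `a`, then zero or more `b`s, then `c`. Every label is wrapped in a non-capturing group when it is reused, so a top-level `|` in an earlier result cannot change the meaning of a later concatenation.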

After removing all of the non-start, non-final states, we end up with a generalized NFA with just a start state and a final state. We can then form a regular expression that captures the transition from the start state back to itself any number of times, then a transition to the final state, and then a loop in the final state (without going back to the start state) any number of times. If the reduced generalized NFA is this:

Figure: reduced NFA

then the equivalent regular expression is \((r_1+r_2r_4^*r_3)^*r_2r_4^*\).
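The formula can be sanity-checked with a short Python sketch. The figure is not reproduced here, so the roles of the labels are an assumption read off from the formula and the description above: \(r_1\) is the loop on the start state, \(r_2\) goes from the start state to the final state, \(r_3\) goes from the final state back to the start state, and \(r_4\) is the loop on the final state. The function name `reduced_to_regex` is hypothetical, and the output uses Python's `re` syntax, with `|` in place of `+`.

```python
import re

def reduced_to_regex(r1, r2, r3, r4):
    """Build (r1 + r2 r4* r3)* r2 r4* as a Python `re` pattern.

    Assumes all four transitions are present; a missing transition
    (the empty language) would need separate handling, not shown here.
    """
    def group(r):
        return f"(?:{r})"  # non-capturing group keeps precedence intact

    # Leave the start state, loop in the final state, then come back.
    round_trip = f"{group(r2)}{group(r4)}*{group(r3)}"
    return f"(?:{group(r1)}|{round_trip})*{group(r2)}{group(r4)}*"

# Single-character labels make accepted strings easy to trace by hand.
pattern = reduced_to_regex("a", "b", "c", "d")
```

With these labels, `re.fullmatch(pattern, "abddcab")` succeeds: the string takes the start loop (`a`), makes a round trip through the final state (`bddc`), takes the start loop again (`a`), and finally moves to the final state (`b`).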