# Lecture 24: Pumping lemma

• Reading: Pass and Tseng, Limits of Automata, MCS 15.8 The pigeonhole principle

• Previous semester's notes

• Finish closure under union

• Pumping lemma

• Review exercises:
• prove that the intersection of DFA-recognizable sets is recognizable. It's a good exercise to do it using only the hints in the "Looking back at the proof" section below.
• use the pumping lemma to prove that the set of strings of balanced parentheses is not recognizable
• prove the pumping lemma

## Closure under union

Claim: If $$L_1$$ and $$L_2$$ are DFA-recognizable, then so is $$L_1 \cup L_2$$.

Proof: Since $$L_1$$ and $$L_2$$ are recognizable, there are machines $$M_1 = (Q_1, Σ, δ_1, q_{01}, F_1)$$ and $$M_2 = (Q_2, Σ, δ_2, q_{02}, F_2)$$ that recognize them. We want to construct a machine $$M$$ that recognizes $$L_1 \cup L_2$$.

What would such a machine need to know while processing $$x$$? If it knew what states $$M_1$$ and $$M_2$$ were in, it would know whether to accept or not. So this suggests that a state of $$M$$ should correspond to a pair of states, one from $$M_1$$ and one from $$M_2$$. This is the construction we use.

Let $$M = (Q_1 \times Q_2, Σ, δ, q_0, F)$$, where $$δ$$, $$q_0$$ and $$F$$ are defined as follows.

To define $$δ$$, we first note the domain and codomain: $$δ : (Q_1 \times Q_2) \times Σ → (Q_1 \times Q_2)$$. We want $$M$$ to be in state $$(q_1, q_2)$$ if $$M_1$$ is in state $$q_1$$ and $$M_2$$ is in state $$q_2$$. If we then see another character $$a$$, we would want to step $$M_1$$ to $$δ_1(q_1,a)$$ and $$M_2$$ to $$δ_2(q_2,a)$$. This suggests the following definition: $δ((q_1,q_2), a) ::= (δ_1(q_1,a), δ_2(q_2, a))$

Where should $$M$$ start? If we process the empty string, $$M_1$$ would be in state $$q_{01}$$, and $$M_2$$ would be in state $$q_{02}$$, so let's choose $$q_0 = (q_{01},q_{02})$$.

What about the final states? We want $$M$$ to accept if either $$M_1$$ would or $$M_2$$ would. That suggests $F = \{(q_1,q_2) \mid q_1 \in F_1\text{ or } q_2 \in F_2\}$
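The construction can be carried out mechanically. Here is a minimal sketch in Python; the encoding of a DFA as a tuple (states, alphabet, transition dict, start, finals) and the two example machines are our own assumptions, chosen only to exercise the code:

```python
# A sketch of the product construction. We assume a DFA is a tuple
# (states, alphabet, delta, start, finals), with delta a dict mapping
# (state, character) to a state.

def product_union(M1, M2):
    """Build the product DFA M with L(M) = L(M1) ∪ L(M2)."""
    Q1, sigma, d1, q01, F1 = M1
    Q2, _, d2, q02, F2 = M2
    states = [(q1, q2) for q1 in Q1 for q2 in Q2]
    # delta((q1, q2), a) = (delta1(q1, a), delta2(q2, a))
    delta = {((q1, q2), a): (d1[(q1, a)], d2[(q2, a)])
             for (q1, q2) in states for a in sigma}
    start = (q01, q02)
    # accept if either component machine would accept
    finals = {(q1, q2) for (q1, q2) in states if q1 in F1 or q2 in F2}
    return (states, sigma, delta, start, finals)

def accepts(M, x):
    """Run M on x and report whether it ends in an accepting state."""
    _, _, delta, q, finals = M
    for a in x:
        q = delta[(q, a)]
    return q in finals

# Hypothetical examples: M1 accepts strings with an even number of 0's,
# M2 accepts strings ending in 1.
M1 = ({'e', 'o'}, '01',
      {('e', '0'): 'o', ('o', '0'): 'e', ('e', '1'): 'e', ('o', '1'): 'o'},
      'e', {'e'})
M2 = ({'n', 'y'}, '01',
      {('n', '0'): 'n', ('y', '0'): 'n', ('n', '1'): 'y', ('y', '1'): 'y'},
      'n', {'y'})
M = product_union(M1, M2)
```

The `states` list is not needed to run the machine, but it matches the tuple $$(Q_1 \times Q_2, Σ, δ, q_0, F)$$ in the proof.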

We now want to show that $$L(M) = L(M_1) \cup L(M_2)$$. We start by showing that $$M$$ works properly. Let's write down a specification and prove it.

Let $$P(x)$$ be the statement $$\hat{δ}(q_0,x) = (\hat{δ}_1(q_{01},x), \hat{δ}_2(q_{02}, x))$$ (informally, $$M$$ correctly simulates $$M_1$$ and $$M_2$$). We will prove $$∀x, P(x)$$ by induction on $$x$$.

To see $$P(ε)$$, note that $$\hat{δ}(q_0, ε) = q_0 = (q_{01},q_{02})$$ by definition of $$\hat{δ}$$ and $$q_0$$. On the other side, we have $$(\hat{δ}_1(q_{01},ε), \hat{δ}_2(q_{02}, ε)) = (q_{01}, q_{02})$$ by definition of $$\hat{δ}_1$$ and $$\hat{δ}_2$$. Since these are the same, we are done.

To see $$P(xa)$$, first inductively assume $$P(x)$$. We compute

$$\begin{aligned} \hat{δ}(q_0, xa) &= δ(\hat{δ}(q_0, x), a) && \text{by definition of } \hat{δ} \\ &= δ(\hat{δ}((q_{01}, q_{02}), x), a) && \text{by definition of } q_0 \\ &= δ\left(\left(\hat{δ}_1(q_{01}, x), \hat{δ}_2(q_{02}, x)\right), a\right) && \text{by } P(x) \\ &= (δ_1(\hat{δ}_1(q_{01}, x), a), δ_2(\hat{δ}_2(q_{02}, x), a)) && \text{by definition of } δ \\ &= (\hat{δ}_1(q_{01},xa), \hat{δ}_2(q_{02},xa)) && \text{by definition of } \hat{δ}_1 \text{ and } \hat{δ}_2 \end{aligned}$$
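The two defining clauses of the extended transition function $$\hat{δ}$$ drive both cases of this induction. As a sketch (the dict-based encoding of $$δ$$ is our own assumption), the function can be written so that its clauses mirror the base case and inductive step:

```python
# delta_hat(delta, q, x): the extended transition function. `delta` is
# assumed to be a dict mapping (state, character) to a state.
def delta_hat(delta, q, x):
    if x == '':                          # delta_hat(q, ε) = q
        return q
    x_prefix, a = x[:-1], x[-1]          # view the input as xa
    # delta_hat(q, xa) = delta(delta_hat(q, x), a)
    return delta[(delta_hat(delta, q, x_prefix), a)]

# Hypothetical example: a two-state DFA tracking the parity of the 1's seen.
parity = {('even', '0'): 'even', ('even', '1'): 'odd',
          ('odd', '0'): 'odd', ('odd', '1'): 'even'}
```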

Now that we know that $$M$$ simulates $$M_1$$ and $$M_2$$, it is easy to prove that $$L(M) = L(M_1) \cup L(M_2)$$. As with the rest of the proof, we just keep plugging in definitions:

$$\begin{aligned} L(M) &= \{x \mid \hat{δ}(q_0, x) \in F\} && \text{by definition of } L \\ &= \{x \mid (\hat{δ}_1(q_{01},x), \hat{δ}_2(q_{02}, x)) \in F\} && \text{by } P(x)\text{, which we just proved} \\ &= \{x \mid \hat{δ}_1(q_{01}, x) \in F_1 \text{ or } \hat{δ}_2(q_{02}, x) \in F_2\} && \text{by definition of } F \\ &= \{x \mid \hat{δ}_1(q_{01}, x) \in F_1\} \cup \{x \mid \hat{δ}_2(q_{02}, x) \in F_2\} && \text{by definition of } \cup \\ &= L(M_1) \cup L(M_2) && \text{by definition of } L \end{aligned}$$

### Looking back at the proof

This proof looks intimidating. It isn't. The summary is: build a machine that simulates $$M_1$$ and $$M_2$$, and use induction. Everything else is just plugging in definitions or inductive hypotheses.

## A non-recognizable set

Let $$L = \{0^k1^k \mid k \in \mathbb{N}\} = \{ε, 01, 0011, 000111, \dots\}$$.

Claim: $$L$$ is not recognizable.

Proof: by contradiction. Suppose $$L$$ were recognizable. Then there is some $$M$$ with $$L = L(M)$$. Let $$n$$ be the number of states of $$M$$, and let $$x = 0^n1^n$$. Clearly $$x \in L$$, so $$M$$ must accept $$x$$.

Let's consider what happens while $$M$$ is processing $$x$$. While processing the first $$n$$ characters, $$M$$ must pass through $$n+1$$ states $$q_0, q_1, \dots, q_n$$. Since there are only $$n$$ states to choose from, the pigeonhole principle tells us that two of these states must be the same: there is a loop; $$q_i = q_j$$ for some $$i \lt j \leq n$$.

Let $$u$$ be the part of $$x$$ that transitions $$M$$ from $$q_0$$ to $$q_i$$, let $$v$$ be the part that transitions from $$q_i$$ to $$q_j$$, and let $$w$$ be the rest of $$x$$, which transitions $$M$$ from $$q_j$$ to the state reached after processing all of $$x$$ (which, remember, is an accepting state, since $$M$$ accepts $$x$$). Note that since the loop happens within the first $$n$$ characters, $$u$$ and $$v$$ can consist only of 0's.

Now consider what happens if we feed the string $$uvvw$$ into $$M$$. $$M$$ will transition to $$q_i$$ on $$u$$, then go around the loop twice, ending up back at $$q_j$$ (recall $$q_i = q_j$$). It will then process $$w$$, which takes it from $$q_j$$ to the same accepting state as before. Therefore $$uvvw \in L(M)$$.

However, since $$v$$ consisted of one or more 0s, $$uvvw$$ has more 0's than 1's, so $$uvvw \notin L$$. This contradicts the assumption that $$L(M) = L$$, completing the proof.
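This argument can be run as an experiment: whatever transition table we pick, the loop found while reading $$0^n$$ yields a pumped string that the machine cannot distinguish from $$0^n1^n$$. A sketch, under the assumption that a DFA's transition function is a dict keyed by (state, character); the particular 3-state table at the end is arbitrary:

```python
def state_after(delta, q, x):
    """The state the machine reaches from q after reading x."""
    for a in x:
        q = delta[(q, a)]
    return q

def confusable_pair(delta, start, n):
    """For an n-state machine, return strings x in L and y not in L
    (L = {0^k 1^k}) that drive the machine to the same state."""
    trace = [start]                      # q_0, q_1, ..., q_n while reading 0^n
    for _ in range(n):
        trace.append(delta[(trace[-1], '0')])
    for i in range(n + 1):               # pigeonhole: some q_i = q_j, i < j
        for j in range(i + 1, n + 1):
            if trace[i] == trace[j]:
                x = '0' * n + '1' * n            # uvw, in L
                y = '0' * (n + j - i) + '1' * n  # uvvw: extra 0's, not in L
                return x, y

# An arbitrary 3-state transition table over {0, 1}.
delta = {(q, a): (q + (a == '0')) % 3 for q in range(3) for a in '01'}
x, y = confusable_pair(delta, 0, 3)
```

Because the machine ends in the same state on $$x$$ and $$y$$, it accepts both or rejects both; either way it fails to recognize $$L$$.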

## The pumping lemma

This same argument can be applied to many languages, and can be generalized into the so-called "pumping lemma":

Claim (pumping lemma): If $$L$$ is a DFA-recognizable language, then there exists some $$n$$ (often called the pumping length) such that for all $$x \in L$$ with $$len(x) \geq n$$, there exist strings $$u$$, $$v$$, and $$w$$ such that

1. $$x = uvw$$,
2. $$len(uv) \leq n$$,
3. $$len(v) > 0$$, and
4. for all $$k \geq 0$$, $$uv^kw \in L$$.

The proof is just like the proof above; we give it below.

This lemma is used to prove that languages are not DFA-recognizable. For example, we can use it to rewrite the proof above:

Claim: $$L = \{0^k1^k \mid k \in \mathbb{N}\}$$ is not DFA-recognizable.

Proof: by contradiction, assume that $$L$$ is DFA-recognizable. Then there exists some $$n$$ as in the pumping lemma. Let $$x = 0^n1^n$$. Clearly $$x \in L$$ and $$len(x) \geq n$$, so we can write $$x$$ as $$uvw$$ as in the pumping lemma. Since $$len(uv) \leq n$$, $$v$$ can only consist of 0's (the first $$n$$ characters of $$x$$ are 0's), and it must contain at least one 0, since $$len(v) > 0$$. The pumping lemma tells us that $$uv^2w \in L$$, but this is a contradiction, because $$uv^2w$$ has more 0's than 1's. Therefore $$L$$ is not DFA-recognizable.

Here is another example:

Claim: Let $$L$$ be the set of strings of digits and the symbols $$+$$ and $$=$$ that represent equations that are true. For example, "$$1+1=2$$" is in $$L$$, while "$$3+5=9$$" is not. $$L$$ is not recognizable.

Proof: by contradiction, assume that $$L$$ is DFA-recognizable. Then there exists some $$n$$ as in the pumping lemma. Let $$x = "1^n+0=1^n"$$ (that is, the numeral written with $$n$$ 1's, then "+0=", then the same numeral). Clearly $$x \in L$$ and $$len(x) \geq n$$, so we can write $$x$$ as $$uvw$$ as in the pumping lemma. Since $$len(uv) \leq n$$, $$v$$ can only consist of 1's (the first $$n$$ characters of $$x$$ are 1's), and it must contain at least one 1, since $$len(v) > 0$$. The pumping lemma tells us that $$uv^0w = uw \in L$$, but this is a contradiction: $$uw$$ has a smaller number on the left-hand side of the equation than on the right, and therefore is not in $$L$$. Thus, $$L$$ is not DFA-recognizable.
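The two strings in this argument can be checked mechanically. In the sketch below, the pumping length $$n = 5$$ and the choice of $$v$$ as the single leading 1 are illustrative assumptions; the lemma only guarantees that some nonempty $$v$$ of leading 1's exists:

```python
# A checker for the equation language: strings of digits, '+' and '='
# that state a true sum.
def is_true_equation(s):
    left, right = s.split('=')
    return sum(int(term) for term in left.split('+')) == int(right)

n = 5                                    # stands in for the pumping length
x = '1' * n + '+0=' + '1' * n            # "11111+0=11111", a true equation
uw = '1' * (n - 1) + '+0=' + '1' * n     # pump v = "1" out: left side shrinks
```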

Proof of the pumping lemma: This proof is almost the same as the special case given above. Assume $$L$$ is DFA-recognizable. Then there is some machine $$M$$ that recognizes $$L$$. Let $$n$$ be the number of states of $$M$$. Now, if $$x$$ is an arbitrary string in $$L$$ with length greater than or equal to $$n$$, then while processing the first $$n$$ characters, $$M$$ must visit some state $$q$$ at least twice (it visits $$n+1$$ states, but has only $$n$$ to choose from).

Let $$u$$ be the portion of $$x$$ that transitions $$M$$ from the start state to $$q$$. Let $$v$$ be the portion of $$x$$ that transitions from $$q$$ back to $$q$$, and let $$w$$ be the remainder of $$x$$; $$w$$ transitions $$M$$ from $$q$$ to some final state (since $$x \in L$$, $$\hat{δ}(q_0,uvw)$$ must be a final state).

Clearly $$x = uvw$$. $$len(uv) \leq n$$, since the loop must occur within the first $$n$$ characters of $$x$$. $$len(v) \gt 0$$, since the two visits to $$q$$ are separated by at least one character. Finally, while processing $$uv^kw$$, $$M$$ transitions to $$q$$ on $$u$$, then back to $$q$$ on each iteration of $$v$$, and finally from $$q$$ to an accepting state on $$w$$; thus $$M$$ accepts $$uv^kw$$. Therefore $$uv^kw \in L(M) = L$$, completing the proof.
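The proof is constructive: given the machine, we can actually compute a split satisfying conditions 1–4. A sketch, assuming a DFA given as a transition dict keyed by (state, character), a start state, and a set of accepting states; the two-state "even number of 1's" machine at the end is a hypothetical example:

```python
def pump_split(delta, start, n, x):
    """Split an accepted x with len(x) >= n into u, v, w as in the lemma."""
    seen = {start: 0}                    # first position at which each state occurred
    q = start
    for pos, a in enumerate(x[:n], start=1):
        q = delta[(q, a)]
        if q in seen:                    # pigeonhole: a repeat within the first n steps
            i, j = seen[q], pos
            return x[:i], x[i:j], x[j:]  # u, v (the loop), w
        seen[q] = pos
    raise AssertionError("n+1 states drawn from n states must contain a repeat")

def accepts(delta, start, finals, x):
    q = start
    for a in x:
        q = delta[(q, a)]
    return q in finals

# Hypothetical example: a 2-state machine accepting strings with an
# even number of 1's, so n = 2.
delta = {('e', '0'): 'e', ('e', '1'): 'o', ('o', '0'): 'o', ('o', '1'): 'e'}
u, v, w = pump_split(delta, 'e', 2, '0110')
```

Conditions 1–3 hold by construction, and condition 4 can be confirmed by running the machine on $$uv^kw$$ for several values of $$k$$.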