# Lecture 24: Pumping lemma

## A non-recognizable set

Let $$L = \{0^k1^k \mid k \in \mathbb{N}\} = \{ε, 01, 0011, 000111, \dots\}$$.

Claim: $$L$$ is not DFA-recognizable.

Proof: by contradiction. Suppose $$L$$ were DFA-recognizable. Then there is some DFA $$M$$ with $$L = L(M)$$. Let $$n$$ be the number of states of $$M$$, and let $$x = 0^n1^n$$. Clearly $$x \in L$$, so $$M$$ must accept $$x$$.

Let's consider what happens while $$M$$ is processing $$x$$. While processing the first $$n$$ characters, $$M$$ must pass through $$n+1$$ states $$q_0, q_1, \dots, q_n$$ (counting the start state $$q_0$$). Since there are only $$n$$ states to choose from, two of these states must be the same, so there is a loop: $$q_i = q_j$$ for some $$i \lt j \leq n$$.

Let $$u$$ be the part of $$x$$ that transitions from $$q_0$$ to $$q_i$$, let $$v$$ be the part that transitions from $$q_i$$ to $$q_j$$, and let $$w$$ be the rest of $$x$$, which transitions from $$q_j$$ to the state $$M$$ is in after reading all of $$x$$ (a final state, since $$M$$ accepts $$x$$). Note that since the loop happens within the first $$n$$ characters, $$u$$ and $$v$$ can consist only of 0's.

Now consider what happens if we plug the string $$uvvw$$ into $$M$$. $$M$$ will transition to $$q_i$$ on $$u$$, and then go around the loop twice, ending up back at $$q_j = q_i$$. It will then process $$w$$, which takes it from $$q_j$$ to the same final state it reaches on $$x$$, so $$uvvw$$ is accepted. Therefore $$uvvw \in L(M)$$.

However, since $$i \lt j$$, $$v$$ consists of one or more 0's, so $$uvvw$$ has more 0's than 1's, and thus $$uvvw \notin L$$. This contradicts the assumption that $$L(M) = L$$, completing the proof.
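The loop argument can be traced on a concrete machine. The following is a minimal sketch, assuming a hypothetical 3-state DFA for $$0^*1^*$$ (the state names and the `trace` and `accepts` helpers are illustrative, not part of the lecture). This DFA does not recognize $$L$$; the point is only that any DFA accepting $$0^n1^n$$ is forced to accept a pumped string as well.

```python
# Hypothetical 3-state DFA over {0, 1} that accepts 0*1* (it cannot count).
DELTA = {
    ("zeros", "0"): "zeros", ("zeros", "1"): "ones",
    ("ones", "0"): "dead",   ("ones", "1"): "ones",
    ("dead", "0"): "dead",   ("dead", "1"): "dead",
}
START, ACCEPTING = "zeros", {"zeros", "ones"}


def trace(s):
    """Return the list of states visited while reading s, starting at START."""
    states = [START]
    for ch in s:
        states.append(DELTA[(states[-1], ch)])
    return states


def accepts(s):
    """Return True if the DFA ends in an accepting state after reading s."""
    return trace(s)[-1] in ACCEPTING


n = 3                          # number of states of the DFA
x = "0" * n + "1" * n          # x = 0^n 1^n
prefix_states = trace(x[:n])   # q_0, ..., q_n: n + 1 states

# Pigeonhole: find i < j <= n with q_i == q_j.
i, j = next(
    (i, j)
    for j in range(1, n + 1)
    for i in range(j)
    if prefix_states[i] == prefix_states[j]
)
u, v, w = x[:i], x[i:j], x[j:]
pumped = u + v + v + w         # goes around the loop twice

print(u, v, w, pumped, accepts(pumped))
```

Here the repeated state is found after a single character, so $$v$$ is one 0, and the pumped string is still accepted even though it has more 0's than 1's.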

## The pumping lemma

This same argument can be applied to many languages, and can be generalized into the so-called "pumping lemma":

Claim (pumping lemma): If $$L$$ is a DFA-recognizable language, then there exists some $$n$$ (often called the pumping length), such that for all $$x \in L$$ with $$len(x) \geq n$$, there exist strings $$u$$, $$v$$, and $$w$$ such that

1. $$x = uvw$$,
2. $$len(uv) \leq n$$,
3. $$len(v) > 0$$, and
4. for all $$k \geq 0$$, $$uv^kw \in L$$.
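The four conditions can be explored by brute force. The sketch below is an illustration under stated assumptions: the function names, the membership tests, and the bound `max_k` are all hypothetical, and condition 4 is only checked for $$k$$ from 0 to `max_k`. A split that fails for some small $$k$$ genuinely violates condition 4, so an empty result rules out every split of that $$x$$ for that $$n$$; a non-empty result is only evidence, not a proof.

```python
def splits(x, n):
    """Yield every (u, v, w) with x == u + v + w, len(u + v) <= n, len(v) > 0."""
    for end in range(1, min(n, len(x)) + 1):   # end = len(uv)
        for start in range(end):               # start = len(u)
            yield x[:start], x[start:end], x[end:]


def surviving_splits(x, n, in_language, max_k=4):
    """Return the splits for which u v^k w is in the language for k = 0..max_k."""
    return [
        (u, v, w)
        for u, v, w in splits(x, n)
        if all(in_language(u + v * k + w) for k in range(max_k + 1))
    ]


def zeros_then_ones(s):
    """Membership test for {0^k 1^k}."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s == "0" * half + "1" * half


def repeated_01(s):
    """Membership test for (01)*, which is DFA-recognizable."""
    return len(s) % 2 == 0 and s == "01" * (len(s) // 2)


# No split of 0^4 1^4 with len(uv) <= 4 survives pumping.
print(surviving_splits("00001111", 4, zeros_then_ones))

# For (01)*, with n = 3, some splits of 0101 do survive.
print(surviving_splits("0101", 3, repeated_01))
```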

The proof is just like the proof above; we give it below.

This lemma is used to prove that languages are not DFA-recognizable. For example, we can use it to rewrite the proof above:

Claim: $$L = \{0^n1^n \mid n \in \mathbb{N}\}$$ is not DFA-recognizable.

Proof: by contradiction. Assume that $$L$$ is DFA-recognizable. Then there exists some $$n$$ as in the pumping lemma. Let $$x = 0^n1^n$$. Clearly $$x \in L$$ and $$len(x) \geq n$$, so we can write $$x$$ as $$uvw$$ as in the pumping lemma. Since $$len(uv) \leq n$$, $$v$$ can only consist of 0's (the first $$n$$ characters of $$x$$ are 0's). It must have at least one 0, since $$len(v) > 0$$. The pumping lemma tells us that $$uv^2w \in L$$, but this is a contradiction, because $$uv^2w$$ has more 0's than 1's. Therefore $$L$$ is not DFA-recognizable.
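The counting step can be spelled out explicitly. Writing $$a = len(u)$$ and $$b = len(v)$$ (names introduced here only for illustration), the conditions of the lemma give $$a + b \leq n$$ and $$b \geq 1$$, so $$u = 0^a$$, $$v = 0^b$$, and $$w = 0^{n-a-b}1^n$$. Then

$$uv^2w = 0^a \, 0^{2b} \, 0^{n-a-b} \, 1^n = 0^{n+b}1^n,$$

which has $$n + b \gt n$$ 0's but only $$n$$ 1's.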

Here is another example:

Claim: Let $$L$$ be the set of strings of digits and the symbols $$+$$ and $$=$$ that represent equations that are true. For example, "$$1+1=2$$" is in $$L$$, while "$$3+5=9$$" is not. $$L$$ is not DFA-recognizable.

Proof: by contradiction. Assume that $$L$$ is DFA-recognizable. Then there exists some $$n$$ as in the pumping lemma. Let $$x = "1^n+0=1^n"$$, where $$1^n$$ denotes the digit 1 repeated $$n$$ times. Clearly $$x \in L$$ and $$len(x) \geq n$$, so we can write $$x$$ as $$uvw$$ as in the pumping lemma. Since $$len(uv) \leq n$$, $$v$$ can only consist of 1's (the first $$n$$ characters of $$x$$ are 1's). It must have at least one 1, since $$len(v) > 0$$. The pumping lemma tells us that $$uv^0w = uw \in L$$, but this is a contradiction: removing $$v$$ leaves fewer 1's before the $$+$$, so the left hand side of $$uw$$ is a smaller number than the right hand side (or, if $$v$$ is all $$n$$ of the 1's, the left hand side is just "$$+0$$", which is either malformed or equal to 0). Either way, $$uw \notin L$$. Thus, $$L$$ is not DFA-recognizable.
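The pumping-down step can be checked mechanically for a small $$n$$. This is a sketch under an assumption: the lecture defines $$L$$ only informally, so the parser `is_true_equation` below is one hypothetical reading of it (exactly one $$=$$, non-empty decimal terms joined by $$+$$, equal sums on both sides).

```python
def is_true_equation(s):
    """Return True if s has the form a+b+...=c+d+... with equal sums.

    Assumed reading of the language: digits, '+', and exactly one '=';
    every term must be a non-empty string of digits.
    """
    allowed = set("0123456789+=")
    if not s or any(ch not in allowed for ch in s) or s.count("=") != 1:
        return False
    left, right = s.split("=")
    totals = []
    for side in (left, right):
        terms = side.split("+")
        if any(term == "" for term in terms):
            return False   # e.g. "+0=111" or "1+=1" is not well formed
        totals.append(sum(int(term) for term in terms))
    return totals[0] == totals[1]


n = 3
x = "1" * n + "+0=" + "1" * n     # "111+0=111"

# Every way of deleting a non-empty v that lies within the first n characters.
pumped_down = [
    x[:start] + x[end:]           # u w, with v = x[start:end] removed
    for end in range(1, n + 1)
    for start in range(end)
]

print(is_true_equation(x))
print([s for s in pumped_down if is_true_equation(s)])
```

For $$n = 3$$ none of the six pumped-down strings is a true equation, matching the argument above. This checks only one value of $$n$$; the proof covers all of them.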

Proof of the pumping lemma: This proof is almost the same as the special case given above. Assume $$L$$ is DFA-recognizable. Then there is some machine $$M$$ that recognizes $$L$$. Let $$n$$ be the number of states of $$M$$. Now, if $$x$$ is an arbitrary string in $$L$$ with length greater than or equal to $$n$$, then while processing the first $$n$$ characters, $$M$$ passes through $$n+1$$ states (counting the start state), so it must visit some state $$q$$ at least twice.

Let $$u$$ be the portion of $$x$$ that transitions $$M$$ from the start state $$q_0$$ to the first visit to $$q$$. Let $$v$$ be the portion of $$x$$ that transitions from $$q$$ back to $$q$$ (the second visit), and let $$w$$ be the remainder of $$x$$; $$w$$ transitions $$M$$ from $$q$$ to some final state (since $$x \in L$$, $$\hat{δ}(q_0,uvw)$$ must be a final state).

Clearly $$x = uvw$$. $$len(uv) \leq n$$ since the loop must occur within the first $$n$$ characters of $$x$$. $$len(v) \gt 0$$ because the two visits to $$q$$ happen at different positions in $$x$$. Finally, for any $$k \geq 0$$, while processing $$uv^kw$$, $$M$$ transitions to $$q$$ on $$u$$, then back to $$q$$ on each iteration of $$v$$, and finally from $$q$$ to an accepting state on $$w$$, and thus $$M$$ accepts $$uv^kw$$. Therefore $$uv^kw \in L(M) = L$$, completing the proof.
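The construction in this proof is effectively an algorithm: run $$M$$ on $$x$$ and cut at the first repeated state. Here is a minimal sketch, assuming a DFA given as a dictionary from (state, symbol) pairs to states; `pumping_split`, `accepts`, and the example machine for "strings ending in 01" are hypothetical names chosen for illustration.

```python
def pumping_split(delta, start, x):
    """Split x as (u, v, w) around the first repeated state, as in the proof.

    delta maps (state, symbol) -> state. Assumes len(x) >= number of states,
    so that a repeat is guaranteed within the first n characters.
    """
    seen = {start: 0}      # state -> position at which it was first visited
    state = start
    for pos, ch in enumerate(x, start=1):
        state = delta[(state, ch)]
        if state in seen:  # second visit to `state`: the loop closes here
            i = seen[state]
            return x[:i], x[i:pos], x[pos:]
        seen[state] = pos
    raise ValueError("no repeated state; x is shorter than the number of states")


def accepts(delta, start, accepting, s):
    """Return True if the DFA ends in an accepting state after reading s."""
    state = start
    for ch in s:
        state = delta[(state, ch)]
    return state in accepting


# Hypothetical 3-state DFA for binary strings ending in "01".
DELTA = {
    ("s", "0"): "a", ("s", "1"): "s",
    ("a", "0"): "a", ("a", "1"): "b",
    ("b", "0"): "a", ("b", "1"): "s",
}
START, ACCEPTING = "s", {"b"}

x = "0101"                                   # accepted, and len(x) >= 3 states
u, v, w = pumping_split(DELTA, START, x)
print(u, v, w)
print([accepts(DELTA, START, ACCEPTING, u + v * k + w) for k in range(6)])
```

On this input the first repeated state is reached after three characters, so $$len(uv) = 3 \leq n$$, and each pumped string $$0(10)^k1$$ still ends in 01.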