Lecture 21: Structural induction

Idea behind structural induction

Consider the definition \(x \in Σ^* ::= ε \mid xa\). I will refer to \(x ::= ε\) as "rule 1" and \(x ::= xa\) as "rule 2". This definition says that there are two kinds of strings: the empty string (formed using rule 1), and strings of the form \(xa\), where \(x\) is a smaller string and \(a\) is a character of \(Σ\) (formed using rule 2); these are the only kinds of strings.

If we want to prove that property \(P\) holds on all strings (i.e. \(∀x \in Σ^*, P(x)\)), we can do it by giving a proof for strings formed using rule 1 (let's call it proof 1), and another proof for strings formed using rule 2 (let's call it proof 2). In the second proof, where the string has the form \(xa\), we may assume that \(P(x)\) holds.

Why can we make this assumption? Suppose we have some complicated string, like \(εabc\), and we want to conclude \(P(εabc)\). We build the string \(εabc\) by snapping together smaller strings using rules 1 and 2; we can imagine building a proof of \(P(εabc)\) by snapping together smaller proofs using proofs 1 and 2.

To show that \(εabc\) is a string, we first use rule 1 to show that \(ε\) is a string, then rule 2 to show that \(εa\) is a string (this assumes that \(ε\) is a string, but we just argued it was), and then rule 2 again to show that \(εab\) is a string (using the fact that \(εa\) is a string), and finally use rule 2 a third time to show that \(εabc\) is a string.

Similarly, we can use proof 1 to show that \(P(ε)\) holds, then use proof 2 to show that \(P(εa)\) holds (this assumes that \(P(ε)\) holds, but we just argued it does), and then use proof 2 again to show that \(P(εab)\) holds (using the fact that \(P(εa)\) holds), and finally use proof 2 a third time to show that \(P(εabc)\) holds.

In general, any element of an inductively defined set is built up by applying the rules defining the set, so if you provide a proof for each rule, you have given a proof for every element. Before you can build a complex structure, you have to build the parts, so while building the proof that some property holds on a complex structure, you can assume that you have already proved it for the subparts.
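The snapping-together picture can be made concrete with a small program. The following is only a sketch: the names `Empty`, `Snoc`, `build`, `show` and `proof_steps` are made up for illustration, and characters are assumed to be one-character Python strings. It builds \(εabc\) using rules 1 and 2, and then lists the matching uses of proof 1 and proof 2.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Empty:
    """Rule 1: the empty string ε."""


@dataclass(frozen=True)
class Snoc:
    """Rule 2: the string xa, built from a smaller string x and a character a."""
    x: "Str"
    a: str


Str = Union[Empty, Snoc]


def build(chars: str) -> Str:
    """Apply rule 1 once, then rule 2 once for each character."""
    s: Str = Empty()
    for a in chars:
        s = Snoc(s, a)
    return s


def show(s: Str) -> str:
    """Write s in the notation of the notes, e.g. εabc."""
    if isinstance(s, Empty):
        return "ε"
    return show(s.x) + s.a


def proof_steps(s: Str) -> List[str]:
    """List which proof establishes P at each stage of building s."""
    if isinstance(s, Empty):
        return ["P(ε) by proof 1"]
    step = f"P({show(s)}) by proof 2, assuming P({show(s.x)})"
    return proof_steps(s.x) + [step]


for line in proof_steps(build("abc")):
    print(line)
# P(ε) by proof 1
# P(εa) by proof 2, assuming P(ε)
# P(εab) by proof 2, assuming P(εa)
# P(εabc) by proof 2, assuming P(εab)
```

Each step only assumes something that an earlier step has already established, which is why the assumption in proof 2 is safe.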

Structural induction step by step

In general, if an inductive set \(X\) is defined by a set of rules (rule 1, rule 2, etc.), then we can prove \(∀x \in X, P(x)\) by giving a separate proof of \(P(x)\) for \(x\) formed by each of the rules. In the cases where the rule recursively uses elements \(y_1, y_2, \dots\) of the set being defined, we can assume \(P(y_1), P(y_2), \dots\).

Example structures:

Example proof

Recall \(Σ^*\) is defined by \(x \in Σ^* ::= ε \mid xa\) and \(len : Σ^* → \N\) is given by \(len(ε) ::= 0\) and \(len(xa) ::= 1 + len(x)\).

Claim: For all \(x \in Σ^*\), \(len(x) \geq 0\).

Proof: By induction on the structure of \(x\). Let \(P(x)\) be the statement "\(len(x) \geq 0\)". We must prove \(P(ε)\), and \(P(xa)\) assuming \(P(x)\).

\(P(ε)\) case: we want to show \(len(ε) \geq 0\). Well, by definition, \(len(ε) = 0 \geq 0\).

\(P(xa)\) case: assume \(P(x)\). That is, \(len(x) \geq 0\). We wish to show \(P(xa)\), i.e. that \(len(xa) \geq 0\). Well, \(len(xa) = 1 + len(x) \geq 1 + 0 = 1 \geq 0\).
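The definition of \(len\) translates directly into a recursive function with one branch per rule, mirroring the two cases of the proof. This is a sketch under assumptions: `Empty`, `Snoc`, `length` and `all_strings` are illustrative names, not part of the notes. Checking the claim on finitely many strings is a sanity check, not a substitute for the proof.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, Union


@dataclass(frozen=True)
class Empty:
    """Rule 1: the empty string ε."""


@dataclass(frozen=True)
class Snoc:
    """Rule 2: the string xa."""
    x: "Str"
    a: str


Str = Union[Empty, Snoc]


def length(s: Str) -> int:
    if isinstance(s, Empty):
        return 0              # len(ε) ::= 0
    return 1 + length(s.x)    # len(xa) ::= 1 + len(x)


def all_strings(alphabet: str, max_len: int) -> Iterator[Str]:
    """Yield every string over alphabet with at most max_len characters."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s: Str = Empty()
            for a in chars:
                s = Snoc(s, a)
            yield s


# Check len(x) >= 0 on all strings over {a, b} with at most 4 characters.
assert all(length(s) >= 0 for s in all_strings("ab", 4))
```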

Proofs on pairs

Often, we want to prove something about all pairs \(x\) and \(y\), where \(x\) and \(y\) are both in an inductively defined set \(X\). Pairs of elements of \(X\) are formed by pairs of rules of \(X\), so one can give a proof for each pair of rules. For example, to prove \(∀x,y \in Σ^*, len(cat(x,y)) = len(x) + len(y)\), you can give a proof for the case where \(x\) and \(y\) are both \(ε\), a proof for the case when \(x = ε\) and \(y\) is of the form \(zc\), a proof for the case when \(x = zc\) and \(y = ε\), and a proof for the case where \(x = zc\) and \(y = wd\).

What inductive assumptions can be made in these cases? You can inductively assume that \(P\) holds on any pair that is formed from a subpiece of \(x\) and a subpiece of \(y\), and at least one of those subpieces needs to be smaller. For example, while proving \(P(zc,wd)\), you can assume \(P(z,wd)\), you can assume \(P(zc,w)\), and you can assume \(P(z,w)\). You can't assume \(P(zc,wd)\) (since that's what you're trying to prove). You can't assume \(P(c,d)\), because that doesn't even make sense: \(c\) and \(d\) are elements of \(Σ\) not \(Σ^*\), and \(P\) is a property of pairs of strings, not pairs of characters. You can't assume \(P(εc, wd)\) because \(εc\) is not a subpiece of \(zc\). You can't assume \(P(cat(z,w),w)\) because \(cat(z,w)\) is not a substructure of \(zc\). You shouldn't assume \(P(w,z)\), although this can be justified using more advanced techniques.
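One way to read the rule above is as a computation: collect the subpieces of \(x\) and of \(y\), pair them up, and discard the pair \((x,y)\) itself. The sketch below does this under the assumption that "subpiece" means the string itself or any smaller string it was built from; the names `subpieces` and `allowed_hypotheses` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Empty:
    """Rule 1: the empty string ε."""


@dataclass(frozen=True)
class Snoc:
    """Rule 2: the string xa."""
    x: "Str"
    a: str


Str = Union[Empty, Snoc]


def subpieces(s: Str) -> List[Str]:
    """Return s together with every smaller string it was built from."""
    pieces = [s]
    while isinstance(s, Snoc):
        s = s.x
        pieces.append(s)
    return pieces


def allowed_hypotheses(x: Str, y: Str) -> List[Tuple[Str, Str]]:
    """Pairs of subpieces in which at least one component is smaller."""
    return [(u, v) for u in subpieces(x) for v in subpieces(y)
            if (u, v) != (x, y)]


# Take x = zc with z = εa and c = b, and y = wd with w = ε and d = c.
z = Snoc(Empty(), "a")
w = Empty()
x = Snoc(z, "b")
y = Snoc(w, "c")

hyps = allowed_hypotheses(x, y)
assert (z, y) in hyps       # P(z, wd) may be assumed
assert (x, w) in hyps       # P(zc, w) may be assumed
assert (z, w) in hyps       # P(z, w) may be assumed
assert (x, y) not in hyps   # P(zc, wd) is the goal, not an assumption
assert (w, z) not in hyps   # P(w, z) is not licensed by this rule
```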

Here is an example:

Claim: for all \(x\) and \(y\) in \(Σ^*\), \(len(cat(x,y)) = len(x) + len(y)\).

Proof: Recall \(len(ε) ::= 0\) and \(len(xa) ::= 1 + len(x)\). Recall also that \(cat(ε,ε) ::= ε\), \(cat(ε,xa) ::= xa\), \(cat(xa, ε) ::= xa\) and \(cat(xa, yb) ::= cat(xa,y)b\).

We proceed by induction on the structure of \(x\) and \(y\). Let \(P(x,y)\) be the statement \(len(cat(x,y)) = len(x) + len(y)\).

\(P(ε,ε)\) case: we want to show \(len(cat(ε,ε)) = len(ε) + len(ε)\). By definition, the left hand side is \(len(ε) = 0\), and the right hand side is \(0 + 0 = 0\).

\(P(ε,xa)\) case: we want to show \(len(cat(ε,xa)) = len(ε) + len(xa)\). By definition, \(cat(ε,xa) = xa\), so \(len(cat(ε,xa)) = len(xa)\). We also know \(len(ε) = 0\), so the right hand side also simplifies to \(len(xa)\).

The \(P(xa,ε)\) case is symmetric to the \(P(ε,xa)\) case: by definition, \(cat(xa,ε) = xa\) and \(len(ε) = 0\), so both sides simplify to \(len(xa)\).

In the \(P(xa,yb)\) case, we want to show that \(len(cat(xa,yb)) = len(xa) + len(yb)\). We may assume \(P(xa,y)\), i.e. that \(len(cat(xa,y)) = len(xa) + len(y)\). Using this, we have \[ \begin{aligned} len(cat(xa,yb)) &= len(cat(xa,y)b) && \text{by definition of $cat$} \\ &= 1 + len(cat(xa,y)) && \text{by definition of $len$} \\ &= 1 + len(xa) + len(y) && \text{by inductive assumption} \\ &= len(xa) + (1 + len(y)) && \text{by arithmetic} \\ &= len(xa) + len(yb) && \text{by definition of $len$} \end{aligned} \]

This concludes the proof.

Note that the structure of this proof very closely follows the structure of the function we were proving something about. In this case, we were proving a property of the \(cat\) function; \(cat(xa,yb)\) was defined in terms of \(cat(xa,y)\), and in the proof of \(P(xa,yb)\), we had to use the assumption \(P(xa,y)\). This is a common occurrence in proofs by structural induction.
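The same correspondence shows up if \(cat\) is written as a program: the function has four branches, one per defining equation, and the proof had four cases. The sketch below assumes illustrative names (`Empty`, `Snoc`, `length`, `cat`, `all_strings`) and reads the last equation as \(cat(xa,yb) ::= cat(xa,y)b\), as used in the proof. The exhaustive check over short strings only tests the claim; the induction above is what proves it.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Union


@dataclass(frozen=True)
class Empty:
    """Rule 1: the empty string ε."""


@dataclass(frozen=True)
class Snoc:
    """Rule 2: the string xa."""
    x: "Str"
    a: str


Str = Union[Empty, Snoc]


def length(s: Str) -> int:
    if isinstance(s, Empty):
        return 0              # len(ε) ::= 0
    return 1 + length(s.x)    # len(xa) ::= 1 + len(x)


def cat(x: Str, y: Str) -> Str:
    x_empty, y_empty = isinstance(x, Empty), isinstance(y, Empty)
    if x_empty and y_empty:
        return Empty()                # cat(ε, ε)   ::= ε
    if x_empty:
        return y                      # cat(ε, xa)  ::= xa
    if y_empty:
        return x                      # cat(xa, ε)  ::= xa
    return Snoc(cat(x, y.x), y.a)     # cat(xa, yb) ::= cat(xa, y)b


def all_strings(alphabet: str, max_len: int) -> List[Str]:
    """Return every string over alphabet with at most max_len characters."""
    result: List[Str] = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s: Str = Empty()
            for a in chars:
                s = Snoc(s, a)
            result.append(s)
    return result


# Check len(cat(x, y)) = len(x) + len(y) on all pairs of short strings.
strings = all_strings("ab", 3)
assert all(length(cat(x, y)) == length(x) + length(y)
           for x in strings for y in strings)
```

The recursive call `cat(x, y.x)` in the last branch is exactly the place where the proof of \(P(xa,yb)\) appeals to \(P(xa,y)\).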