Lecture 20: Inductive definitions

Inductively defined sets

An inductively defined set is a set where the elements are constructed by a finite number of applications of a given set of rules.



Compact way of writing down inductively defined sets: BNF (Backus-Naur Form)

Only the name of the set and the rules are written down; they are separated by a "::=", and the rules are separated by vertical bars (\(|\)).

Examples (from above):

Here, the variables to the left of the \(\in\) are metavariables. When the same characters appear in the rules on the right-hand side of the \(::=\), they stand for arbitrary elements of the set being defined. For example, the \(e_1\) and \(e_2\) in the \(e_1 + e_2\) rule could be arbitrary elements of the set \(E\), but \(+\) is just the symbol \(+\).
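To make the reading of a BNF definition concrete, here is a minimal Python sketch of the definition \(x \in Σ^* ::= ε \mid xa\) that is used later in these notes. Each rule becomes one class, and the metavariable \(x\) on the right-hand side becomes a field whose type is the set being defined. The names `Empty`, `Snoc` and `String` are assumptions of this sketch, and it assumes characters are represented as Python strings.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:
    """Rule 1: the empty string ε."""


@dataclass(frozen=True)
class Snoc:
    """Rule 2: the string xa, built from a smaller string x and a character a."""
    x: String  # an arbitrary element of the set being defined
    a: str     # a single character


# x ∈ Σ* ::= ε | xa   (String is a hypothetical name for Σ*)
String = Union[Empty, Snoc]

# The string εab: start from ε, append 'a', then append 'b'.
ab = Snoc(Snoc(Empty(), "a"), "b")
```

Every value of this type is obtained by finitely many uses of the two constructors, which mirrors the requirement that elements are built by a finite number of applications of the rules.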

Inductively defined functions

If \(X\) is an inductively defined set, you can define a function from \(X\) to \(Y\) by defining the function on each of the types of elements of \(X\); i.e. for each of the rules. In the inductive rules (i.e. the ones containing the metavariable being defined), you can assume the function is already defined on the subterms.
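As an illustration, the function \(len\) on strings (defined later in these notes by \(len(ε) := 0\) and \(len(xa) := 1 + len(x)\)) can be sketched in Python with one branch per rule. The class names `Empty` and `Snoc` and the function name `length` are assumptions of this sketch.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:
    """Rule 1: the empty string ε."""


@dataclass(frozen=True)
class Snoc:
    """Rule 2: the string xa."""
    x: String
    a: str


String = Union[Empty, Snoc]


def length(x: String) -> int:
    """One case per rule of the definition of Σ*."""
    if isinstance(x, Empty):
        return 0               # len(ε) := 0
    # Inductive rule: length is assumed to be already defined on the subterm x.x.
    return 1 + length(x.x)     # len(xa) := 1 + len(x)
```

The recursive call is only ever made on a subterm, so the recursion terminates for the same reason that every string is built in finitely many steps.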


Idea behind structural induction

Consider the definition \(x \in Σ^* ::= ε \mid xa\). I will refer to \(x ::= ε\) as "rule 1" and \(x ::= xa\) as "rule 2". This definition says that there are two kinds of strings: the empty string \(ε\) (formed using rule 1), and strings of the form \(xa\), where \(x\) is a smaller string and \(a\) is a character of \(Σ\) (formed using rule 2); these are the only kinds of strings.

If we want to prove that property \(P\) holds on all strings (i.e. \(∀x \in Σ^*, P(x)\)), we can do it by giving a proof for strings formed using rule 1 (let's call it proof 1), and another proof for strings formed using rule 2 (let's call it proof 2). In the second proof, which concerns a string of the form \(xa\), we may assume that \(P(x)\) holds for the smaller string \(x\).

Why can we make this assumption? Suppose we have some complicated string, like \(εabc\), and we want to conclude \(P(εabc)\). We build the string \(εabc\) by snapping together smaller strings using rules 1 and 2; we can imagine building a proof of \(P(εabc)\) by snapping together smaller proofs using proofs 1 and 2.

To show that \(εabc\) is a string, we first use rule 1 to show that \(ε\) is a string, then rule 2 to show that \(εa\) is a string (this assumes that \(ε\) is a string, but we just argued it was), and then rule 2 again to show that \(εab\) is a string (using the fact that \(εa\) is a string), and finally use rule 2 a third time to show that \(εabc\) is a string.

Similarly, we can use proof 1 to show that \(P(ε)\) holds, then use proof 2 to show that \(P(εa)\) holds (this assumes that \(P(ε)\) holds, but we just argued it does), and then use proof 2 again to show that \(P(εab)\) holds (using the fact that \(P(εa)\) holds), and finally use proof 2 a third time to show that \(P(εabc)\) holds.
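The parallel between building a string and building its proof can be sketched in Python: the function below lists, in order, which rule is used at each step of building a string. Reading "proof" for "rule" gives the order in which proofs 1 and 2 are applied. The names `Empty`, `Snoc`, `show` and `derivation` are assumptions of this sketch.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:
    """Rule 1: the empty string ε."""


@dataclass(frozen=True)
class Snoc:
    """Rule 2: the string xa."""
    x: String
    a: str


String = Union[Empty, Snoc]


def show(x: String) -> str:
    """Write a string the way the notes do, e.g. εabc."""
    return "ε" if isinstance(x, Empty) else show(x.x) + x.a


def derivation(x: String) -> list[tuple[int, str]]:
    """List (rule number, string built so far) for each step, smallest first."""
    if isinstance(x, Empty):
        return [(1, show(x))]
    # The smaller string x.x must be built before rule 2 can be applied.
    return derivation(x.x) + [(2, show(x))]


abc = Snoc(Snoc(Snoc(Empty(), "a"), "b"), "c")
for rule, s in derivation(abc):
    print(f"rule {rule}: {s} is a string; proof {rule}: P({s}) holds")
```

For \(εabc\) this produces one use of rule 1 followed by three uses of rule 2, matching the two paragraphs above.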

In general, any element of an inductively defined set is built up by applying the rules defining the set, so if you provide a proof for each rule, you have given a proof for every element. Before you can build a complex structure, you have to build the parts, so while building the proof that some property holds on a complex structure, you can assume that you have already proved it for the subparts.

Structural induction step by step

In general, if an inductive set \(X\) is defined by a set of rules (rule 1, rule 2, etc.), then we can prove \(∀x \in X, P(x)\) by giving a separate proof of \(P(x)\) for \(x\) formed by each of the rules. In the cases where the rule recursively uses elements \(y_1, y_2, \dots\) of the set being defined, we can assume \(P(y_1), P(y_2), \dots\).
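As a sketch, here is how this principle reads for \(Σ^*\), written as an inference rule. The horizontal-bar notation and the assumption that \(a\) ranges over \(Σ\) are conventions of this sketch, not part of the definitions above. The two premises above the bar are the proofs you supply, one per rule, and the conclusion below the bar is what you obtain.

\[\frac{P(ε) \qquad ∀x \in Σ^*, ∀a \in Σ,\ (P(x) \Rightarrow P(xa))}{∀x \in Σ^*, P(x)}\]

The inductive assumption \(P(x)\) appears only in the second premise, because rule 2 is the only rule that recursively uses an element of \(Σ^*\).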

Example structures:

Example proofs

lengths of strings are nonnegative

Recall \(Σ^*\) is defined by \(x \in Σ^* ::= ε \mid xa\) and \(len : Σ^* → \mathbb{N}\) is given by \(len(ε) := 0\) and \(len(xa) := 1 + len(x)\).

Claim: For all \(x \in Σ^*\), \(len(x) \geq 0\).

Proof: By induction on the structure of \(x\). Let \(P(x)\) be the statement "\(len(x) \geq 0\)". We must prove \(P(ε)\), and \(P(xa)\) assuming \(P(x)\).

\(P(ε)\) case: we want to show \(len(ε) \geq 0\). Well, by definition, \(len(ε) = 0 \geq 0\).

\(P(xa)\) case: assume \(P(x)\). That is, \(len(x) \geq 0\). We wish to show \(P(xa)\), i.e. that \(len(xa) \geq 0\). Well, \(len(xa) = 1 + len(x) \geq 1 + 0 = 1 \geq 0\).

balanced trees of height \(k\) have \(2^k - 1\) nodes

Here is another example proof by structural induction, this time using the definition of trees. We proved this in lecture 21 but it has been moved here.

Definition: We say that a tree \(t \in T\) is balanced of height \(k\) if either 1. \(t = nil\) and \(k = 0\), or 2. \(t = node(a,t_1,t_2)\) and \(t_1\) and \(t_2\) are both balanced of height \(k-1\).

Definition: We define \(n : T → \mathbb{N}\) by the rules \(n(nil) := 0\) and \(n(node(a,t_1,t_2)) := 1 + n(t_1) + n(t_2)\).

Claim: for all \(t \in T\) and for all \(k \in \mathbb{N}\), if \(t\) is balanced of height \(k\) then \(n(t) = 2^{k}-1\).

Proof: By structural induction on \(t\). Let \(P(t)\) be the statement "for all \(k \in \mathbb{N}\), if \(t\) is balanced of height \(k\), then \(n(t) = 2^{k}-1\)." We must show \(P(nil)\), and \(P(node(a,t_1,t_2))\) assuming \(P(t_1)\) and \(P(t_2)\).

We start by proving \(P(nil)\), i.e. that for all \(k\), if \(nil\) is balanced of height \(k\) then \(n(nil) = 2^k-1\). Well, the only way for \(nil\) to be balanced of height \(k\) is if \(k = 0\). Therefore \(2^k - 1 = 2^0 - 1 = 0\). The definition of \(n\) shows that \(n(nil)\) is also 0, so \(n(nil) = 2^k-1\) in this case.

For the \(node\) case, we must show that for every \(k\), if \(node(a,t_1,t_2)\) is balanced of height \(k\), then \(n(node(a,t_1,t_2)) = 2^k-1\). We get to assume the inductive hypotheses: \(P(t_1)\) says that for every \(k'\), if \(t_1\) is balanced of height \(k'\) then \(n(t_1) = 2^{k'}-1\), and similarly for \(t_2\).

Fix \(k\) and suppose \(node(a,t_1,t_2)\) is balanced of height \(k\). Then \(t_1\) and \(t_2\) must both be balanced of height \(k-1\) (this is the definition of balanced of height \(k\)). Applying \(P(t_1)\) with \(k' = k-1\), we see that \(n(t_1) = 2^{k-1}-1\); likewise, \(P(t_2)\) gives \(n(t_2) = 2^{k-1}-1\). Therefore, by definition of \(n\), we see

\[n(node(a,t_1,t_2)) = 1 + n(t_1) + n(t_2) = 1 + (2^{k-1}-1) + (2^{k-1}-1) = 2 \cdot 2^{k-1} - 1 = 2^k - 1\]

as required.
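The claim can also be checked on small cases with a short Python sketch. This is only a sanity check for a few values of \(k\), not a substitute for the proof. The class names `Nil` and `Node`, the helpers `is_balanced` and `full`, and the choice of 0 as the label \(a\) are assumptions of this sketch; `n` follows the definition of \(n\) given above.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Nil:
    """The empty tree nil."""


@dataclass(frozen=True)
class Node:
    """The tree node(a, t1, t2)."""
    a: object
    t1: Tree
    t2: Tree


Tree = Union[Nil, Node]


def n(t: Tree) -> int:
    """n(nil) := 0 and n(node(a,t1,t2)) := 1 + n(t1) + n(t2)."""
    if isinstance(t, Nil):
        return 0
    return 1 + n(t.t1) + n(t.t2)


def is_balanced(t: Tree, k: int) -> bool:
    """Check whether t is balanced of height k, following the definition."""
    if isinstance(t, Nil):
        return k == 0
    # k - 1 must be a natural number, so a node needs k >= 1.
    return k >= 1 and is_balanced(t.t1, k - 1) and is_balanced(t.t2, k - 1)


def full(k: int) -> Tree:
    """Hypothetical helper: build a tree that is balanced of height k."""
    if k == 0:
        return Nil()
    return Node(0, full(k - 1), full(k - 1))


# Sanity check of the claim for k = 0, ..., 5.
for k in range(6):
    t = full(k)
    assert is_balanced(t, k)
    assert n(t) == 2 ** k - 1
```

Note that `full` has the same two-case shape as the proof: the `Nil` case corresponds to \(P(nil)\), and the `Node` case reuses the result for the two subtrees of height \(k-1\).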