Reading: MCS 7,7.1

- Inductively defined sets
- BNF notation
- examples: lists, trees, \(\mathbb{N}\), logical formulae
- Strings (\(Σ^*\))

- Inductively defined functions
- examples: length, concatenation

- Proofs by structural induction

**Review Exercises:**

- Give inductive definitions for the following sets: \(\mathbb{N}\); the set of strings with alphabet \(Σ\); the set of binary trees; the set of arithmetic expressions formed using addition, multiplication, and exponentiation.
- Give inductive definitions of the length of a string, the concatenation of two strings, the reverse of a string, the maximum element of a list of integers, the sum of two natural numbers, the product of two natural numbers, etc.
- Prove that \(len(cat(x,y)) = len(x) + len(y)\).
- Prove that \(len(reverse(x)) = len(x)\).
- Use the inductive definitions of \(\mathbb{N}\) and \(plus\) to show that \(plus(a,b) = plus(b,a)\).

An inductively defined set is a set where the elements are constructed by a finite number of applications of a given set of rules.

Examples:

- the set \(\mathbb{N}\) of natural numbers is the set of elements defined by the following rules:
- \(Z \in \mathbb{N}\)
- If \(n \in \mathbb{N}\) then \(Sn \in \mathbb{N}\).

Thus the elements of \(\mathbb{N}\) are \(\{Z, SZ, SSZ, SSSZ, \dots\}\). \(S\) stands for "successor"; you can then define \(1\) as \(SZ\), \(2\) as \(SSZ\), and so on.
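To make the construction concrete, here is a small sketch in Python (not part of the notes): \(Z\) is encoded as `None` and \(S\) wraps its argument in a one-element tuple, so each natural number records exactly how the rules built it. The names `Z`, `S`, and `to_int` are illustrative.

```python
# Sketch: naturals as "successor towers". Z is None; S(n) nests n one level deeper.
Z = None

def S(n):
    """Apply rule 2: if n is a natural, so is Sn."""
    return (n,)

def to_int(n):
    """Count the S applications, one per rule-2 step in the construction."""
    if n is Z:                 # rule 1: Z, i.e. 0
        return 0
    return 1 + to_int(n[0])    # rule 2: Sn, i.e. n + 1

# Z, S(Z), S(S(Z)), ... correspond to 0, 1, 2, ...
```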

- the set \(\Sigma^{*}\) of strings with characters in \(\Sigma\) is defined by
- \(\epsilon \in \Sigma^*\)
- If \(a \in \Sigma\) and \(x \in \Sigma^{*}\) then \(xa \in \Sigma^*\).

Thus if \(\Sigma = \{0,1\}\), then the elements of \(\Sigma^*\) are \(\{ε, ε0, ε1, ε00, ε01, \dots, ε1010101, \dots\}\). We usually leave off the \(ε\) at the beginning of strings of length 1 or more.

- the set \(T\) of binary trees with integers in the nodes is given by the rules
- the empty tree, written \(nil\), is a tree
- if \(a\) is an integer and \(t_1\) and \(t_2\) are trees, then the tree with root \(a\) and subtrees \(t_1\) and \(t_2\), written \(node(a,t_1,t_2)\), is a tree.

Thus the elements of \(T\) are trees like the one written textually as \(node(3,node(0,nil,nil),node(1,node(2,nil,nil),nil))\): a root containing \(3\), with a left child containing \(0\) and a right child containing \(1\), whose own left child contains \(2\).
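The rules translate directly into code. As a sketch (the encoding is mine, not the notes'): `nil` can be `None` and \(node(a,t_1,t_2)\) a triple, with a helper `values` that walks the structure built by the rules.

```python
# Sketch: binary trees as None (nil) or (a, t1, t2) tuples (node).
nil = None

def node(a, t1, t2):
    return (a, t1, t2)

def values(t):
    """Collect node values root-first, recursing into both subtrees."""
    if t is nil:
        return []
    a, t1, t2 = t
    return [a] + values(t1) + values(t2)

# The example tree from the text:
t = node(3, node(0, nil, nil), node(1, node(2, nil, nil), nil))
```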

Compact way of writing down inductively defined sets: BNF (Backus–Naur form)

Only the name of the set and the rules are written down; they are separated by a "::=", and the rules are separated by vertical bar (\(|\)).

Examples (from above):

\(n \in \mathbb{N} ::= Z \mid Sn\)

\(x \in \Sigma^* ::= \epsilon \mid xa\) where \(a \in \Sigma\)

\(t \in T ::= nil \mid node(a,t_1,t_2)\) where \(a \in \mathbb{Z}\)

(basic mathematical expressions) \[\begin{aligned}e \in E &::= n \mid e_1 + e_2 \mid e_1 * e_2 \mid - e \mid e_1 / e_2 \\ n &\in \mathbb{Z}\end{aligned}\]

Here, the variables to the left of the \(\in\) indicate *metavariables*. When the same characters appear in the rules on the right-hand side of the \(::=\), they indicate an arbitrary element of the set being defined. For example, the \(e_1\) and \(e_2\) in the \(e_1 + e_2\) rule could be arbitrary elements of the set \(E\), but \(+\) is just the symbol \(+\).

If \(X\) is an inductively defined set, you can define a function from \(X\) to \(Y\) by defining the function on each of the types of elements of \(X\); i.e. for each of the rules. In the inductive rules (i.e. the ones containing the metavariable being defined), you can assume the function is already defined on the subterms.

Examples:

\(add2 : \mathbb{N} → \mathbb{N}\) is given by \(add2(Z) ::= SSZ\) and \(add2(Sn) ::= S(add2(n))\).

\(plus : \mathbb{N} \times \mathbb{N} → \mathbb{N}\) is given by \(plus(Z,n) ::= n\) and \(plus(Sn, n') ::= S(plus(n,n'))\). Note that we don't need to use induction on both of the inputs.
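As a sketch of how these definitions execute, here they are in Python, using the same successor encoding of naturals as before (\(Z\), the zero of the text, as `None`; \(S\) as a one-element tuple); the encoding and the helper `to_int` are mine.

```python
# Sketch: add2 and plus, one clause per rule of N's definition.
Z = None          # the base element (written Z or 0 in the text)
def S(n): return (n,)

def add2(n):
    if n is Z:
        return S(S(Z))           # add2(Z) ::= SSZ
    return S(add2(n[0]))         # add2(Sn) ::= S(add2(n))

def plus(m, n):
    if m is Z:
        return n                 # plus(Z, n') ::= n'
    return S(plus(m[0], n))      # plus(Sn, n') ::= S(plus(n, n'))

def to_int(n):
    """Helper for inspecting results as ordinary integers."""
    return 0 if n is Z else 1 + to_int(n[0])
```

Note how induction is only on the first argument of `plus`, mirroring the definition.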

\(len : Σ^* → \mathbb{N}\) is given by \(len(ε) ::= 0\) and \(len(xa) ::= 1 + len(x)\).

\(cat : Σ^* \times Σ^* → Σ^*\) is given by \(cat(ε,ε) ::= ε\), \(cat(xa,ε) ::= xa\), \(cat(ε,xa) ::= xa\) and \(cat(xa,yb) ::= cat(xa,y)b\).
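A sketch of \(len\) and \(cat\) in Python, with \(ε\) as the empty string and \(xa\) as `x + a` (ordinary Python strings standing in for \(Σ^*\)); the four cases of \(cat\) collapse to two here, since returning the first argument covers every case where the second is \(ε\).

```python
# Sketch: len and cat by recursion on the string's last character.
def length(s):                      # named "length" to avoid shadowing Python's len
    if s == "":                     # len(ε) ::= 0
        return 0
    return 1 + length(s[:-1])       # len(xa) ::= 1 + len(x)

def cat(x, y):
    if y == "":                     # cat(x, ε) ::= x
        return x
    return cat(x, y[:-1]) + y[-1]   # cat(x, yb) ::= cat(x, y)b
```

Running `length(cat(x, y))` and `length(x) + length(y)` on sample strings is a quick sanity check of the first claim in the exercises.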

Consider the definition \(x \in Σ^* ::= ε \mid xa\). I will refer to \(x ::= ε\) as "rule 1" and \(x ::= xa\) as "rule 2". This definition says that there are two kinds of strings: empty strings (formed using rule 1), and strings of the form \(xa\), where \(x\) is a smaller string (formed using rule 2); these are the only kinds of strings.

If we want to prove that property \(P\) holds on all strings (i.e. \(∀x \in Σ^*, P(x)\)), we can do it by giving a proof for strings formed using rule 1 (let's call it proof 1), and another proof for strings formed using rule 2 (let's call it proof 2). In the second proof, we may assume that \(P(x)\) holds for the smaller string \(x\).

Why can we make this assumption? Suppose we have some complicated string, like \(εabc\), and we want to conclude \(P(εabc)\). We build the string \(εabc\) by snapping together smaller strings using rules 1 and 2; we can imagine building a proof of \(P(εabc)\) by snapping together smaller proofs using proofs 1 and 2.

To show that \(εabc\) is a string, we first use rule 1 to show that \(ε\) is a string, then rule 2 to show that \(εa\) is a string (this assumes that \(ε\) is a string, but we just argued it was), and then rule 2 again to show that \(εab\) is a string (using the fact that \(εa\) is a string), and finally use rule 2 a third time to show that \(εabc\) is a string.

Similarly, we can use proof 1 to show that \(P(ε)\) holds, then use proof 2 to show that \(P(εa)\) holds (this assumes that \(P(ε)\) holds, but we just argued it does), and then use proof 2 again to show that \(P(εab)\) holds (using the fact that \(P(εa)\) holds), and finally use proof 2 a third time to show that \(P(εabc)\) holds.

In general, any element of an inductively defined set is built up by applying the rules defining the set, so if you provide a proof for each rule, you have given a proof for every element. Before you can build a complex structure, you have to build the parts, so while building the proof that some property holds on a complex structure, you can assume that you have already proved it for the subparts.

In general, if an inductive set \(X\) is defined by a set of rules (rule 1, rule 2, etc.), then we can prove \(∀x \in X, P(x)\) by giving a separate proof of \(P(x)\) for \(x\) formed by each of the rules. In the cases where the rule recursively uses elements \(y_1, y_2, \dots\) of the set being defined, we can assume \(P(y_1), P(y_2), \dots\).

**Example structures:**

\(Σ^*\) is defined by \(x ∈ Σ^* ::= ε \mid xa\). To prove \(∀x \in Σ^*, P(x)\), you must prove (1) \(P(ε)\), and (2) \(P(xa)\); but in the proof of (2) you may assume \(P(x)\).

If a set \(T\) is defined by \(t \in T ::= nil \mid node(a,t_1,t_2)\), then to prove \(∀t \in T, P(t)\) you must prove (1) \(P(nil)\) and (2) \(P(node(a,t_1,t_2))\). But, in the proof of (2) you may assume \(P(t_1)\) and \(P(t_2)\).

If a set \(F\) is defined by \(φ \in F ::= Q \mid \lnot φ \mid φ_1 \land φ_2 \mid φ_1 \lor φ_2\), you can prove \(∀φ ∈ F, P(φ)\) by proving (1) \(P(Q)\), (2) \(P(\lnot φ)\) [assuming \(P(φ)\)], (3) \(P(φ_1 \land φ_2)\) [assuming \(P(φ_1)\) and \(P(φ_2)\)], (4) \(P(φ_1 \lor φ_2)\) [assuming \(P(φ_1)\) and \(P(φ_2)\)].
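The same case analysis appears when defining a function over \(F\) by recursion: one clause per rule, with the recursive calls playing the role of the inductive hypotheses. A sketch (the tuple encoding and the name `atoms` are mine):

```python
# Sketch: formulas as a plain string (an atom Q) or as tuples
# ("not", phi), ("and", phi1, phi2), ("or", phi1, phi2).
def atoms(phi):
    """The set of atoms occurring in phi, one clause per rule of F."""
    if isinstance(phi, str):       # rule (1): an atom Q
        return {phi}
    if phi[0] == "not":            # rule (2): recurse on the subformula
        return atoms(phi[1])
    _, p1, p2 = phi                # rules (3)/(4): recurse on both subformulas
    return atoms(p1) | atoms(p2)
```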

Recall \(Σ^*\) is defined by \(x \in Σ^* ::= ε \mid xa\) and \(len : Σ^* → \mathbb{N}\) is given by \(len(ε) ::= 0\) and \(len(xa) ::= 1 + len(x)\).

**Claim:** For all \(x \in Σ^*\), \(len(x) \geq 0\).

**Proof:** By induction on the structure of \(x\). Let \(P(x)\) be the statement "\(len(x) \geq 0\)". We must prove \(P(ε)\), and \(P(xa)\) assuming \(P(x)\).

\(P(ε)\) case: we want to show \(len(ε) \geq 0\). Well, by definition, \(len(ε) = 0 \geq 0\).

\(P(xa)\) case: assume \(P(x)\). That is, \(len(x) \geq 0\). We wish to show \(P(xa)\), i.e. that \(len(xa) \geq 0\). Well, \(len(xa) = 1 + len(x) \geq 1 + 0 = 1 \geq 0\).

Here is another example proof by structural induction, this time using the definition of trees. We proved this in lecture 21 but it has been moved here.

**Definition:** We say that a tree \(t \in T\) is *balanced of height \(k\)* if either

1. \(t = nil\) and \(k = 0\), or
2. \(t = node(a,t_1,t_2)\) and \(t_1\) and \(t_2\) are both balanced of height \(k-1\).

**Definition:** We define \(n : T → \mathbb{N}\) by the rules \(n(nil) := 0\) and \(n(node(a,t_1,t_2)) := 1 + n(t_1) + n(t_2)\).
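As a sketch, \(n\) is directly computable, using the same tuple encoding of trees as before (`nil` as `None`):

```python
# Sketch: n counts the nodes, one clause per rule of T.
nil = None

def n(t):
    if t is nil:
        return 0                   # n(nil) := 0
    a, t1, t2 = t
    return 1 + n(t1) + n(t2)       # n(node(a,t1,t2)) := 1 + n(t1) + n(t2)
```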

**Claim:** for all \(t \in T\) and for all \(k \in \mathbb{N}\), If \(t\) is balanced of height \(k\) then \(n(t) = 2^{k}-1\).

**Proof:** By structural induction on \(t\). Let \(P(t)\) be the statement "for all \(k \in \mathbb{N}\), if \(t\) is balanced of height \(k\), then \(n(t) = 2^{k}-1\)." We must show \(P(nil)\) and \(P(node(a,t_1,t_2))\).

We start by proving \(P(nil)\), i.e. that for all \(k\), if \(nil\) is balanced of height \(k\) then \(n(nil) = 2^k-1\). Well, the only way for \(nil\) to be balanced of height \(k\) is if \(k = 0\). Therefore \(2^k - 1 = 2^0 - 1 = 0\). The definition of \(n\) shows that \(n(nil)\) is also 0, so \(n(nil) = 2^k-1\) in this case.

For the \(node\) case, we must show that if \(node(a,t_1,t_2)\) is balanced of height \(k\) for some \(k\), then \(n(node(a,t_1,t_2)) = 2^k-1\). We get to assume the inductive hypotheses: \(P(t_1)\) says that if \(t_1\) is balanced of height \(k'\) for some \(k'\) then \(n(t_1) = 2^{k'}-1\), and similarly for \(t_2\).

Since \(node(a,t_1,t_2)\) is balanced of height \(k\), we know that \(t_1\) and \(t_2\) must both be balanced of height \(k-1\) (this is the definition of balanced of height \(k\)). Therefore, by \(P(t_1)\) we see that \(n(t_1) = 2^{k-1}-1\), and \(n(t_2) = 2^{k-1}-1\). Therefore, by definition of \(n\), we see

\[n(node(a,t_1,t_2)) = 1 + n(t_1) + n(t_2) = 1 + (2^{k-1}-1) + (2^{k-1}-1) = 2 \cdot 2^{k-1} - 1 = 2^{k}-1\]

as required.
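The claim is easy to spot-check: build a tree balanced of height \(k\) and count its nodes. A sketch (the helper names and tuple encoding are mine):

```python
# Sketch check: n(t) == 2**k - 1 for a tree balanced of height k.
nil = None

def balanced_tree(k, a=0):
    """A tree balanced of height k, following the definition's two cases."""
    if k == 0:
        return nil                                        # case 1: nil, k = 0
    return (a, balanced_tree(k - 1, a), balanced_tree(k - 1, a))  # case 2

def n(t):
    if t is nil:
        return 0
    _, t1, t2 = t
    return 1 + n(t1) + n(t2)
```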