SearchTree: Binary Search Trees

Binary search trees are an efficient data structure for lookup tables, that is, mappings from keys to values. The total_map type from Maps.v is an inefficient implementation: if you add N items to your total_map, then looking them up takes N comparisons in the worst case, and N/2 comparisons in the average case.
In contrast, if your key type is a total order — that is, if it has a less-than comparison that's transitive and antisymmetric (a<b → ¬(b<a)) — then one can implement binary search trees (BSTs). We will assume you know how BSTs work; you can learn this from:
  • Section 3.2 of Algorithms, Fourth Edition, by Sedgewick and Wayne, Addison Wesley 2011; or
  • Chapter 12 of Introduction to Algorithms, 3rd Edition, by Cormen, Leiserson, Rivest, and Stein, MIT Press 2009.
Our focus here is to prove the correctness of an implementation of binary search trees.
From Coq Require Import Strings.String.
From VFA Require Import Perm.
Require Import FunctionalExtensionality.

Total and Partial Maps

Recall the Maps chapter of Volume 1 (Logical Foundations), describing functions from identifiers to some arbitrary type A. VFA's Maps module is almost exactly the same, except that it implements functions from nat to some arbitrary type A.
From VFA Require Import Maps.

Sections

We will use Coq's Section feature to structure this development, so first a brief introduction to Sections. We'll use the example of lookup tables implemented by lists.
Module SectionExample1.
  Definition mymap (V: Type) := list (nat*V).
  Definition empty (V: Type) : mymap V := nil.
  Fixpoint lookup (V: Type) (default: V) (x: nat) (m: mymap V) : V :=
    match m with
    | (a,v)::al ⇒ if x =? a then v else lookup V default x al
    | nil ⇒ default
    end.
  Theorem lookup_empty (V: Type) (default: V):
       ∀ x, lookup V default x (empty V) = default.
   Proof. reflexivity. Qed.
End SectionExample1.
It sure is tedious to repeat the V and default parameters in every definition and every theorem. The Section feature allows us to declare them as parameters to every definition and theorem in the entire section:
Module SectionExample2.
  Section MAPS.
  Variable V : Type.
  Variable default: V.

  Definition mymap := list (nat*V).
  Definition empty : mymap := nil.
  Fixpoint lookup (x: nat) (m: mymap) : V :=
    match m with
    | (a,v)::al ⇒ if x =? a then v else lookup x al
    | nil ⇒ default
    end.
  Theorem lookup_empty:
       ∀ x, lookup x empty = default.
   Proof. reflexivity. Qed.
  End MAPS.
End SectionExample2.
At the close of the section, this produces exactly the same result: the functions that need to be parametrized by V or default have been generalized to take them as parameters:
Check SectionExample2.empty.
(* ====>
SectionExample2.empty
     : forall V : Type, SectionExample2.mymap V
*)


Check SectionExample2.lookup.
(* ====>
SectionExample2.lookup
     : forall V : Type, V -> nat -> SectionExample2.mymap V -> V
*)

We can even prove that the two definitions are equivalent, as follows.
Theorem empty_equiv : SectionExample1.empty = SectionExample2.empty.
Proof. reflexivity. Qed.
For lookup, the proof is a bit harder. The problem is that the two lookup functions are not implemented in quite the same way:
Print SectionExample1.lookup.
(* ====>
SectionExample1.lookup =
fix lookup (V : Type) (default : V) (x : nat) (m : SectionExample1.mymap V)
       {struct m} : V := ...
*)


Print SectionExample2.lookup.
(* ====>
SectionExample2.lookup =
fun (V : Type) (default : V) =>
fix lookup (x : nat) (m : SectionExample2.mymap V) {struct m} : V := ...
*)

When the Section added parameters to SectionExample2.lookup, they came in the form of a function, fun (V : Type) (default : V) ⇒ ..., wrapped around the actual lookup function. So, reflexivity isn't enough to complete the proof.
Theorem lookup_equiv : SectionExample1.lookup = SectionExample2.lookup.
Proof.
  try reflexivity. (* doesn't do anything. *)
Instead, we need to prove that the two lookup functions compute the same result when applied to the same arguments. That is, we want to use the Axiom of Functional Extensionality, which is discussed in Logic and provided by the standard library's Logic.FunctionalExtensionality module. Recall that functions f and g are extensionally equal if, for every argument x, it holds that f x = g x. The Axiom of Functional Extensionality says that if two functions are extensionally equal, then they are equal. The extensionality tactic, which we use next, is a convenient way of applying the axiom.
  extensionality V. extensionality default. extensionality x.
  extensionality m. induction m as [ | h t IH]; simpl.
  - reflexivity.
  - destruct h as [k v]. destruct (x =? k).
    + reflexivity.
    + apply IH.
Qed.

Binary Search Tree Implementation


Section TREES.
Variable V : Type.
Variable default: V.

Definition key := nat.

Inductive tree : Type :=
 | E : tree
 | T : tree → key → V → tree → tree.

Definition empty_tree : tree := E.

Fixpoint lookup (x: key) (t : tree) : V :=
  match t with
  | E ⇒ default
  | T tl k v tr ⇒ if x <? k then lookup x tl
                  else if k <? x then lookup x tr
                  else v
  end.

Fixpoint insert (x: key) (v: V) (s: tree) : tree :=
 match s with
 | E ⇒ T E x v E
 | T a y v' b ⇒ if x <? y then T (insert x v a) y v' b
                else if y <? x then T a y v' (insert x v b)
                else T a x v b
 end.

Fixpoint elements' (s: tree) (base: list (key*V)) : list (key * V) :=
 match s with
 | E ⇒ base
 | T a k v b ⇒ elements' a ((k,v) :: elements' b base)
 end.

Definition elements (s: tree) : list (key * V) := elements' s nil.
If you're wondering why we didn't implement elements more simply with ++, we'll return to that question below when we discuss a function named slow_elements; feel free to peek ahead now if you're curious.

Search Tree Examples


Section EXAMPLES.
Variables v2 v4 v5 : V.
Compute insert 5 v5 (insert 2 v2 (insert 4 v5 empty_tree)).
  (*      = T (T E 2 v2 E) 4 v5 (T E 5 v5 E) *)
Compute lookup 5 (T (T E 2 v2 E) 4 v5 (T E 5 v5 E)).
  (*      = v5 *)
Compute lookup 3 (T (T E 2 v2 E) 4 v5 (T E 5 v5 E)).
  (*      = default *)
Compute elements (T (T E 2 v2 E) 4 v5 (T E 5 v5 E)).
  (*      = [(2, v2); (4, v5); (5, v5)] *)
End EXAMPLES.

Efficiency of BSTs

We use binary search trees because they are efficient. That is, if there are N elements in a (reasonably well balanced) BST, each insertion or lookup takes about log N time.
What could go wrong?
1. The search tree might not be balanced. In that case, each insertion or lookup will take as much as linear time. - SOLUTION: use an algorithm, such as "red-black trees", that ensures the trees stay balanced. We'll do that in Chapter RedBlack.
2. Our keys are natural numbers, and Coq's nat type takes linear time per comparison. That is, computing (j <? k) takes time proportional to the value of k-j. - SOLUTION: represent keys by a data type that has a more efficient comparison operator. We just use nat in this chapter because it's something you're already familiar with.
3. We have no formalization of running time or time complexity in Coq. That is, we can't say what it means that a Coq function "takes N steps to evaluate." Therefore, we can't prove that binary search trees are efficient. - SOLUTION 1: Don't prove (in Coq) that they're efficient; just prove that they are correct. Prove things about their efficiency the old-fashioned way, on pencil and paper. - SOLUTION 2: Prove in Coq some facts about the height of the trees, which have direct bearing on their efficiency. We'll explore that in later chapters.
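To give a feel for SOLUTION 2, here is a sketch (not used elsewhere in this chapter, and the name height is ours) of the measure those later chapters reason about:

```coq
(* A sketch, not part of this chapter's development: the height of a
   tree, the measure on which balance and efficiency arguments rest.
   An empty tree has height 0; a node is one taller than its taller
   child. *)
Fixpoint height (t: tree) : nat :=
  match t with
  | E => O
  | T l _ _ r => S (Nat.max (height l) (height r))
  end.
```

A well-balanced tree with N nodes has height about log N, which is why each insertion or lookup takes about log N comparisons.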

What Should We Prove About Search Trees?

Search trees are meant to be an implementation of maps. That is, they have an insert function that corresponds to the update function of a map, and a lookup function that corresponds to applying the map to an argument. To prove the correctness of a search-tree algorithm, we can prove:
  • Any search tree corresponds to some map, using a function or relation that we demonstrate.
  • The lookup function gives the same result as applying the map.
  • The insert function returns a corresponding map.
  • Maps have the properties we actually want.
What properties do we want search trees to have? If I insert the binding (k,v) into a search tree t, then look up k, I should get v. If I look up k' in insert (k,v) t, where k' ≠ k, then I should get the same result as lookup k t. There are several more such properties, which are already proved about total_map in the Maps module:
Check t_update_eq.
(*  : forall (A : Type) (m : total_map A) (x : nat) (v : A),
       t_update m x v x = v   *)


Check t_update_neq.
(* : forall (X : Type) (v : X) (x1 x2 : nat) (m : total_map X),
       x1 <> x2 -> t_update m x1 v x2 = m x2    *)


Check t_update_shadow.
(* : forall (A : Type) (m : total_map A) (v1 v2 : A) (x : nat),
       t_update (t_update m x v1) x v2 = t_update m x v2    *)


Check t_update_same.
(* : forall (X : Type) (x : nat) (m : total_map X),
        t_update m x (m x) = m    *)


Check t_update_permute.
(* forall (X : Type) (v1 v2 : X) (x1 x2 : nat) (m : total_map X),
       x2 <> x1 ->
       t_update (t_update m x2 v2) x1 v1 =
         t_update (t_update m x1 v1) x2 v2    *)


Check t_apply_empty.
(* : forall (A : Type) (x : nat) (v : A),
       t_empty v x = v *)

So, if we like those properties that total_map is proved to have, and we can prove that search trees behave like maps, then we don't have to reprove each individual property about search trees.
More generally: a job worth doing is worth doing well. It does no good to prove the "correctness" of a program, if you prove that it satisfies a wrong or useless specification.

Abstraction Relation

We claim that a tree "corresponds" to a total_map. So we must exhibit an abstraction relation Abs : tree → total_map V → Prop.
The idea is that Abs t m says that tree t is a representation of map m; or that map m is an abstraction of tree t. How should we define this abstraction relation?
The empty tree is easy: Abs E (fun x ⇒ default).
Now, what about this tree?:
Definition example_tree (v2 v4 v5 : V) :=
   T (T E 2 v2 E) 4 v4 (T E 5 v5 E).

Exercise: 2 stars, standard (example_map)

Fill in the definition of example_map with a total_map that you think example_tree should correspond to. Use t_update and (t_empty default). When you are finished, the unit tests below should all be provable with reflexivity.
Definition example_map (v2 v4 v5 : V) : total_map V
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Section ExampleMap.
Variables v2 v4 v5 : V.

Example example_map_2 : example_map v2 v4 v5 2 = v2.
Proof. (* FILL IN HERE *) Admitted.

Example example_map_4 : example_map v2 v4 v5 4 = v4.
Proof. (* FILL IN HERE *) Admitted.

Example example_map_5 : example_map v2 v4 v5 5 = v5.
Proof. (* FILL IN HERE *) Admitted.

Example example_map_default : example_map v2 v4 v5 0 = default.
Proof. (* FILL IN HERE *) Admitted.

End ExampleMap.
To build the Abs relation, we'll use an auxiliary function that combines maps. combine pivot a b uses the map a on any input less than pivot, and map b on any input greater than (or equal to) pivot.
Definition combine {A} (pivot: key) (m1 m2: total_map A) : total_map A :=
  fun x ⇒ if x <? pivot then m1 x else m2 x.

Inductive Abs : tree → total_map V → Prop :=
| Abs_E : Abs E (t_empty default)
| Abs_T : ∀ a b l k v r,
      Abs l a →
      Abs r b →
      Abs (T l k v r) (t_update (combine k a b) k v).

Exercise: 3 stars, standard (check_example_map)

Prove that your example_map is the right one. Start by proving the following auxiliary lemma that will make Abs_T easier to work with.
Lemma Abs_T_aux : ∀ a b l k v r m,
    Abs l a → Abs r b
    → m = t_update (combine k a b) k v
    → Abs (T l k v r) m.
Proof. (* FILL IN HERE *) Admitted.
Now proceed with the proof about your example_map. Hint: to prove the equality of two non-empty maps, use functional extensionality to make the argument to the map (that is, a key) manifest. Reduce both maps to expressions involving just functions, if expressions, and Boolean comparisons. Then you can repeatedly destruct the key argument and simplify.
Lemma check_example_map:
  ∀ v2 v4 v5, Abs (example_tree v2 v4 v5) (example_map v2 v4 v5).
Proof.
  intros v2 v4 v5. unfold example_tree.
  eapply Abs_T_aux. (* FILL IN HERE *) Admitted.
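To see the hint above in action on a much smaller goal, here is a sketch of the same technique on a nat-valued map. The example statement and its name are ours, not part of the chapter; it is just t_update_shadow at concrete arguments, proved by hand:

```coq
(* A sketch (our own example): to prove two maps equal, apply
   extensionality, unfold to if expressions, then destruct the
   Boolean comparison. *)
Example two_updates_collapse :
  t_update (t_update (t_empty 0) 1 10) 1 20 = t_update (t_empty 0) 1 20.
Proof.
  extensionality x.                (* make the key argument manifest *)
  unfold t_update, t_empty.        (* reduce to if expressions *)
  destruct (1 =? x); reflexivity.  (* both branches close by computation *)
Qed.
```

The same extensionality-then-destruct pattern scales up to the equality of maps you will need for check_example_map.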

Proof of Correctness

Next we'll prove the correctness of empty_tree, lookup, and insert. In each case we want to show that the tree-based and map-based definitions are equivalent.

Exercise: 1 star, standard (empty_tree_relate)

Prove that the empty tree abstracts to the empty map.
Theorem empty_tree_relate: Abs empty_tree (t_empty default).
Proof. (* FILL IN HERE *) Admitted.

Exercise: 3 stars, standard (lookup_relate)

Prove that if tree t abstracts to map m, then looking up a key in t yields the same value as looking up that key in m. In other words, the following diagram commutes:
       lookup k
    t ----------+
    |            \
Abs |             +--> v
    V            /
    m ----------+
       lookup k
Theorem lookup_relate:
  ∀ k t m,
    Abs t m → lookup k t = m k.
Proof.
(* FILL IN HERE *) Admitted.

Exercise: 4 stars, standard (insert_relate)

Finally, prove that if tree t abstracts to map m, then inserting a new binding into t and abstracting yields the same map as updating m with the new binding. In other words, the following diagram commutes:
        insert k v
    t --------------> t'
    |                 |
Abs |                 | Abs
    V                 V
    m --------------> m'
        update k v
Theorem insert_relate:
 ∀ k v t m,
    Abs t m →
    Abs (insert k v t) (t_update m k v).
Proof.
(* FILL IN HERE *) Admitted.

Correctness Proof of the elements Function

THE REST OF THIS CHAPTER IS OPTIONAL.
How should we specify what elements is supposed to do? Well, elements t returns a list of pairs (k1,v1);(k2,v2);...;(kn,vn) that ought to correspond to the total_map t_update ... (t_update (t_update (t_empty default) k1 v1) k2 v2) ... kn vn.
We can formalize this quite easily.
Fixpoint list2map (el: list (key*V)) : total_map V :=
 match el with
 | nil ⇒ t_empty default
 | (i,v)::el' ⇒ t_update (list2map el') i v
 end.

Exercise: 3 stars, standard, optional (elements_relate_informal)

Theorem elements_relate:
  ∀ t m, Abs t m → list2map (elements t) = m.
Proof.
Don't prove this yet. Instead, explain in your own words, with examples, why this ought to be true. It's OK if your explanation is not a formal proof; it's even OK if your explanation is subtly wrong! Just make it convincing.
  (* FILL IN YOUR EXPLANATION HERE *)
Abort.
Now, formally prove that elements_relate is in fact false, as long as type V contains at least two distinct values.

Exercise: 4 stars, standard, optional (not_elements_relate)

Theorem not_elements_relate:
  ∀ v, v ≠ default →
  ¬ (∀ t m, Abs t m → list2map (elements t) = m).
Proof.
intros.
intro.
pose (bogus := T (T E 3 v E) 2 v E).
pose (m := t_update (t_empty default) 2 v).
pose (m' := t_update
        (combine 2
           (t_update (combine 3 (t_empty default) (t_empty default)) 3 v)
           (t_empty default)) 2 v).
assert (Paradox: list2map (elements bogus) = m ∧ list2map (elements bogus) ≠ m).
split.
To prove the first subgoal, prove that m=m' (by extensionality) and then use H.
To prove the second subgoal, do an intro so that you can assume list2map (elements bogus) = m, then show that list2map (elements bogus) 3 ≠ m 3. That's a contradiction.
To prove the third subgoal, just destruct Paradox and use the contradiction.
In all 3 goals, when you need to unfold local definitions such as bogus you can use unfold bogus or subst bogus.
(* FILL IN HERE *) Admitted.
What went wrong? Clearly, elements_relate is true; you just explained why. And clearly, it's not true, because not_elements_relate is provable in Coq. The problem is that the tree (T (T E 3 v E) 2 v E) is bogus: it's not a well-formed binary search tree, because there's a 3 in the left subtree of the 2 node, and 3 is not less than 2.
If you wrote a good answer to the elements_relate_informal exercise, (that is, an answer that is only subtly wrong), then the subtlety is that you assumed that the search tree is well formed. That's a reasonable assumption; but we will have to prove that all the trees we operate on will be well formed.

Representation Invariants

A tree has the SearchTree property if, at any node with key k, all the keys in the left subtree are less than k, and all the keys in the right subtree are greater than k. It's not completely obvious how to formalize that! Here's one way: it's correct, but not very practical.
Fixpoint forall_nodes (t: tree) (P: tree → key → V → tree → Prop) : Prop :=
 match t with
 | E ⇒ True
 | T l k v r ⇒ P l k v r ∧ forall_nodes l P ∧ forall_nodes r P
 end.

Definition SearchTreeX (t: tree) :=
 forall_nodes t
   (fun l k v r ⇒
      forall_nodes l (fun _ j _ _ ⇒ j<k) ∧
      forall_nodes r (fun _ j _ _ ⇒ j>k)).

Lemma example_SearchTree_good:
   ∀ v2 v4 v5, SearchTreeX (example_tree v2 v4 v5).
Proof.
intros. compute. auto.
Qed.

Lemma example_SearchTree_bad:
   ∀ v, ¬ SearchTreeX (T (T E 3 v E) 2 v E).
Proof.
intros. compute. omega.
Qed.

Theorem elements_relate_second_attempt:
  ∀ t m,
  SearchTreeX t →
  Abs t m →
  list2map (elements t) = m.
Proof.
This is probably provable. But the SearchTreeX property is quite unwieldy, with its two Fixpoints nested inside a Fixpoint.
Abort.
Instead of using SearchTreeX, let's reformulate the search tree property as an inductive proposition without any nested induction. SearchTree' lo t hi will hold of a tree t satisfying the BST invariant whose keys are between lo (inclusive) and hi (exclusive), where lo may not be greater than hi.
Inductive SearchTree' : key → tree → key → Prop :=
| ST_E : ∀ lo hi, lo ≤ hi → SearchTree' lo E hi
| ST_T : ∀ lo l k v r hi,
    SearchTree' lo l k →
    SearchTree' (S k) r hi →
    SearchTree' lo (T l k v r) hi.
It's easy to show that the definition prohibits lo from being greater than hi.
Lemma SearchTree'_le:
  ∀ lo t hi, SearchTree' lo t hi → lo ≤ hi.
Proof.
intros. induction H.
- assumption.
- omega.
Qed.
To be a SearchTree, a tree must satisfy SearchTree' for some upper bound hi.
Inductive SearchTree : tree → Prop :=
| ST_intro : ∀ t hi, SearchTree' 0 t hi → SearchTree t.
Before we prove that elements is correct, let's consider a simpler version.
Fixpoint slow_elements (s: tree) : list (key * V) :=
 match s with
 | E ⇒ nil
 | T a k v b ⇒ slow_elements a ++ [(k,v)] ++ slow_elements b
 end.
This one is easier to understand than the elements function, because it doesn't carry the base list around in its recursion. Unfortunately, its running time is quadratic, because at each of the T nodes it does a linear-time list concatenation. The original elements function takes linear time overall; that's much more efficient.
To prove correctness of elements, it's actually easier to first prove that it's equivalent to slow_elements, then prove the correctness of slow_elements. We don't care that slow_elements is quadratic, because we're never going to really run it; it's just there to support the proof.

Exercise: 3 stars, standard, optional (elements_slow_elements)

Theorem elements_slow_elements: elements = slow_elements.
Proof.
extensionality s.
unfold elements.
assert (∀ base, elements' s base = slow_elements s ++ base).
(* FILL IN HERE *) Admitted.

Exercise: 3 stars, standard, optional (slow_elements_range)

Lemma slow_elements_range:
 ∀ k v lo t hi,
  SearchTree' lo t hi →
  In (k,v) (slow_elements t) →
  lo ≤ k < hi.
Proof.
(* FILL IN HERE *) Admitted.

Auxiliary Lemmas About In and list2map


Lemma In_decidable:
  ∀ (al: list (key*V)) (i: key),
  (∃ v, In (i,v) al) ∨ (¬ ∃ v, In (i,v) al).
Proof.
intros.
induction al as [ | [k v]].
right; intros [w H]; inv H.
destruct IHal as [[w H] | H].
left; exists w; right; auto.
bdestruct (k =? i).
subst k.
left; eauto.
exists v; left; auto.
right. intros [w H1].
destruct H1. inv H1. omega.
apply H; eauto.
Qed.

Lemma list2map_app_left:
  ∀ (al bl: list (key*V)) (i: key) v,
     In (i,v) al → list2map (al++bl) i = list2map al i.
Proof.
intros.
revert H; induction al as [| [j w] al]; intro; simpl; auto.
inv H.
destruct H. inv H.
unfold t_update.
bdestruct (i=?i); [ | omega].
auto.
unfold t_update.
bdestruct (j=?i); auto.
Qed.

Lemma list2map_app_right:
  ∀ (al bl: list (key*V)) (i: key),
     ~(∃ v, In (i,v) al) → list2map (al++bl) i = list2map bl i.
Proof.
intros.
revert H; induction al as [| [j w] al]; intro; simpl; auto.
unfold t_update.
bdestruct (j=?i).
subst j.
contradiction H.
exists w; left; auto.
apply IHal.
contradict H.
destruct H as [u ?].
exists u; right; auto.
Qed.

Lemma list2map_not_in_default:
 ∀ (al: list (key*V)) (i: key),
   ~(∃ v, In (i,v) al) → list2map al i = default.
Proof.
intros.
induction al as [| [j w] al].
reflexivity.
simpl.
unfold t_update.
bdestruct (j=?i).
subst.
contradiction H.
exists w; left; auto.
apply IHal.
intros [v ?].
apply H. exists v; right; auto.
Qed.

Exercise: 3 stars, standard, optional (elements_relate)

Theorem elements_relate:
  ∀ t m,
  SearchTree t →
  Abs t m →
  list2map (elements t) = m.
Proof.
rewrite elements_slow_elements.
intros until 1. inv H.
revert m; induction H0; intros.
* (* ST_E case *)
inv H0.
reflexivity.
* (* ST_T case *)
inv H.
specialize (IHSearchTree'1 _ H5). clear H5.
specialize (IHSearchTree'2 _ H6). clear H6.
unfold slow_elements; fold slow_elements.
subst.
extensionality i.
destruct (In_decidable (slow_elements l) i) as [[w H] | Hleft].
rewrite list2map_app_left with (v:=w); auto.
pose proof (slow_elements_range _ _ _ _ _ H0_ H).
unfold combine, t_update.
bdestruct (k=?i); [ omega | ].
bdestruct (i<?k); [ | omega].
auto.
(* FILL IN HERE *) Admitted.

Preservation of Representation Invariant

How do we know that all the trees we will encounter (particularly, that the elements function will encounter), have the SearchTree property? Well, the empty tree is a SearchTree; and if you insert into a tree that's a SearchTree, then the result is a SearchTree; and these are the only ways that you're supposed to build trees. So we need to prove those two theorems.

Exercise: 1 star, standard, optional (empty_tree_SearchTree)

Theorem empty_tree_SearchTree: SearchTree empty_tree.
Proof.
clear default. (* This is here to avoid a nasty interaction between Admitted
   and Section/Variable.  It's also a hint that the default value
   is not needed in this theorem. *)

(* FILL IN HERE *) Admitted.

Exercise: 3 stars, standard, optional (insert_SearchTree)

Theorem insert_SearchTree:
  ∀ k v t,
   SearchTree t → SearchTree (insert k v t).
Proof.
clear default. (* This is here to avoid a nasty interaction between Admitted and Section/Variable *)
(* FILL IN HERE *) Admitted.
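These two theorems are exactly what's needed to discharge the informal claim above: any tree built only from empty_tree and insert satisfies the invariant. As a sketch (the build function and the lemma name are ours, not part of the chapter), one can make that precise by induction on a list of insertions:

```coq
(* A sketch: a tree built by any sequence of insertions from the
   empty tree is a SearchTree, by induction on the list of bindings,
   using empty_tree_SearchTree and insert_SearchTree. *)
Fixpoint build (kvs: list (key*V)) : tree :=
  match kvs with
  | nil => empty_tree
  | (k,v)::rest => insert k v (build rest)
  end.

Lemma build_SearchTree : forall kvs, SearchTree (build kvs).
Proof.
  induction kvs as [ | [k v] rest IH]; simpl.
  - apply empty_tree_SearchTree.
  - apply insert_SearchTree. apply IH.
Qed.
```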

We Got Lucky

Recall the statement of lookup_relate:
Check lookup_relate.
(*  forall (k : key) (t : tree) (m : total_map V),
       Abs t m -> lookup k t = m k  *)

In general, to prove that a function satisfies the abstraction relation, one also needs to use the representation invariant. That was certainly the case with elements_relate:
Check elements_relate.
(*  : forall (t : tree) (m : total_map V),
       SearchTree t -> Abs t m -> elements_property t m   *)

To put that another way, the general form of lookup_relate should be:
Lemma lookup_relate':
  ∀ (k : key) (t : tree) (m : total_map V),
     SearchTree t → Abs t m → lookup k t = m k.
That is certainly provable, since it's a weaker statement than what we proved:
Proof.
intros.
apply lookup_relate.
apply H0.
Qed.

Theorem insert_relate':
 ∀ k v t m,
    SearchTree t →
    Abs t m →
    Abs (insert k v t) (t_update m k v).
Proof. intros. apply insert_relate; auto.
Qed.
The question is, why did we not need the representation invariant in the proof of lookup_relate? The answer is that our particular Abs relation is much more clever than necessary:
Print Abs.
(* Inductive Abs : tree -> total_map V -> Prop :=
    Abs_E : Abs E (t_empty default)
  | Abs_T : forall (a b: total_map V) (l: tree) (k: key) (v: V) (r: tree),
            Abs l a ->
            Abs r b ->
            Abs (T l k v r) (t_update (combine k a b) k v)
*)

Because the combine function is chosen very carefully, it turns out that this abstraction relation even works on bogus trees!
Remark abstraction_of_bogus_tree:
 ∀ v2 v3,
   Abs (T (T E 3 v3 E) 2 v2 E) (t_update (t_empty default) 2 v2).
Proof.
intros.
evar (m: total_map V).
replace (t_update (t_empty default) 2 v2) with m; subst m.
repeat constructor.
extensionality x.
unfold t_update, combine, t_empty.
bdestruct (2 =? x).
auto.
bdestruct (x <? 2).
bdestruct (3 =? x).
(* LOOK HERE! *)
omega.
bdestruct (x <? 3).
auto.
auto.
auto.
Qed.
Step through the proof to LOOK HERE, and notice what's going on. Just when it seems that (T (T E 3 v3 E) 2 v2 E) is about to produce v3 while (t_update (t_empty default) 2 v2) is about to produce default, omega finds a contradiction. What's happening is that combine 2 is careful to ignore any keys >= 2 in the left-hand subtree.
For that reason, Abs matches the actual behavior of lookup, even on bogus trees. But that's a really strong condition! We should not have to care about the behavior of lookup (and insert) on bogus trees. We should not need to prove anything about it, either.
Sure, it's convenient in this case that the abstraction relation is able to cope with ill-formed trees. But in general, when proving correctness of abstract-data-type (ADT) implementations, it may be a lot of extra effort to make the abstraction relation as heavy-duty as that. It's often much easier for the abstraction relation to assume that the representation is well formed. Thus, the general statement of our correctness theorems will be more like lookup_relate' than like lookup_relate.

Every Well-Formed Tree Does Actually Relate to an Abstraction

We're not quite done yet. We would like to know that every tree that satisfies the representation invariant, means something.
So as a general sanity check, we need the following theorem:

Exercise: 2 stars, standard, optional (can_relate)

Lemma can_relate:
 ∀ t, SearchTree t → ∃ m, Abs t m.
Proof.
(* FILL IN HERE *) Admitted.
Now, because we happen to have a super-strong abstraction relation, that even works on bogus trees, we can prove a super-strong can_relate function:

Exercise: 2 stars, standard, optional (unrealistically_strong_can_relate)

Lemma unrealistically_strong_can_relate:
 ∀ t, ∃ m, Abs t m.
Proof.
(* FILL IN HERE *) Admitted.

It Wasn't Really Luck, Actually

In the previous section, I said, "we got lucky that the abstraction relation that I happened to choose had this super-strong property."
But actually, the first time I tried to prove correctness of search trees, I did not get lucky. I chose a different abstraction relation:
Definition AbsX (t: tree) (m: total_map V) : Prop :=
    list2map (elements t) = m.
It's easy to prove that elements respects this abstraction relation:
Theorem elements_relateX:
  ∀ t m,
  SearchTree t →
  AbsX t m →
  list2map (elements t) = m.
Proof.
intros.
apply H0.
Qed.
But it's not so easy to prove that lookup and insert respect this relation. For example, the following claim is not true.
Theorem naive_lookup_relateX:
  ∀ k t m,
    AbsX t m → lookup k t = m k.
Abort. (* Not true *)
In fact, naive_lookup_relateX is provably false, as long as the type V contains at least two different values.
Theorem not_naive_lookup_relateX:
   ∀ v, default ≠ v →
    ¬ (∀ k t m, AbsX t m → lookup k t = m k).
Proof.
unfold AbsX.
intros v H.
intros H0.
pose (bogus := T (T E 3 v E) 2 v E).
pose (m := t_update (t_update (t_empty default) 2 v) 3 v).
assert (list2map (elements bogus) = m).
  reflexivity.
assert (lookup 3 bogus ≠ m 3). {
  unfold bogus, m, t_update, t_empty.
  simpl.
  apply H.
}
(** Right here you see how it is proved.  bogus is our old friend,
    the bogus tree that does not satisfy the SearchTree property.
    m is the total_map that corresponds to the elements of bogus.
    The lookup function returns default at key 3,
    but the map m returns v at key 3.  And yet, assumption H0
    claims that they should return the same thing. *)

apply H2.
apply H0.
apply H1.
Qed.

Exercise: 4 stars, standard, optional (lookup_relateX)

Theorem lookup_relateX:
  ∀ k t m,
    SearchTree t → AbsX t m → lookup k t = m k.
Proof.
intros.
unfold AbsX in H0. subst m.
inv H. remember 0 as lo in H0.
clear - H0.
rewrite elements_slow_elements.
To prove this, you'll need to use this collection of facts: In_decidable, list2map_app_left, list2map_app_right, list2map_not_in_default, slow_elements_range. The point is, it's not very pretty.
(* FILL IN HERE *) Admitted.

Coherence With elements Instead of lookup

The first definition of the abstraction relation, Abs, is "coherent" with the lookup operation, but not very coherent with the elements operation. That is, Abs treats all trees, including ill-formed ones, much the way lookup does, so it was easy to prove lookup_relate. But it was harder to prove elements_relate.
The alternate abstraction relation, AbsX, is coherent with elements, but not very coherent with lookup. So proving elements_relateX is trivial, but proving lookup_relate is difficult.
This kind of thing comes up frequently. The important thing to remember is that you often have choices in formulating the abstraction relation, and the choice you make will affect the simplicity and elegance of your proofs. If you find things getting too difficult, step back and reconsider your abstraction relation.
End TREES.

Implementation

What if we want an implementation of search trees in another language, such as OCaml or Haskell? That new implementation would potentially be different from the Coq implementation we have verified. Perhaps some bugs would be introduced in the new implementation.
The solution will be to use Coq's extraction feature to derive the real implementation (in OCaml or Haskell) automatically from the Coq function. We'll explore extraction in a later chapter.
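As a preview, and assuming extraction is set up as in that later chapter (the output filename here is hypothetical), the commands might look like:

```coq
(* A sketch of extracting the verified operations to OCaml; the
   filename "searchtree.ml" is hypothetical. *)
From Coq Require Extraction.
Extraction Language OCaml.
Extraction "searchtree.ml" empty_tree insert lookup elements.
```

Because the OCaml code is generated from the very definitions we proved correct, the theorems of this chapter carry over to it (modulo the correctness of the extraction mechanism itself).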