(** * Redblack: Implementation and Proof of Red-Black Trees *)
(* ################################################################# *)
(** * Required Reading *)
(**
(1) General background on red-black trees:
- Section 3.3 of _Algorithms, Fourth Edition_,
by Sedgewick and Wayne, Addison Wesley 2011; or
- Chapter 13 of _Introduction to Algorithms, 3rd Edition_,
by Cormen, Leiserson, and Rivest, MIT Press 2009; or
- Wikipedia.
(2) An explanation of the particular implementation we use here.
Red-Black Trees in a Functional Setting, by Chris Okasaki.
_Journal of Functional Programming_, 9(4):471-477, July 1999.
https://www.cambridge.org/core/journals/journal-of-functional-programming/article/red-black-trees-in-a-functional-setting/62BC5EA75A2C95E3F6EE95AE3DADF0E5
(3) Optional reading:
Efficient Verified Red-Black Trees, by Andrew W. Appel, September 2011.
http://www.cs.princeton.edu/~appel/papers/redblack.pdf
*)
(* ################################################################# *)
(** * Implementation *)
(** Red-black trees are a form of binary search tree (BST), but with
_balance_. Recall that the _depth_ of a node in a tree is the
distance from the root to that node. The _height_ of a tree is the
depth of the deepest node. The [insert] or [lookup] function of
the BST algorithm (Chapter [SearchTree]) takes time
proportional to the depth of the node that is found (or inserted).
To make these functions run fast, we want trees where the
worst-case depth (or the average depth) is as small as possible.
In a perfectly balanced tree of N nodes, every node has depth less
than or or equal to log N, using logarithms base 2. In an
approximately balanced tree, every node has depth less than or
equal to 2 log N. That's good enough to make [insert] and [lookup]
run in time proportional to log N.
The trick is to mark the nodes Red and Black, and by these marks
to know when to locally rebalance the tree. For more explanation
and pictures, see the Required Reading above. *)
(** We will use the same framework as in [Extract]: keys are
OCaml integers. We don't repeat the [Extract] commands, because
they are imported implicitly. *)
From VFA Require Import Perm.
From VFA Require Import Extract.
From Coq Require Import Lists.List.
Export ListNotations.
From Coq Require Import Logic.FunctionalExtensionality.
Require Import ZArith.
Open Scope Z_scope.
Definition key := int.
Inductive color := Red | Black.
Section TREES.
Variable V : Type.
Variable default: V.
Inductive tree : Type :=
| E : tree
| T: color -> tree -> key -> V -> tree -> tree.
Definition empty_tree := E.
(** [lookup] is exactly as in our (unbalanced) search-tree algorithm
in [Extract], except that the [T] constructor carries a [color]
component, which we can ignore here. *)
Fixpoint lookup (x: key) (t : tree) : V :=
match t with
| E => default
| T _ tl k v tr => if ltb x k then lookup x tl
else if ltb k x then lookup x tr
else v
end.
(** The [balance] function is copied directly from Okasaki's paper.
Now, the nice thing about machine-checked proof in Coq is that you
can prove this correct without actually understanding it! So, do
read Okasaki's paper, but don't worry too much about the details
of this [balance] function. *)
Definition balance rb t1 k vk t2 :=
match rb with Red => T Red t1 k vk t2
| _ =>
match t1 with
| T Red (T Red a x vx b) y vy c =>
T Red (T Black a x vx b) y vy (T Black c k vk t2)
| T Red a x vx (T Red b y vy c) =>
T Red (T Black a x vx b) y vy (T Black c k vk t2)
| a => match t2 with
| T Red (T Red b y vy c) z vz d =>
T Red (T Black t1 k vk b) y vy (T Black c z vz d)
| T Red b y vy (T Red c z vz d) =>
T Red (T Black t1 k vk b) y vy (T Black c z vz d)
| _ => T Black t1 k vk t2
end
end
end.
(** (In contrast, Sedgewick has proposed _left-leaning red-black
trees_, which have a simpler balance function (but a more
complicated invariant). He does this in order to make the proof of
correctness easier: there are fewer cases in the [balance]
function, and therefore fewer cases in the case-analysis of the
proof of correctness of [balance]. But as you will see, our proofs
about [balance] will have automated case analyses, so we don't
care how many cases there are!) *)
(** The [insert] operation uses a helper function [ins] to do
the insertion, and another function [makeBlack] to color
the resulting tree's root black. *)
Definition makeBlack t :=
match t with
| E => E
| T _ a x vx b => T Black a x vx b
end.
Fixpoint ins x vx s :=
match s with
| E => T Red E x vx E
| T c a y vy b => if ltb x y then balance c (ins x vx a) y vy b
else if ltb y x then balance c a y vy (ins x vx b)
else T c a x vx b
end.
Definition insert x vx s := makeBlack (ins x vx s).
(* ################################################################# *)
(** * Verification *)
(** Now that the implementation has been defined, we want to verify
certain properties:
- [SearchTree]: Red-black trees should satisfy the [SearchTree]
representation invariant: the keys in each left subtree are all
less than the node's key, and the keys in each right subtree are
greater.
- [Abs]: Red-black trees should be correct w.r.t. the abstraction
relation [Abs], which relates trees to maps.
- [Balanced]: For sake of performance, red-black trees should be
balanced. Specifically, they should satisfy the red-black
invarinats: there is the same number of black nodes on any path
from the root to each leaf; and there are never two consecutive
red nodes.
We'll address the properties in that order. But first, we need to
learn a little more about proof automation, because the proofs of
those properties would be too hard to manage without automation.
*)
(* ################################################################# *)
(** * Proof Automation for Case-Analysis Proofs. *)
(** Several of the proofs for red-black trees require a big case
analysis over all the clauses of the [balance] function. These
proofs are very tedious to do by hand, but are easy to automate.
Let's get some practice by automating a proof that insertion
produces a non-empty tree. *)
Lemma T_neq_E:
forall c l k v r, T c l k v r <> E.
Proof. (* FOLD *)
unfold not. intros. discriminate.
Qed.
Lemma ins_not_E: forall x vx s, ins x vx s <> E.
Proof.
intros. destruct s; simpl.
apply T_neq_E.
remember (ins x vx s1) as a1.
unfold balance.
(** Let's [destruct] on the topmost case, [ltb x k]. We can use
[destruct] instead of [bdestruct] because we don't need to
remember whether [x=k]. *)
destruct (ltb x k).
(** The topmost test is [match c with...], so just [destruct c]. *)
destruct c.
apply T_neq_E.
(** The topmost test is [match a1 with...], so just [destruct a1]. *)
destruct a1.
(** The topmost test is [match s2 with...], so just [destruct s2]. *)
destruct s2.
unfold not. intros. discriminate.
(** How long will this go on? A long time! But it will terminate. Just
keep typing. Better yet, let's automate.
The following tactic applies whenever the current goal looks like
[match ?c with Red => _ | Black => _ end <> _]
and what it does in that case is, [destruct c]. *)
match goal with
| |- match ?c with Red => _ | Black => _ end <> _ => destruct c
end.
(** The following tactic applies whenever the current goal looks like,
[match ?s with E => _ | T _ _ _ _ _ => _ end <> _]
and what it does in that case is, [destruct s]. *)
match goal with
| |- match ?s with E => _ | T _ _ _ _ _ => _ end <> _ => destruct s
end.
(** Let's apply that tactic again, and then try it on the subgoals,
recursively. Recall that the [repeat] tactical keeps trying the
same tactic on subgoals. *)
repeat match goal with
| |- match ?s with E => _ | T _ _ _ _ _ => _ end <> _ => destruct s
end.
match goal with
| |- T _ _ _ _ _ <> E => apply T_neq_E
end.
(** Using what we've learned, let's start the proof all over again. *)
Abort.
Lemma ins_not_E: forall x vx s, ins x vx s <> E.
Proof.
intros. destruct s; simpl.
apply T_neq_E.
remember (ins x vx s1) as a1.
unfold balance.
(** This is the beginning of the big case analysis. This time, let's
combine several tactics together: *)
repeat match goal with
| |- (if ?x then _ else _) <> _ => destruct x
| |- match ?c with Red => _ | Black => _ end <> _=> destruct c
| |- match ?s with E => _ | T _ _ _ _ _ => _ end <> _ => destruct s
end.
(** What we have left is 117 cases, every one of which can be proved
the same way: *)
apply T_neq_E.
apply T_neq_E.
apply T_neq_E.
apply T_neq_E.
apply T_neq_E.
apply T_neq_E.
(* Only 111 cases to go... *)
Abort.
(** Let's start over again, adding that [apply] to the case analysis. *)
Lemma ins_not_E: forall x vx s, ins x vx s <> E.
Proof.
intros. destruct s; simpl.
apply T_neq_E.
remember (ins x vx s1) as a1.
unfold balance.
repeat match goal with
| |- (if ?x then _ else _) <> _ => destruct x
| |- match ?c with Red => _ | Black => _ end <> _ => destruct c
| |- match ?s with E => _ | T _ _ _ _ _ => _ end <> _ => destruct s
| |- T _ _ _ _ _ <> E => apply T_neq_E
end.
Qed.
(** Success! We'll use that same technique of repeatedly matching
against the goal in a few more proofs, below. *)
(* ################################################################# *)
(** * Verifying Preservation of the Representation Invariant *)
(** The [SearchTree] property for red-black trees is exactly the same as
for (unbalanced) search trees; we just ignore the color [c] of each
node. *)
Inductive SearchTree' : Z -> tree -> Z -> Prop :=
| ST_E : forall lo hi, lo <= hi -> SearchTree' lo E hi
| ST_T : forall lo c l k v r hi,
SearchTree' lo l (int2Z k) ->
SearchTree' (int2Z k + 1) r hi ->
SearchTree' lo (T c l k v r) hi.
Inductive SearchTree: tree -> Prop :=
| ST_intro: forall t lo hi, SearchTree' lo t hi -> SearchTree t.
(** Now we want to prove that the red-black operations preserve the
representation invariant. Since [lookup] doesn't return a tree,
all we need to worry about is [empty_tree] and [insert]. *)
(* ================================================================= *)
(** ** Empty and SearchTree *)
(** The empty tree is easy: *)
Theorem empty_SearchTree : SearchTree empty_tree.
Proof.
unfold empty_tree. apply ST_intro with (lo := 0) (hi := 0).
constructor. omega.
Qed.
(** Verifying that [insert] preserves [SearchTree] will take more
work. *)
(* ================================================================= *)
(** ** Balance and SearchTree *)
(** Since the [balance] function is an important part of [insert],
let's start by proving that [balance] produces a search tree
whose bounds are related to the bounds on its inputs: *)
Lemma balance_SearchTree:
forall c s1 k kv s2 lo hi,
SearchTree' lo s1 (int2Z k) ->
SearchTree' (int2Z k + 1) s2 hi ->
SearchTree' lo (balance c s1 k kv s2) hi.
Proof.
intros.
unfold balance.
(** Use proof automation for this case analysis. *)
repeat match goal with
| |- SearchTree' _ (match ?c with Red => _ | Black => _ end) _ => destruct c
| |- SearchTree' _ (match ?s with E => _ | T _ _ _ _ _ => _ end) _ => destruct s
end.
(** 58 cases to consider! *)
* constructor; auto.
* constructor; auto.
* constructor; auto.
(* There's a pattern emerging, but, let's keep going for now. *)
* constructor; auto.
constructor; auto. constructor; auto.
(** To prove this one, we have to do [inversion] on the proof goals
above the line. *)
inv H. inv H0. inv H8. inv H9.
auto.
constructor; auto.
inv H. inv H0. inv H8. inv H9. auto.
inv H. inv H0. inv H8. inv H9. auto.
(** There's a pattern here. Whenever we have a hypothesis above the line
that looks like,
- H: SearchTree' _ E _
- H: SearchTree' _ (T _ _ _ _ _) _
we should invert it. Let's build that idea into our proof automation.
*)
Abort.
Lemma balance_SearchTree:
forall c s1 k kv s2 lo hi,
SearchTree' lo s1 (int2Z k) ->
SearchTree' (int2Z k + 1) s2 hi ->
SearchTree' lo (balance c s1 k kv s2) hi.
Proof.
intros.
unfold balance.
(** Use proof automation for this case analysis. *)
repeat match goal with
| |- SearchTree' _ (match ?c with Red => _ | Black => _ end) _ =>
destruct c
| |- SearchTree' _ (match ?s with E => _ | T _ _ _ _ _ => _ end) _ =>
destruct s
| H: SearchTree' _ E _ |- _ => inv H
| H: SearchTree' _ (T _ _ _ _ _) _ |- _ => inv H
end.
(** 58 cases to consider! *)
* constructor; auto.
* constructor; auto. constructor; auto. constructor; auto.
* constructor; auto. constructor; auto. constructor; auto.
constructor; auto. constructor; auto.
* constructor; auto. constructor; auto. constructor; auto.
constructor; auto. constructor; auto.
* constructor; auto. constructor; auto. constructor; auto.
constructor; auto. constructor; auto.
(** Do we see a pattern here? We can add that to our automation! *)
Abort.
Lemma balance_SearchTree:
forall c s1 k kv s2 lo hi,
SearchTree' lo s1 (int2Z k) ->
SearchTree' (int2Z k + 1) s2 hi ->
SearchTree' lo (balance c s1 k kv s2) hi.
Proof.
intros.
unfold balance.
(** Use proof automation for this case analysis. *)
repeat match goal with
| |- SearchTree' _ (match ?c with Red => _ | Black => _ end) _ =>
destruct c
| |- SearchTree' _ (match ?s with E => _ | T _ _ _ _ _ => _ end) _ =>
destruct s
| H: SearchTree' _ E _ |- _ => inv H
| H: SearchTree' _ (T _ _ _ _ _) _ |- _ => inv H
end;
repeat (constructor; auto).
Qed.
(** **** Exercise: 2 stars, standard (ins_SearchTree)
You won't need proof automation of the kind we just used for
[balance_SearchTree] to prove this one. You will need to apply
[balance_SearchTree] twice. If you solved [insert_SearchTree] in
[SearchTree] you could copy and adapt that proof. *)
Lemma ins_SearchTree: forall x vx s lo hi,
lo <= int2Z x ->
int2Z x < hi ->
SearchTree' lo s hi ->
SearchTree' lo (ins x vx s) hi.
Proof.
(* FILL IN HERE *) Admitted.
(** [] *)
(* ================================================================= *)
(** ** Insert and SearchTree *)
(** Our goal is to prove that [insert] preserves the property of being
a search tree. We'll eventually get there with the lemma
[insert_SearchTree] below, but there are several helper lemmas to
prove first.
Along the way you will likely need to use some library theorems
about [Z]. You might want to briefly peruse the output of the
following searches to familiarize yourself with what is available.
Hint: you don't need induction unless we have explicitly indicated
it. *)
Search Z.le.
Search Z.min.
Search Z.max.
(** **** Exercise: 2 stars, standard (ins_E_SearchTree) *)
Lemma ins_E_SearchTree : forall x vx,
SearchTree (ins x vx E).
Proof. (* FILL IN HERE *) Admitted.
(** [] *)
(** **** Exercise: 4 stars, standard (ins_T_SearchTree) *)
Lemma expand_range_SearchTree':
forall s lo hi,
SearchTree' lo s hi ->
forall lo' hi',
lo' <= lo -> hi <= hi' ->
SearchTree' lo' s hi'.
Proof. (* FILL IN HERE *) Admitted.
Lemma ins_T_SearchTree_aux : forall x vx c l k kv r lo hi,
SearchTree' lo l (int2Z k)
-> SearchTree' (int2Z k + 1) r hi
-> SearchTree' (Z.min lo (int2Z x))
(ins x vx (T c l k kv r))
(Z.max hi (int2Z x + 1)).
Proof.
(** Hint: This is the hardest lemma. You'll need two applications
of [expand_range_SearchTree'], and a few lemmas about [Z]. *)
(* FILL IN HERE *) Admitted.
Lemma ins_T_SearchTree : forall x vx c l k kv r,
SearchTree (T c l k kv r) -> SearchTree (ins x vx (T c l k kv r)).
Proof.
(** Hint: the only lemma you need is [ins_T_SearchTree_aux]. *)
(* FILL IN HERE *) Admitted.
(** [] *)
(** **** Exercise: 3 stars, standard (insert_preserves_SearchTree) *)
Lemma ins_makeBlack_SearchTree : forall s,
SearchTree s -> SearchTree (makeBlack s).
Proof. (* FILL IN HERE *) Admitted.
Theorem insert_SearchTree: forall x vx s,
SearchTree s -> SearchTree (insert x vx s).
Proof.
(** Hint: use [ins_makeBlack_SearchTree], [ins_E_SearchTree], and
[ins_T_SearchTree]. *)
(* FILL IN HERE *) Admitted.
(** [] *)
(** That concludes our verification that the red-black implementation
preserves the representation invariant. The main theorems we
proved were the following: *)
Check empty_SearchTree.
Check insert_SearchTree.
(* ################################################################# *)
(** * Verifying Correctness w.r.t. the Abstraction Relation *)
(** Now we turn our attention to the abstraction relation. As we did
in [SearchTree] and [Extract], we will prove a
relationship between red-black trees and (functional) maps. *)
Import IntMaps.
(** The abstraction relation [Abs] is the same as what we previously
used, extended to include the color of nodes. *)
Definition combine {A} (pivot: Z) (m1 m2: total_map A) : total_map A :=
fun x => if Z.ltb x pivot then m1 x else m2 x.
Inductive Abs : tree -> total_map V -> Prop :=
| Abs_E : Abs E (t_empty default)
| Abs_T : forall a b c l k vk r,
Abs l a ->
Abs r b ->
Abs (T c l k vk r) (t_update (combine (int2Z k) a b) (int2Z k) vk).
(* ================================================================= *)
(** ** Empty and Abs *)
(** The empty tree is related to the empty map. *)
(** **** Exercise: 1 star, standard (empty_tree_relate) *)
Theorem empty_tree_relate :
Abs empty_tree (t_empty default).
Proof. (* FILL IN HERE *) Admitted.
(** [] *)
(* ================================================================= *)
(** ** Lookup and Abs *)
(** If tree [t] abstracts to map [m], then looking up a key in [t]
yields the same value as looking up that key in [m]. *)
(** **** Exercise: 1 star, standard (lookup_relate)
Copy and adapt your proof from [Extract]. *)
Theorem lookup_relate : forall k t m,
Abs t m -> lookup k t = m (int2Z k).
Proof.
(* FILL IN HERE *) Admitted.
(** [] *)
(* ================================================================= *)
(** ** Balance and Abs *)
(** The proof that [insert] is correct will take considerably more
work because of the [balance] operation. In exercise
[balance_relate], soon to come, we'll prove that [balance]
preserves the abstraction of a search tree. But first, here are
two quick lemmas that will be helpful in our proof about
[balance]. *)
Lemma Abs_helper:
forall m' t m, Abs t m' -> m' = m -> Abs t m.
Proof.
intros. subst. auto.
Qed.
Lemma SearchTree'_le:
forall lo t hi, SearchTree' lo t hi -> lo <= hi.
Proof.
induction 1; omega.
Qed.
(** We'll also make use of this [Ltac] tactic in [balance_relate]. The
purpose of this tactic is to prove the equivalence of maps. *)
Ltac contents_equivalent_prover :=
extensionality x; unfold t_update, combine, t_empty;
repeat match goal with
| |- context [if ?A then _ else _] => bdestruct A
end;
auto;
omega.
(** Here's an example of how that tactic could be used. You might want
to stop and convince yourself that you understand why the tactic
succeeds in proving the example. *)
Section ContentsExample.
Open Scope string_scope.
Example maps_equiv_ex :
t_update (t_update (t_empty "") 2 "two") 1 "one"
=
t_update (t_update (t_empty "") 1 "one") 2 "two".
Proof.
contents_equivalent_prover.
Qed.
End ContentsExample.
(** **** Exercise: 4 stars, standard (balance_relate) *)
(** Now it's time to prove that [balance] preserves the abstraction of
a search tree. We'll make heavy use of proof automation, similar to
[ins_not_E] and [balance_SearchTree] above. *)
Theorem balance_relate:
forall c l k vk r m,
SearchTree (T c l k vk r) ->
Abs (T c l k vk r) m ->
Abs (balance c l k vk r) m.
Proof.
intros. inv H. unfold balance.
(** This is the main "loop" of the proof. *)
repeat
match goal with
| _ => idtac (* this is just a "no op" to get us started *)
end.
(** At this point you should have just 1 subgoal. Now you're going to
augment that main loop by adding the following clauses, one at a
time, to it.
- 1. The subgoal has the form
[Abs (match c with Red => ... | Black => ...) ...].
We need to destruct [c] to make progress. So, replace the "no
op" clause provided in the main loop with the following clause:
[ | |- Abs (match ?c with Red => _ | Black => _ end) _ =>
destruct c]
Step forward through the main loop; you should have 2 subgoals.
If not, stop now and figure out what has gone wrong.
- 2. The first subgoal is provable from the assumptions. So, add a
new clause to the main loop below the clause you just added:
[ | _ => assumption ]
Step forward through the main loop; you should be back down to
1 subgoal.
- 3. The subgoal has the form
[Abs (match ... with E => ... | T ... => ...) ...]
Much like in step (1) above, we need to destruct the tree being
matched to make progress. So, add a new clause to the main loop
to do that. You should end up with 21 subgoals.
- 4. The first subgoal has the form [Abs (T ...) m]. That is, we
want to show that the tree abstracts to map [m]. Look at the
hypotheses: one of them (call it [H0]) shows that a similar
tree abstracts to the same [m]. To make progress, we can
extract the evidence for _why_ that similar tree abstracts to
[m]. So, add a new clause that inverts the hypothesis using the
[inv] tactic.
If it seems like Coq diverges into an infinite loop, double
check the pattern matching you wrote for the hypothesis. Make
sure you're matching against something containing the [T]
constructor.
You still have 21 subgoals, but instead of being about
arbitrary maps [m], they should now contain information about
the contents of those maps, e.g., [t_update (combine ...) ...].
- 5. There is yet more information to be gained about the maps. Look
at the hypotheses for the current subgoal: one of them should
be of the form [Abs E ...]. That is, it tells us something
about a map being the abstraction of the empty tree. Empty
trees were not covered by the clause we just added in step (4).
So, add another clause to invert such hypotheses about the
abstraction of empty trees. You will still have 21 subgoals,
but now none of the hypotheses about [Abs] will contain
information about the structure of the tree; that information
has all been extracted.
- 6. At this point every subgoal has the form [Abs (T ...) (t_update
...)]. We'd like to apply the [Abs_T] constructor to make
progress. But the map [(t_update ...)] does not have the right
form. We therefore need to replace [(t_update ...)] with
another map that has the right form. That's why we created the
[Abs_helper] lemma above.
Try [eapply Abs_helper] on the first subgoal. (Don't worry
about the main loop yet.) You'll get two subgoals. The first
you can prove with
[[
apply Abs_T. apply Abs_T.
apply Abs_E. apply Abs_E.
apply Abs_T. eassumption. eassumption.
]]
Step through that, one command at a time, to understand what
it's doing. By the end, Coq has determined what the map needs
to be in order for [Abs_T] to be applicable. The second subgoal
which remains requires us to prove that map to be equivalent to
the original map.
Now undo those seven commands, and do this instead:
[[
repeat econstructor; eassumption.
]]
That solves the first subgoal in exactly the same way.
What we'd like to do now is use apply this technique to all of
the 21 subgoals. Do that by adding this clause to your main
loop:
[[
| |- _ => eapply Abs_helper; [repeat econstructor; eassumption | ]
]]
The tactical [t1; [t2 | ]] first applies [t1] to a goal, then
applies [t2] to the first resulting subgoal, leaving the second
resulting subgoal unchanged.
You should still have 21 subgoals, all of which should have the
form [t_update ... = t_update ...]. So at this point, we're
done with the abstraction relation, and all that's left is to
prove the equality of some maps.
- 7. The whole reason we created the [contents_equivalent_prover]
tactic above was to prove the equality of maps. So add this
clause to your main loop:
[[
| |- _ = _ => contents_equivalent_prover
]]
Unfortunately, you'll notice that it doesn't solve any of the
subgoals. The problem is that we still haven't extracted enough
information from the hypotheses. The reason those maps are all
equal has to do with inequality relationships amongst the keys
being used in the maps. Those inequalities are part of the
[SearchTree'] property. So, we need to invert all the
hypotheses about [SearchTree'].
Add two clauses to the main loop to use [inv] to invert any
hypothesis of the form [SearchTree' ... (T ...) ...] or
[SearchTree' ... E ...]. As with [Abs] above, if you encounter
an infinite loop, make sure you are pattern matching correctly.
You should be left with 15 subgoals.
- 8. The final piece of information to extract for map equalities
has to do with an additional property of [SearchTree']:
whenever [SearchTree' lo _ hi] holds, it also holds that [lo <=
hi]. So, add a clause to the main loop to [apply
SearchTree'_le] in any such hypothesis when the subgoal has the
form [ _ = _ ].
Qed! That finishes the proof. You might notice that Coq takes
awhile to process the [Qed.] command, because the resulting proof
is quite large. Try printing it out with [Print balance_relate.]
if you want to see for yourself. (But please don't leave that
[Print] command in any homework you might submit.) *)
(* FILL IN HERE *) Admitted.
(** [] *)
(* ================================================================= *)
(** ** Insert and Abs *)
(** Now that we've taken care of [balance], let's get back to
[insert]. *)
(** **** Exercise: 2 stars, standard (ins_left_and_right) *)
(** The next two lemmas, [ins_left] and [ins_right], show how the
[SearchTree] property is preserved by [ins]. We'll soon need that
information about the representation invariant. *)
Lemma ins_left : forall c l j u r k v,
SearchTree (T c l j u r)
-> int2Z k < int2Z j
-> SearchTree (T c (ins k v l) j u r).
Proof. (* FILL IN HERE *) Admitted.
Lemma ins_right : forall c l j u r k v,
SearchTree (T c l j u r)
-> int2Z j < int2Z k
-> SearchTree (T c l j u (ins k v r)).
Proof. (* FILL IN HERE *) Admitted.
(** [] *)
(** **** Exercise: 3 stars, standard (ins_relate)
Now we prove the correctness of [ins], which is the core of the
[insert] operation.
Copy and adapt your proof of [insert_relate] from [Extract].
You'll use [balance_relate] twice, and [ins_left] and [ins_right]
once each. Other than those, you shouldn't need any other theorems
from this file. *)
Theorem ins_relate:
forall k v t m,
SearchTree t ->
Abs t m ->
Abs (ins k v t) (t_update m (int2Z k) v).
Proof.
(* FILL IN HERE *) Admitted.
(** [] *)
(** The rest of the proof that [insert] is correct is straightforward,
given [ins_relate], so we'll go ahead and finish it rather than
make it an exercise. *)
Lemma makeBlack_relate:
forall t m,
Abs t m ->
Abs (makeBlack t) m.
Proof.
intros.
destruct t; simpl; auto.
inv H; constructor; auto.
Qed.
Theorem insert_relate:
forall k v t m,
SearchTree t ->
Abs t m ->
Abs (insert k v t) (t_update m (int2Z k) v).
Proof.
intros.
unfold insert.
apply makeBlack_relate.
apply ins_relate; auto.
Qed.
(** OK, we're almost done! We have proved all these main theorems: *)
Check empty_tree_relate.
Check lookup_relate.
Check insert_relate.
(** Together these imply that this implementation of red-black trees
is correct, in that it respects the abstraction relation. *)
(* ================================================================= *)
(** ** Optional: Verification of Elements *)
(** **** Exercise: 4 stars, standard, optional (elements)
Prove the correctness of the [elements] function. Because [elements]
does not pay attention to colors, and does not rebalance the tree,
then its proof should be a simple copy-paste from [SearchTree],
with only minor edits. *)
Fixpoint elements' (s: tree) (base: list (key*V)) : list (key * V) :=
match s with
| E => base
| T _ a k v b => elements' a ((k,v) :: elements' b base)
end.
Definition elements (s: tree) : list (key * V) := elements' s nil.
Definition elements_property (t: tree) (m: total_map V) : Prop :=
forall k v,
(In (k,v) (elements t) -> m (int2Z k) = v) /\
(m (int2Z k) <> default ->
exists k', int2Z k = int2Z k' /\ In (k', m (int2Z k)) (elements t)).
Theorem elements_relate:
forall t m,
SearchTree t ->
Abs t m ->
elements_property t m.
Proof.
(* FILL IN HERE *) Admitted.
(** [] *)
(* ################################################################# *)
(** * Verification of Balance *)
(** Red-black trees are supposed to be more efficient than ordinary
search trees, because they stay balanced. In a perfectly balanced
tree, any two leaves have exactly the same depth, or the
difference in depth is at most 1. In an approximately balanced
tree, no leaf is more than twice as deep as another leaf.
Red-black trees are approximately balanced. Consequently, no node
is more then 2 log N deep, where N is the number of nodes in the
tree, and the run time for insert or lookup is therefore O (log
N).
We can't prove anything _directly_ about the run time, because we
don't have a cost model for Coq functions. But we can prove that
the trees stay approximately balanced; this tells us important
information about their efficiency. *)
(* ================================================================= *)
(** ** The Red-Black Invariants *)
(** There are two red-black invariants. The _global invariant_
requires the _black height_ of every path in the tree to be the
same. That is, there must be [n] black nodes on every path from
the root to a leaf, for some [n]. The _local invariant_ forbids a
red node from having a red child.
We'll define an inductive proposition [is_redblack] to capture the
notion that a subtree satisfies the red-black invariants. The
proposition will take three arguments: the subtree, the color of
the subtree's parent, and the black height that the subtree should
have. That color is used to enforce the local invariant, and the
height is of course used to enforce the global invariant.
So, [is_redblack t c n] should hold when [t] satisfies the
red-black invariants, assuming that [c] is the color of [t]'s
parent, and [n] is the black height that [t] is supposed to have.
If [t] happens to have no parent (i.e., it is the entire tree),
then it will be colored black by the [insert] algorithm, so it
won't actually matter what color its (non-existent) parent might
purportedly have: red or black, it can't violate the local
invariant. If [t] is a leaf, then it likewise won't matter what
its parent color is, and its black height must be zero. *)
Inductive is_redblack : tree -> color -> nat -> Prop :=
| IsRB_leaf: forall c, is_redblack E c 0
| IsRB_r: forall l k v r n,
is_redblack l Red n ->
is_redblack r Red n ->
is_redblack (T Red l k v r) Black n
| IsRB_b: forall c l k v r n,
is_redblack l Black n ->
is_redblack r Black n ->
is_redblack (T Black l k v r) c (S n).
(** To satisfy the red-black invariant, then, is to satisfy
[is_redblack] for some parent color and black height. *)
Definition rb (t : tree) : Prop :=
exists c n, is_redblack t c n.
(* ================================================================= *)
(** ** Empty and RB *)
(** The empty tree satisfies [rb]. *)
(** **** Exercise: 1 star, standard (empty_rb) *)
Theorem empty_rb : rb empty_tree.
Proof. (* FILL IN HERE *) Admitted.
(** [] *)
(* ================================================================= *)
(** ** Insert and RB *)
(** Our final goal is to prove that insertion preserves
the red-black invariants. That will take us several
lemmas to complete. *)
(** **** Exercise: 1 star, standard (is_redblack_toblack) *)
Lemma is_redblack_toblack : forall s n,
is_redblack s Red n -> is_redblack s Black n.
Proof.
(* FILL IN HERE *) Admitted.
(** [] *)
(** **** Exercise: 2 stars, standard (is_redblack_makeblack) *)
Lemma is_redblack_makeblack : forall s c n,
is_redblack s Black n
-> exists n, is_redblack (makeBlack s) c n.
Proof.
(* FILL IN HERE *) Admitted.
(** [] *)
(** [nearly_redblack] expresses, "the tree is a red-black tree, except
that it's nonempty and it is permitted to have two consecutive red
nodes at the root." *)
Inductive nearly_redblack : tree -> nat -> Prop :=
| nrRB_r: forall tl k kv tr n,
is_redblack tl Black n ->
is_redblack tr Black n ->
nearly_redblack (T Red tl k kv tr) n
| nrRB_b: forall tl k kv tr n,
is_redblack tl Black n ->
is_redblack tr Black n ->
nearly_redblack (T Black tl k kv tr) (S n).
(** **** Exercise: 4 stars, standard (ins_is_redblack) *)
(** You will need proof automation, in a similar style to the proofs
of [ins_not_E] and [balance_relate]. This is challenging; don't
be afraid to skip over it and come back. *)
Lemma ins_is_redblack:
forall x vx s n,
(is_redblack s Black n -> nearly_redblack (ins x vx s) n) /\
(is_redblack s Red n -> is_redblack (ins x vx s) Black n).
Proof.
induction s; intro n; simpl; split;
intros; inv H; repeat constructor; auto.
*
destruct (IHs1 n); clear IHs1.
destruct (IHs2 n); clear IHs2.
specialize (H0 H6).
specialize (H2 H7).
clear H H1.
unfold balance.
(* FILL IN HERE *) Admitted.
(** [] *)
Corollary ins_red :
forall t k v n,
(is_redblack t Red n -> is_redblack (ins k v t) Black n).
Proof.
intros. apply ins_is_redblack. assumption.
Qed.
Corollary ins_black :
forall t k v n,
(is_redblack t Black n -> nearly_redblack (ins k v t) n).
Proof.
intros. apply ins_is_redblack. assumption.
Qed.
(** **** Exercise: 3 stars, standard (insert_is_redblack)
You will find [ins_red], [ins_black], and [is_redblack_makeblack]
useful here. *)
Lemma insert_is_redblack: forall k v t c n,
is_redblack t c n ->
exists c' n',
is_redblack (insert k v t) c' n'.
Proof.
(* FILL IN HERE *) Admitted.
(** [] *)
(** **** Exercise: 2 stars, standard (insert_rb) *)
Theorem insert_rb: forall k v t,
rb t -> rb (insert k v t).
Proof. (* FILL IN HERE *) Admitted.
(** [] *)
End TREES.
(* ################################################################# *)
(** * Extracting and Measuring Red-Black Trees *)
Extraction "redblack.ml" empty_tree insert lookup elements.
(** You can run this inside the OCaml top level by:
#use "redblack.ml";;
#use "test_searchtree.ml";;
On my machine, in the byte-code interpreter this prints,
Insert and lookup 1000000 random integers in 0.889 seconds.
Insert and lookup 20000 random integers in 0.016 seconds.
Insert and lookup 20000 consecutive integers in 0.015 seconds.
You can compile and run this with the OCaml native-code compiler by:
$ ocamlopt -c redblack.mli redblack.ml
$ ocamlopt redblack.cmx -open Redblack test_searchtree.ml -o test_redblack
$ ./test_redblack
On my machine this prints,
Insert and lookup 1000000 random integers in 0.436 seconds.
Insert and lookup 20000 random integers in 0. seconds.
Insert and lookup 20000 consecutive integers in 0. seconds.
*)
(* ################################################################# *)
(** * Success!
The benchmark measurements above (and in [Extract]) demonstrate that:
- On random insertions, red-black trees are slightly faster than ordinary BSTs
(red-black 0.436 seconds, vs ordinary 0.468 seconds)
- On consecutive insertions, red-black trees are _much_ faster than ordinary BSTs
(red-black 0. seconds, vs ordinary 0.374 seconds)
In particular, red-black trees are almost exactly as fast on the
consecutive insertions (0.015 seconds) as on the random (0.016 seconds).
*)