RedblackRed-Black Trees

Red-black trees are a kind of balanced binary search tree (BST). Keeping the tree balanced ensures that the worst-case running time of operations is logarithmic rather than linear.
This chapter is based on the Coq standard library module MSetRBT, which can be found at https://coq.inria.fr/distrib/current/stdlib/Coq.MSets.MSetRBT.html. The design decisions for that module are described in the following paper:

Implementation

Sections can have Variables which are in scope throughout. The variables become arguments as needed.

Section ValueType.

  Variable V : Type.
  Variable default : V.

    Definition key := int.

  Inductive color := Red | Black.

  Inductive tree : Type :=
  | E : tree
  | T : colortreekeyVtreetree.

  Definition empty_tree : tree :=
    E.

lookup is the same as for BSTs.

  Fixpoint lookup (x: key) (t : tree) : V :=
    match t with
    | Edefault
    | T _ tl k v trif ltb x k then lookup x tl
                      else if ltb k x then lookup x tr
                           else v
    end.

insert requires balancing the tree.

  Definition balance (rb : color) (t1 : tree) (k : key) (vk : V) (t2 : tree) : tree :=
    match rb with
    | RedT Red t1 k vk t2
    | _match t1 with
          | T Red (T Red a x vx b) y vy c
            T Red (T Black a x vx b) y vy (T Black c k vk t2)
          | T Red a x vx (T Red b y vy c) ⇒
            T Red (T Black a x vx b) y vy (T Black c k vk t2)
          | amatch t2 with
                | T Red (T Red b y vy c) z vz d
                  T Red (T Black t1 k vk b) y vy (T Black c z vz d)
                | T Red b y vy (T Red c z vz d) ⇒
                  T Red (T Black t1 k vk b) y vy (T Black c z vz d)
                | _T Black t1 k vk t2
                end
          end
    end.

  Fixpoint ins (x : key) (vx : V) (t : tree) : tree :=
    match t with
    | ET Red E x vx E
    | T c a y vy bif ltb x y then balance c (ins x vx a) y vy b
                      else if ltb y x then balance c a y vy (ins x vx b)
                           else T c a x vx b
    end.

  Definition make_black (t : tree) : tree :=
    match t with
    | EE
    | T _ a x vx bT Black a x vx b
    end.

  Definition insert (x : key) (vx : V) (t : tree) :=
    make_black (ins x vx t).

elements is the same as for BSTs.

  Fixpoint elements_tr (t : tree) (acc: list (key × V)) : list (key × V) :=
    match t with
    | Eacc
    | T _ l k v relements_tr l ((k, v) :: elements_tr r acc)
    end.

  Definition elements (t : tree) : list (key × V) :=
    elements_tr t [].

Case-Analysis Automation

Proofs about insert will need automation. Here's an example.

  Lemma ins_not_E : (x : key) (vx : V) (t : tree),
      ins x vx tE.
  Proof.
    intros. destruct t; simpl.
    discriminate.


    destruct (ltb x k).
    unfold balance.


    destruct c.
    discriminate.


    destruct (ins x vx t1).


    destruct t2.
    discriminate.


    (* Let's automate. *)

    match goal with
    | ⊢ match ?c with Red_ | Black_ end_destruct c
    | ⊢ match ?t with E_ | T _ _ _ _ __ end_destruct t
    end.


    repeat
      match goal with
      | ⊢ match ?c with Red_ | Black_ end_destruct c
      | ⊢ match ?t with E_ | T _ _ _ _ __ end_destruct t
      end.


    discriminate.


    match goal with
    | ⊢ T _ _ _ _ _Ediscriminate
    end.


  Abort.

  Lemma ins_not_E : (x : key) (vx : V) (t : tree),
      ins x vx tE.
  Proof.
    intros. destruct t; simpl.
    - discriminate.
    - unfold balance.
      repeat
        match goal with
        | ⊢ (if ?x then _ else _) ≠ _destruct x
        | ⊢ match ?c with Red_ | Black_ end_destruct c
        | ⊢ match ?t with E_ | T _ _ _ _ __ end_destruct t
        | ⊢ T _ _ _ _ _Ediscriminate
        end.
  Qed.

The BST Invariant

Red-black trees must be BSTs.


  Fixpoint ForallT (P: intVProp) (t: tree) : Prop :=
    match t with
    | ETrue
    | T c l k v rP k vForallT P lForallT P r
    end.

  Inductive BST : treeProp :=
  | ST_E : BST E
  | ST_T : (c : color) (l : tree) (k : key) (v : V) (r : tree),
      ForallT (fun k' _ ⇒ (Abs k') < (Abs k)) l
      ForallT (fun k' _ ⇒ (Abs k') > (Abs k)) r
      BST l
      BST r
      BST (T c l k v r).

  Lemma empty_tree_BST: BST empty_tree.
  Proof.
    unfold empty_tree. constructor.
  Qed.
Let's show that insert preserves the BST invariant, that is:

  Theorem insert_BST : t v k,
      BST t
      BST (insert k v t).
  Abort.
It will take quite a bit of work, but automation will help.
First, we show that if a non-empty tree would be a BST, then the balanced version of it is also a BST:

  Lemma balance_BST: (c : color) (l : tree) (k : key) (v : V) (r : tree),
      ForallT (fun k' _ ⇒ (Abs k') < (Abs k)) l
      ForallT (fun k' _ ⇒ (Abs k') > (Abs k)) r
      BST l
      BST r
      BST (balance c l k v r).
  Proof.
    intros c l k v r PL PR BL BR. unfold balance.

    repeat
      match goal with
      | ⊢ BST (match ?c with Red_ | Black_ end) ⇒ destruct c
      | ⊢ BST (match ?t with E_ | T _ _ _ _ __ end) ⇒ destruct t
      end.

    (* 58 cases remaining. *)

    - constructor. assumption. assumption. assumption. assumption.
    - constructor; auto.
    - constructor; auto.
    - (* Now the tree gets bigger, and the proof gets more complicated. *)
      constructor; auto.

      + simpl in ×. repeat split.
        (* The intro pattern ? means to let Coq choose the name. *)
        destruct PR as [? _]. omega.

      + simpl in ×. repeat split.
        × inv BR. simpl in ×. destruct H5 as [? _]. omega.
        × inv BR. simpl in ×. destruct H5 as [_ [? _]]. auto.
        × inv BR. simpl in ×. destruct H5 as [_ [_ ?]]. auto.

      + constructor; auto.

      + inv BR. inv H7. constructor; auto.

    - constructor; auto.

    - (* 53 cases remain. This could go on for awhile... *)

  Abort.

Let's automate.

  Lemma balance_BST: (c : color) (l : tree) (k : key) (v : V) (r : tree),
      ForallT (fun k' _ ⇒ (Abs k') < (Abs k)) l
      ForallT (fun k' _ ⇒ (Abs k') > (Abs k)) r
      BST l
      BST r
      BST (balance c l k v r).
  Proof.
    intros. unfold balance.

    repeat
      (match goal with
       | ⊢ BST (match ?c with Red_ | Black_ end) ⇒ destruct c
       | ⊢ BST (match ?t with E_ | T _ _ _ _ __ end) ⇒ destruct t
       | ⊢ ForallT _ (T _ _ _ _ _) ⇒ repeat split
       | H: ForallT _ (T _ _ _ _ _) ⊢ _destruct H as [? [? ?] ]
       | H: BST (T _ _ _ _ _) ⊢ _inv H
       end;
       (try constructor; auto; try omega)).

41 cases remain. They're about inequalities.

  Abort.
To make progress, we can set up some helper lemmas.

  Lemma ForallT_imp : (P Q : intVProp) t,
      ForallT P t
      ( k v, P k vQ k v) →
      ForallT Q t.
  Proof.
    induction t; intros.
    - auto.
    - destruct H as [? [? ?]]. repeat split; auto.
  Qed.

  Lemma ForallT_greater : t k k0,
      ForallT (fun k' _Abs k' > Abs k) t
      Abs k > Abs k0
      ForallT (fun k' _Abs k' > Abs k0) t.
  Proof.
    intros. eapply ForallT_imp; eauto.
    intros. simpl in H1. omega.
  Qed.

  Lemma ForallT_less : t k k0,
      ForallT (fun k' _Abs k' < Abs k) t
      Abs k < Abs k0
      ForallT (fun k' _Abs k' < Abs k0) t.
  Proof.
    intros; eapply ForallT_imp; eauto.
    intros. simpl in H1. omega.
  Qed.
Now we can return to automating the proof.

  Lemma balance_BST: (c : color) (l : tree) (k : key) (v : V) (r : tree),
      ForallT (fun k' _ ⇒ (Abs k') < (Abs k)) l
      ForallT (fun k' _ ⇒ (Abs k') > (Abs k)) r
      BST l
      BST r
      BST (balance c l k v r).
  Proof.
    intros. unfold balance.

    repeat
      (match goal with
       | ⊢ BST (match ?c with Red_ | Black_ end) ⇒ destruct c
       | ⊢ BST (match ?s with E_ | T _ _ _ _ __ end) ⇒ destruct s
       | ⊢ ForallT _ (T _ _ _ _ _) ⇒ repeat split
       | H: ForallT _ (T _ _ _ _ _) ⊢ _destruct H as [? [? ?] ]
       | H: BST (T _ _ _ _ _) ⊢ _inv H
       end;
       (try constructor; auto; try omega)).

    (* all: t applies t to every subgoal. *)
    all: try eapply ForallT_greater; try eapply ForallT_less; eauto; try omega.
  Qed.


  (* ...after several exercises... *)

Exercise: 2 stars, standard (insert_BST)



  Theorem insert_BST : t v k,
      BST t
      BST (insert k v t).
  Proof.
    (* FILL IN HERE *) Admitted.

Verification

We now verify the equational specification of maps holds for red-black trees:
        lookup k empty_tree = default
        lookup k (insert k v t) = v
        lookup k' (insert k v t) = lookup k' t if kk'
The first equation is trivial to verify.

  Lemma lookup_empty : k, lookup k empty_tree = default.
  Proof. auto. Qed.
The next two equations are more challenging because of balance.
...after several helper lemmas...

Exercise: 3 stars, standard (lookup_insert)


  Theorem lookup_insert_eq : (t : tree) (k : key) (v : V),
      BST t
      lookup k (insert k v t) = v.
  Proof.
    (* FILL IN HERE *) Admitted.

  Theorem lookup_insert_neq: (t : tree) (k k' : key) (v : V),
      BST t
      kk'
      lookup k' (insert k v t) = lookup k' t.
  Proof.
    (* FILL IN HERE *) Admitted.

Efficiency

Red-black trees must also satisfy the red-black invariants:
  • Local Invariant: No red node has a red child.
    • Global Invariant: Every path from the root to a leaf has the same number of black nodes.
These ensure that the tree remains approximately balanced.
Proposition RB t c n holds when t satisfies the red-black invariants, assuming that c is the color of t's parent, and n is the black height that t is supposed to have.

  Inductive RB : treecolornatProp :=
  | RB_leaf: (c : color), RB E c 0
  | RB_r: (l r : tree) (k : key) (v : V) (n : nat),
      RB l Red n
      RB r Red n
      RB (T Red l k v r) Black n
  | RB_b: (c : color) (l r : tree) (k : key) (v : V) (n : nat),
      RB l Black n
      RB r Black n
      RB (T Black l k v r) c (S n).

...after many helper lemmas...

Exercise: 2 stars, standard (insert_RB)

Prove that insert maintains the RB invariant.

  Lemma insert_RB : (t : tree) (k : key) (v : V) (n : nat),
      RB t Red n
       (n' : nat), RB (insert k v t) Red n'.
  Proof.
    (* FILL IN HERE *) Admitted.


End ValueType.

Performance of Extracted Code

We can extract the red-black tree implementation:

Extraction "redblack.ml" empty_tree insert lookup elements.
Run it in the OCaml top level with these commands:
      #use "redblack.ml";;
      #use "test_searchtree.ml";;
On a recent machine with a 2.9 GHz Intel Core i9 that prints:
      Insert and lookup 1000000 random integers in 0.860663 seconds.
      Insert and lookup 20000 random integers in 0.007908 seconds.
      Insert and lookup 20000 consecutive integers in 0.004668 seconds.
That execution uses the bytecode interpreter. The native compiler will have better performance:
      $ ocamlopt -c redblack.mli redblack.ml
      $ ocamlopt redblack.cmx -open Redblack test_searchtree.ml -o test_redblack
      $ ./test_redblack
On the same machine that prints,
      Insert and lookup 1000000 random integers in 0.475669 seconds.
      Insert and lookup 20000 random integers in 0.00312 seconds.
      Insert and lookup 20000 consecutive integers in 0.001183 seconds.
The benchmark measurements above (and in Extract) demonstrate the following:
  • On random insertions, red-black trees are about the same as ordinary BSTs.
  • On consecutive insertions, red-black trees are much faster than ordinary BSTs.
  • Red-black trees are about as fast on consecutive insertions as on random.