Lecture 18: Dependent types

Author: Adam Chlipala, with modifications by CS-628 and CS 6115 staff.
License: No redistribution allowed (usage by permission in CS-628 and CS 6115).

A motivating example
Length-Indexed Lists
- Side note: ilist without the n
The One Rule of Dependent Pattern Matching in Coq

Subset types and their relatives help us integrate verification with programming. But we have only scratched the surface of Coq's mechanism for defining inductive types.

A motivating example

Consider the problem of defining a type of combinational circuits. With the tools we previously studied, we might choose to use a simple inductive type and an interpreter to give the semantics:

Module SimpleTypes.
  Inductive circuit: Set :=
  | ReadRegister (idx: nat) (* Variables *)
  | Constant (bs: list bool)
  | And (c0 c1: circuit)
  | Not (c: circuit)
  | Firstn (c: circuit) (len: nat)
  | Skipn (c: circuit) (idx: nat)
  | Append (c0 c1: circuit)
  | Mux (c c0 c1: circuit).

  Fixpoint simulate (regs: nat -> option (list bool)) (c: circuit)
    : list bool :=
    match c with
    | ReadRegister idx =>
        match regs idx with
        | Some bs => bs
        | None => []
        end
    | Constant bs => bs
    | And c0 c1 =>
        List.map (fun '(b0, b1) => andb b0 b1)
                 (List.combine (simulate regs c0) (simulate regs c1))
    | Not c =>
        List.map negb (simulate regs c)
    | Firstn c len =>
        List.firstn len (simulate regs c)
    | Skipn c idx =>
        List.skipn idx (simulate regs c)
    | Append c0 c1 =>
        simulate regs c0 ++ simulate regs c1
    | Mux c c0 c1 =>
        match simulate regs c with
        | [] => []
        | b :: _ => simulate regs (if b then c0 else c1)
        end
    end.

The resulting simulator can be run like this:

  Definition regs idx :=
    match idx with
    | 0 => Some [false;true;false]
    | 1 => Some [true]
    | _ => None
    end.

  Compute (simulate regs
             (Mux (ReadRegister 1)
                (And (ReadRegister 0) (Constant [false]))
                (Skipn (ReadRegister 1) 2))).
  (* = [false]
     : list bool *)

End SimpleTypes.

Print sigT.
(* Inductive sigT (A : Type) (P : A -> Type) : Type :=
       existT : forall x : A, P x -> {x : A & P x}.

   Arguments sigT [A]%type_scope P%type_scope
   Arguments existT [A]%type_scope P%function_scope x _ *)

Check proj1_sig.
(* proj1_sig
        : forall (A : Type) (P : A -> Prop),
          {x : A | P x} -> A *)

Check proj2_sig.
(* proj2_sig
        : forall (A : Type) (P : A -> Prop)
            (e : {x : A | P x}), P (proj1_sig e) *)

Unfortunately, this specification is not very readable: we need a default case for undefined registers; the And operation silently truncates its result when one argument is longer than the other; Firstn may return fewer than len bits; and Skipn may run out of bits before skipping as many as requested.
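
To make these silent failure modes concrete, here is a small sketch that exercises the underlying list functions directly. It assumes a Coq 8.x standard library; the Example names are ours and do not appear in the lecture code.

```coq
From Coq Require Import List.
Import ListNotations.

(* List.combine stops at the end of the shorter list, so a bitwise "and"
   over mismatched widths silently drops the extra bits. *)
Example and_truncates :
  map (fun '(b0, b1) => andb b0 b1)
      (combine [false; true; false] [true]) = [false].
Proof. reflexivity. Qed.

(* Asking for more bits than are available just returns what is there. *)
Example firstn_short : firstn 5 [true; false] = [true; false].
Proof. reflexivity. Qed.

(* Skipping past the end yields the empty list. *)
Example skipn_runs_out : skipn 5 [true; false] = [].
Proof. reflexivity. Qed.
```

None of these situations is reported as an error; the simulator simply produces a shorter bit string than the circuit designer probably intended.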

In contrast, if we could ensure that all circuits have the expected bit width and that all register references are in bounds, we could eliminate this cruft. With dependent types, we can do just that. Vector.t A n is the type of lists of exactly n elements of type A, and Fin.t n is the type of natural numbers strictly less than n.
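
As a quick illustration of these two library types, the following sketch builds a three-element vector and an in-bounds index. It assumes the Vector and Fin modules of a Coq 8.x standard library; the names v3 and i1 are hypothetical, and the constructors are written with explicit arguments to avoid depending on notations.

```coq
From Coq Require Vector Fin.

(* A vector of exactly three booleans, built with explicit constructors. *)
Definition v3 : Vector.t bool 3 :=
  @Vector.cons bool false _
    (@Vector.cons bool true _
       (@Vector.cons bool false _ (@Vector.nil bool))).

(* Fin.t 3 has exactly three inhabitants, standing for 0, 1 and 2.
   This one stands for 1. *)
Definition i1 : Fin.t 3 := Fin.FS Fin.F1.

(* Lookup needs no default value: the index is in bounds by construction. *)
Example nth_in_bounds : Vector.nth v3 i1 = true.
Proof. reflexivity. Qed.

(* An index standing for 3 does not type-check at Fin.t 3. *)
Fail Definition i3 : Fin.t 3 := Fin.FS (Fin.FS (Fin.FS Fin.F1)).
```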

Module DependentTypes.
  Section DependentTypes.
    Context {nregs: nat} {reg_widths: Vector.t nat nregs}.

    Import VectorDef.VectorNotations.

    Inductive circuit : forall (w:nat), Set :=
    | ReadRegister (idx: Fin.t nregs) : circuit reg_widths[@idx]
    | Constant {w} (bs: Vector.t bool w) : circuit w
    | And {w} (c0 c1: circuit w) : circuit w
    | Not {w} (c: circuit w) : circuit w
    | Firstn {w} (c: circuit w) (len: Fin.t (S w))
      : circuit (proj1_sig (Fin.to_nat len))
    | Skipn {w} (c: circuit w) (idx: Fin.t (S w))
      : circuit (w - (proj1_sig (Fin.to_nat idx)))
    | Append {w0 w1} (c0: circuit w0) (c1: circuit w1)
      : circuit (w0 + w1)
    | Mux {w} (c: circuit 1) (c0 c1: circuit w) : circuit w.

    Fixpoint simulate {w}
      (regs: forall (idx: Fin.t nregs), Vector.t bool reg_widths[@idx])
      (c: circuit w) : Vector.t bool w :=
      match c with
      | ReadRegister idx => regs idx
      | Constant bs => bs
      | And c0 c1 =>
          Vector.map2 (fun b0 b1 => andb b0 b1)
            (simulate regs c0) (simulate regs c1)
      | Not c =>
          Vector.map negb (simulate regs c)
      | Firstn c len =>
          firstn len (simulate regs c)
      | Skipn c idx =>
          skipn idx (simulate regs c)
      | Append c0 c1 =>
          (simulate regs c0) ++ (simulate regs c1)
      | Mux c c0 c1 =>
          simulate regs (if Vector.hd (simulate regs c) then c0 else c1)
      end.
  End DependentTypes.
End DependentTypes.

Length-Indexed Lists

Many introductions to dependent types start out by showing how to use them to eliminate array bounds checks. When the type of an array tells you how many elements it has, your compiler can detect out-of-bounds dereferences statically. Since we are working in a pure functional language, the next best thing is length-indexed lists, which the following code defines.

Module ilist.
  Section ilist.
    Context {A : Set}.

Note that, in the definition below, we write out the type of each constructor in full instead of using the shorthand notation we favored previously. The reason is that the index of the inductive type ilist now depends on details of a constructor's arguments. We are also using Set, the type containing the normal types of programming.

    Inductive ilist : nat -> Set :=
    | Nil : ilist O
    | Cons : forall {n}, A -> ilist n -> ilist (S n).

We see that, within its section, ilist is given type nat -> Set. Previously, every inductive type we have seen has either had plain Set as its type or has been a predicate with some type ending in Prop. The full generality of inductive definitions lets us integrate the expressivity of predicates directly into our normal programming.

The nat argument to ilist tells us the length of the list. The types of ilist's constructors tell us that a Nil list has length O and that a Cons list has length one greater than the length of its tail. We may apply ilist to any natural number, even natural numbers that are only known at runtime. It is this breaking of the phase distinction that characterizes ilist as dependently typed.
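
The following standalone sketch shows the index at work. As an assumption made for self-containment, it restates ilist with A as an ordinary parameter rather than a section variable, and the helper replicate is hypothetical.

```coq
Inductive ilist (A : Set) : nat -> Set :=
| Nil : ilist A O
| Cons : forall n, A -> ilist A n -> ilist A (S n).

Arguments Nil {A}.
Arguments Cons {A} {n} _ _.

(* The index records the length, so this ascription is accepted... *)
Check (Cons 1 (Cons 2 Nil) : ilist nat 2).

(* ...and a wrong length is a type error. *)
Fail Check (Cons 1 (Cons 2 Nil) : ilist nat 3).

(* The index may be an arbitrary term, here a function argument
   whose value is not known until the function is called. *)
Fixpoint replicate {A : Set} (x : A) (n : nat) : ilist A n :=
  match n with
  | O => Nil
  | S n' => Cons x (replicate x n')
  end.

Check (fun k : nat => replicate true k).
```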

In expositions of list types, we usually see the length function defined first, but here that would not be a very productive function to code. Instead, let us implement list concatenation.

    Succeed Fixpoint app
            {n1} (ls1 : ilist n1)
            {n2} (ls2 : ilist n2) : ilist (n1 + n2) :=
      match ls1 with
      | Nil => ls2
      | Cons x ls1' => Cons x (app ls1' ls2)
      end.
    (* The command has succeeded and its effects have been reverted. *)

Past Coq versions signalled an error for this definition. The code is still invalid within Coq's core language, but current Coq versions automatically add annotations to the original program, producing a valid core program. These are the annotations on match discriminees that we began to study with subset types. We can rewrite app to give the annotations explicitly.

    Fixpoint app
             {n1} (ls1 : ilist n1)
             {n2} (ls2 : ilist n2) : ilist (n1 + n2) :=
      match ls1 in (ilist n1) return (ilist (n1 + n2)) with
      | Nil => ls2
      | Cons x ls1' => Cons x (app ls1' ls2)
      end.

Using return alone allowed us to express a dependency of the match result type on the value of the discriminee. What in adds to our arsenal is a way of expressing a dependency on the type of the discriminee. Specifically, the n1 in the in clause above is a binding occurrence whose scope is the return clause.

We may use in clauses only to bind names for the arguments of an inductive type family. That is, each in clause must be an inductive type family name applied to a sequence of underscores and variable names of the proper length. The positions for parameters to the type family must all be underscores. Parameters are those arguments declared with section variables or with entries to the left of the first colon in an inductive definition. They cannot vary depending on which constructor was used to build the discriminee, so Coq prohibits pointless matches on them. It is those arguments defined in the type to the right of the colon that we may name with in clauses.
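
Here is a sketch of how the underscore rule looks in practice. It assumes a standalone variant of ilist in which A is a true parameter (declared to the left of the colon), so the in clause must write _ in the parameter position and may name only the index.

```coq
Inductive ilist (A : Set) : nat -> Set :=
| Nil : ilist A O
| Cons : forall n, A -> ilist A n -> ilist A (S n).

Arguments Nil {A}.
Arguments Cons {A} {n} _ _.

Fixpoint app {A : Set} {n1} (ls1 : ilist A n1)
             {n2} (ls2 : ilist A n2) : ilist A (n1 + n2) :=
  (* _ stands for the parameter A; n1 is a fresh binder for the index. *)
  match ls1 in (ilist _ n1) return (ilist A (n1 + n2)) with
  | Nil => ls2                             (* needs ilist A (0 + n2) *)
  | Cons x ls1' => Cons x (app ls1' ls2)   (* needs ilist A (S n + n2) *)
  end.

Example app_ex :
  app (Cons 1 (Cons 2 Nil)) (Cons 3 Nil) = Cons 1 (Cons 2 (Cons 3 Nil)).
Proof. reflexivity. Qed.
```

Both branches type-check only because 0 + n2 and S n + n2 reduce by computation to n2 and S (n + n2), respectively.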

Here's a useful function with a surprisingly subtle type, where the return type depends on the value of the argument.

    Fixpoint inject (ls : list A) : ilist (length ls) :=
      match ls with
      | nil => Nil
      | h :: t => Cons h (inject t)
      end.
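
To see the dependency concretely, the sketch below applies inject to a few lists and checks the resulting types. It is standalone, so it restates ilist with A as a parameter (an assumption made for self-containment) and imports list notations from a Coq 8.x standard library.

```coq
From Coq Require Import List.
Import ListNotations.

Inductive ilist (A : Set) : nat -> Set :=
| Nil : ilist A O
| Cons : forall n, A -> ilist A n -> ilist A (S n).

Arguments Nil {A}.
Arguments Cons {A} {n} _ _.

Fixpoint inject {A : Set} (ls : list A) : ilist A (length ls) :=
  match ls with
  | nil => Nil
  | h :: t => Cons h (inject t)
  end.

(* The result type is computed from the argument's value. *)
Check (inject [1; 2; 3] : ilist nat 3).
Check (inject (@nil bool) : ilist bool 0).

(* For an unknown list, the length stays symbolic in the type. *)
Check (fun ls : list nat => inject ls).

Example inject_ex : inject [1; 2] = Cons 1 (Cons 2 Nil).
Proof. reflexivity. Qed.
```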

We can define an inverse conversion and prove that it really is an inverse.

    Fixpoint unject {n} (ls : ilist n) : list A :=
      match ls with
      | Nil => nil
      | Cons h t => h :: unject t
      end.

    Theorem inject_inverse : forall ls, unject (inject ls) = ls.
    Proof.
      induction ls; simpl; congruence.
    Qed.

Now let us attempt a function that is surprisingly tricky to write. In ML, the list head function raises an exception when passed an empty list. With length-indexed lists, we can rule out such invalid calls statically, and here is a first attempt at doing so. We write _ for a term that we wish Coq would fill in for us, but we'll have no such luck.

    Fail Definition hd {n} (ls : ilist (S n)) : A :=
      match ls with
      | Nil => _
      | Cons h _ => h
      end.
    (* The command has indeed failed with message:
       The following term contains unresolved implicit arguments:
         (fun (n : nat) (ls : ilist (S n)) =>
          match
            ls in (ilist n0)
            return match n0 with
                   | 0 => IDProp
                   | S _ => A
                   end
          with
          | Nil => ?i
          | Cons h _ => h
          end)
       More precisely:
       - ?i: Cannot infer this placeholder of type "IDProp"
         in environment:
         A : Set
         n : nat
         ls : ilist (S n) *)

It is not clear what to write for the Nil case, so we are stuck before we even turn our function over to the type checker. We could try omitting the Nil case.

    Succeed Definition hd {n} (ls : ilist (S n)) : A :=
      match ls with
      | Cons h _ => h
      end.
    (* The command has succeeded and its effects have been reverted. *)

Actually, these days, Coq is smart enough to make that definition work! However, it will be educational to look at how Coq elaborates this code into its core language, where, unlike in ML, all pattern matching must be exhaustive. We might try using an in clause somehow.

    Succeed Definition hd {n} (ls : ilist (S n)) : A :=
      match ls in (ilist (S n)) with
      | Cons h _ => h
      end.
    (* The command has succeeded and its effects have been reverted. *)

Due to some relatively new heuristics, Coq does accept this code, but in general it is not legal to write arbitrary patterns for the arguments of inductive types in in clauses. Only variables are permitted there, in Coq's core language. A completely general mechanism could only be supported with a solution to the problem of higher-order unification, which is undecidable.

Our final, working attempt at hd uses an auxiliary function and a surprising return annotation.

    Definition hd' {n} (ls : ilist n) :=
      match ls in (ilist n)
            return (match n with O => unit | S _ => A end) with
      | Nil => tt
      | Cons h _ => h
      end.

    Check hd'.
    (* hd'
            : ilist ?n ->
              match ?n with
              | 0 => unit
              | S _ => A
              end
       where
       ?n : [A : Set |- nat] *)

    Definition hd {n} (ls : ilist (S n)) : A := hd' ls.

We annotate our main match with a type that is itself a match. We write that the function hd' returns unit when the list is empty and returns the carried type A in all other cases. In the definition of hd, we just call hd'. Because the index of ls is known to be nonzero, the type checker reduces the match in the type of hd' to A.
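
The sketch below replays this construction in a standalone file and checks how the match in the type reduces on concrete inputs. It assumes ilist restated with A as a parameter; the Example names are ours.

```coq
Inductive ilist (A : Set) : nat -> Set :=
| Nil : ilist A O
| Cons : forall n, A -> ilist A n -> ilist A (S n).

Arguments Nil {A}.
Arguments Cons {A} {n} _ _.

Definition hd' {A : Set} {n} (ls : ilist A n) :=
  match ls in (ilist _ n)
        return (match n with O => unit | S _ => A end) with
  | Nil => tt
  | Cons h _ => h
  end.

Definition hd {A : Set} {n} (ls : ilist A (S n)) : A := hd' ls.

(* On a nonempty list, the type of hd' reduces to the element type. *)
Example hd_ex : hd (Cons 1 (Cons 2 Nil)) = 1.
Proof. reflexivity. Qed.

(* On the empty list, the type of hd' reduces to unit. *)
Example hd'_nil : hd' (@Nil nat) = tt.
Proof. reflexivity. Qed.

(* hd itself cannot even be applied to an empty list. *)
Fail Check (hd (@Nil nat)).
```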

In fact, when we "got lucky" earlier with Coq accepting simpler definitions, under the hood it was desugaring almost to this one.

    Definition easy_hd {n} (ls : ilist (S n)) : A :=
      match ls with
      | Cons h _ => h
      end.

    Print easy_hd.
    (* easy_hd =
       fun (n : nat) (ls : ilist (S n)) =>
       match
         ls in (ilist n0)
         return match n0 with
                | 0 => IDProp
                | S _ => A
                end
       with
       | Nil => idProp
       | Cons h _ => h
       end
            : forall n : nat, ilist (S n) -> A

       Arguments easy_hd {n}%nat_scope ls
       easy_hd uses section variable A. *)
  End ilist.

  Arguments ilist A n : clear implicits.
End ilist.

Functions on ilist can be extracted to OCaml, and the extracted code is quite readable:

Extraction ilist.app.

(** val app :
    nat -> 'a1 ilist -> nat -> 'a1 ilist -> 'a1 ilist **)

let rec app _ ls1 n2 ls2 =
  match ls1 with
  | Nil -> ls2
  | Cons (n, x, ls1') ->
    Cons ((add n n2), x, (app n ls1' n2 ls2))

Side note: ilist without the `n`

Looking at extracted definitions, one may wonder why we have to carry the length index n in each Cons cell at runtime. The answer is simple: n has type nat, and nat lives in Set rather than Prop, so extraction does not erase it.

This is fortunate, because one might write this:

Definition ilength {A n} (l: ilist.ilist A n): nat := n.
Extraction ilength.

(** val ilength :
    nat -> 'a1 Coq_ilist.ilist -> nat **)

let ilength n _ =
  n

The One Rule of Dependent Pattern Matching in Coq

The rest of this chapter will demonstrate a few other elegant applications of dependent types in Coq. Readers encountering such ideas for the first time often feel overwhelmed, concluding that there is some magic at work whereby Coq sometimes solves the halting problem for the programmer and sometimes does not, applying automated program understanding in a way far beyond what is found in conventional languages. The point of this section is to cut off that sort of thinking right now! Dependent type-checking in Coq follows just a few algorithmic rules, with just one for dependent pattern matching of the kind we met in the previous section.

A dependent pattern match is a match expression where the type of the overall match is a function of the value and/or the type of the discriminee, the value being matched on. In other words, the match type depends on the discriminee.

When exactly will Coq accept a dependent pattern match as well-typed? Some other dependently typed languages employ fancy decision procedures to determine when programs satisfy their very expressive types. The situation in Coq is just the opposite. Only very straightforward symbolic rules are applied. Such a design choice has its drawbacks, as it forces programmers to do more work to convince the type checker of program validity. However, the great advantage of a simple type checking algorithm is that its action on invalid programs is easier to understand!

We come now to the one rule of dependent pattern matching in Coq. A general dependent pattern match assumes this form (with unnecessary parentheses included to make the syntax easier to parse):

match E as y in (T x1 ... xn) return U with
  | C z1 ... zm => B
  | ...
end

The discriminee is a term E, a value in some inductive type family T, which takes n arguments. An as clause binds the name y to refer to the discriminee E. An in clause binds an explicit name xi for the ith argument passed to T in the type of E.

We bind these new variables y and xi so that they may be referred to in U, a type given in the return clause. The overall type of the match will be U, with E substituted for y, and with each xi substituted by the actual argument appearing in that position within E's type.

In general, each case of a match may have a pattern built up in several layers from the constructors of various inductive type families. To keep this exposition simple, we will focus on patterns that are just single applications of inductive type constructors to lists of variables. Coq actually compiles the more general kind of pattern matching into this more restricted kind automatically, so understanding the typing of match requires understanding the typing of matches lowered to match one constructor at a time.

The last piece of the typing rule tells how to type-check a match case. A generic constructor application C z1 ... zm has some type T x1' ... xn', an application of the type family used in E's type, probably with occurrences of the zi variables. From here, a simple recipe determines what type we will require for the case body B. The type of B should be U with the following two substitutions applied: we replace y (the as clause variable) with C z1 ... zm, and we replace each xi (the in clause variables) with xi'. In other words, we specialize the result type based on what we learn from which pattern has matched the discriminee.
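
As a worked instance of the rule, the sketch below spells out every clause for a head-like function and records, in comments, the type that each substitution demands. It assumes a standalone ilist with A as a parameter; the name hd_explicit and the binder names y and k are hypothetical.

```coq
Inductive ilist (A : Set) : nat -> Set :=
| Nil : ilist A O
| Cons : forall n, A -> ilist A n -> ilist A (S n).

Arguments Nil {A}.
Arguments Cons {A} {n} _ _.

(* Here E is ls, T is ilist, the single in-variable is k, and
   U is (match k with O => unit | S _ => A end).
   U happens not to mention y, so only the substitution for k matters. *)
Definition hd_explicit {A : Set} {n} (ls : ilist A n)
  : match n with O => unit | S _ => A end :=   (* U with k replaced by n *)
  match ls as y in (ilist _ k)
        return (match k with O => unit | S _ => A end) with
  | Nil => (tt : unit)       (* U with k replaced by 0 reduces to unit *)
  | Cons h _ => (h : A)      (* U with k replaced by S n' reduces to A *)
  end.

Example rule_cons : hd_explicit (Cons true Nil) = true.
Proof. reflexivity. Qed.
```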

This is an exhaustive description of the ways to specify how to take advantage of which pattern has matched! No other mechanisms come into play. For instance, there is no way to specify that the types of certain free variables should be refined based on which pattern has matched.

A few details have been omitted above. Inductive type families may have both parameters and regular arguments. Within an in clause, a parameter position must have the wildcard _ written, instead of a variable. (In general, Coq uses wildcard _'s either to indicate pattern variables that will not be mentioned again or to indicate positions where we would like type inference to infer the appropriate terms.) Furthermore, recent Coq versions are adding more and more heuristics to infer dependent match annotations in certain conditions. The general annotation-inference problem is undecidable, so there will always be serious limitations on how much work these heuristics can do. When in doubt about why a particular dependent match is failing to type-check, add an explicit return annotation! At that point, the mechanical rule sketched in this section will provide a complete account of "what the type checker is thinking." Be sure to avoid the common pitfall of writing a return annotation that does not mention any variables bound by in or as; such a match will never refine typing requirements based on which pattern has matched. (One simple exception to this rule is that, when the discriminee is a variable, that same variable may be treated as if it were repeated as an as clause.)
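
The pitfall mentioned above can be sketched as follows, again assuming a standalone ilist with A as a parameter; hd_bad and hd_ok are hypothetical names. The first definition is rejected because its return annotation is constant, so the Nil branch is also required to have type A.

```coq
Inductive ilist (A : Set) : nat -> Set :=
| Nil : ilist A O
| Cons : forall n, A -> ilist A n -> ilist A (S n).

Arguments Nil {A}.
Arguments Cons {A} {n} _ _.

(* "return A" mentions no variable bound by in or as, so no branch
   is refined: Nil must produce an A, and tt : unit does not fit. *)
Fail Definition hd_bad {A : Set} {n} (ls : ilist A (S n)) : A :=
  match ls return A with
  | Nil => tt
  | Cons h _ => h
  end.

(* Binding the index with in lets the required type vary by branch. *)
Definition hd_ok {A : Set} {n} (ls : ilist A (S n)) : A :=
  match ls in (ilist _ k)
        return (match k with O => unit | S _ => A end) with
  | Nil => tt
  | Cons h _ => h
  end.

Example hd_ok_ex : hd_ok (Cons 7 Nil) = 7.
Proof. reflexivity. Qed.
```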
