Subset types and their relatives help us integrate verification with programming. But we have only seen the tip of the iceberg that is Coq's mechanism for defining inductive types.
Consider the problem of defining a type of combinational circuits. With the tools we have studied so far, we might choose to use a simple inductive type and an interpreter to give the semantics:
Require Import Coq.Lists.List.
Import ListNotations.

Module SimpleTypes.
Inductive circuit : Set :=
| ReadRegister (idx : nat) (* Variables *)
| Constant (bs : list bool)
| And (c0 c1 : circuit)
| Not (c : circuit)
| Firstn (c : circuit) (len : nat)
| Skipn (c : circuit) (idx : nat)
| Append (c0 c1 : circuit)
| Mux (c c0 c1 : circuit).

Fixpoint simulate (regs : nat -> option (list bool)) (c : circuit) : list bool :=
  match c with
  | ReadRegister idx =>
    match regs idx with
    | Some bs => bs
    | None => []  (* default value for undefined registers *)
    end
  | Constant bs => bs
  | And c0 c1 =>
    (* combine stops at the end of the shorter list *)
    List.map (fun '(b0, b1) => andb b0 b1)
             (List.combine (simulate regs c0) (simulate regs c1))
  | Not c => List.map negb (simulate regs c)
  | Firstn c len => List.firstn len (simulate regs c)
  | Skipn c idx => List.skipn idx (simulate regs c)
  | Append c0 c1 => simulate regs c0 ++ simulate regs c1
  | Mux c c0 c1 =>
    match simulate regs c with
    | [] => []
    | b :: _ => simulate regs (if b then c0 else c1)
    end
  end.
The resulting simulator can be run against a register file such as this one:
Definition regs idx :=
  match idx with
  | 0 => Some [false; true; false]
  | 1 => Some [true]
  | _ => None
  end.
End SimpleTypes.
Unfortunately, this specification is not very readable: we need a default case for undefined registers; the And operation truncates its result when one argument is longer than the other; Firstn may return fewer than len bits; Skipn may run out of bits before skipping as many as requested; and so on.
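To make these corner cases concrete, the following checks run the simulator on the regs register file above. The example names are ours, and each equation is proved by computation, assuming the SimpleTypes module exactly as defined here.

```coq
(* And of a 3-bit register and a 1-bit register silently truncates to 1 bit. *)
Example and_truncates :
  SimpleTypes.simulate SimpleTypes.regs
    (SimpleTypes.And (SimpleTypes.ReadRegister 0) (SimpleTypes.ReadRegister 1))
  = false :: nil.
Proof. reflexivity. Qed.

(* Register 7 is undefined, so the default case yields no bits at all. *)
Example undefined_register :
  SimpleTypes.simulate SimpleTypes.regs (SimpleTypes.ReadRegister 7) = nil.
Proof. reflexivity. Qed.

(* Asking Firstn for 4 bits of a 1-bit register returns only 1 bit. *)
Example firstn_short :
  SimpleTypes.simulate SimpleTypes.regs
    (SimpleTypes.Firstn (SimpleTypes.ReadRegister 1) 4)
  = true :: nil.
Proof. reflexivity. Qed.
```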
In contrast, if we could ensure that all circuits have the expected bitwidth and that all register references are in bounds, we could eliminate all of this unnecessary cruft. With dependent types, we can do just that. Vector.t A n is a list of exactly n elements of type A, and Fin.t n is a natural number strictly less than n.
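For readers new to these two types, here is a small sketch using Coq's standard Vector and Fin libraries. The module name VectorFinDemo and the definitions inside it are ours; importing the vector notations inside a module keeps them from shadowing the list notations used elsewhere in this file.

```coq
Require Coq.Vectors.Vector.

Module VectorFinDemo.
  (* Local to this module: vector notations such as [x; y] and v[@i]. *)
  Import VectorDef.VectorNotations.

  (* A vector of booleans whose length, 3, is part of its type. *)
  Definition three_bits : Vector.t bool 3 := [true; false; true]%vector.

  (* Fin.t 3 has exactly three inhabitants, standing for 0, 1 and 2. *)
  Definition idx0 : Fin.t 3 := Fin.F1.
  Definition idx2 : Fin.t 3 := Fin.FS (Fin.FS Fin.F1).

  (* There is no fourth inhabitant, so this definition is rejected. *)
  Fail Definition idx3 : Fin.t 3 := Fin.FS (Fin.FS (Fin.FS Fin.F1)).

  (* Indexing with a Fin.t needs neither a bounds check nor a default value. *)
  Definition last_bit : bool := three_bits[@idx2].
End VectorFinDemo.
```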
Module DependentTypes.
Section DependentTypes.
  Context {nregs : nat} {reg_widths : Vector.t nat nregs}.
  Import VectorDef.VectorNotations.

  Inductive circuit : forall (w : nat), Set :=
  | ReadRegister (idx : Fin.t nregs) : circuit reg_widths[@idx]
  | Constant {w} (bs : Vector.t bool w) : circuit w
  | And {w} (c0 c1 : circuit w) : circuit w
  | Not {w} (c : circuit w) : circuit w
  | Firstn {w} (c : circuit w) (len : Fin.t (S w))
    : circuit (proj1_sig (Fin.to_nat len))
  | Skipn {w} (c : circuit w) (idx : Fin.t (S w))
    : circuit (w - (proj1_sig (Fin.to_nat idx)))
  | Append {w0 w1} (c0 : circuit w0) (c1 : circuit w1) : circuit (w0 + w1)
  | Mux {w} (c : circuit 1) (c0 c1 : circuit w) : circuit w.

  (* firstn and skipn below are assumed to be vector helpers, defined
     beforehand, that take a Fin.t bound. *)
  Fixpoint simulate {w}
           (regs : forall (idx : Fin.t nregs), Vector.t bool reg_widths[@idx])
           (c : circuit w) : Vector.t bool w :=
    match c with
    | ReadRegister idx => regs idx
    | Constant bs => bs
    | And c0 c1 =>
      Vector.map2 (fun b0 b1 => andb b0 b1) (simulate regs c0) (simulate regs c1)
    | Not c => Vector.map negb (simulate regs c)
    | Firstn c len => firstn len (simulate regs c)
    | Skipn c idx => skipn idx (simulate regs c)
    | Append c0 c1 => (simulate regs c0) ++ (simulate regs c1)
    | Mux c c0 c1 => simulate regs (if Vector.hd (simulate regs c) then c0 else c1)
    end.
End DependentTypes.
End DependentTypes.
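As a quick check that widths really are enforced, the sketch below builds And circuits over a hypothetical two-register file whose registers are 3 and 1 bits wide. The names CircuitWidthDemo and widths are ours. We pass every argument explicitly with @: once the section is closed, the section variables nregs and reg_widths become the first two arguments of each constructor.

```coq
Module CircuitWidthDemo.
  Import VectorDef.VectorNotations.

  (* Hypothetical register file: register 0 is 3 bits wide, register 1 is 1 bit wide. *)
  Definition widths : Vector.t nat 2 := [3; 1]%vector.

  (* Two reads of register 0 have the same width, so And accepts them. *)
  Check (@DependentTypes.And 2 widths _
           (@DependentTypes.ReadRegister 2 widths Fin.F1)
           (@DependentTypes.ReadRegister 2 widths Fin.F1)).

  (* Register 0 (3 bits) and register 1 (1 bit) cannot be combined by And. *)
  Fail Check (@DependentTypes.And 2 widths _
                (@DependentTypes.ReadRegister 2 widths Fin.F1)
                (@DependentTypes.ReadRegister 2 widths (Fin.FS Fin.F1))).
End CircuitWidthDemo.
```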
Many introductions to dependent types start out by showing how to use them to eliminate array bounds checks. When the type of an array tells you how many elements it has, your compiler can detect out-of-bounds dereferences statically. Since we are working in a pure functional language, the next best thing is length-indexed lists, which the following code defines.
Module ilist.
Section ilist.
  Context {A : Set}.
Note that we now write out the type of each constructor in full, instead of using the shorthand notation we favored previously. The reason is that the index to the inductive type ilist now depends on details of a constructor's arguments. We are also using Set, the type containing the normal types of programming.
Inductive ilist : nat -> Set :=
| Nil : ilist O
| Cons : forall {n}, A -> ilist n -> ilist (S n).
We see that, within its section, ilist is given type nat -> Set.
Previously, every inductive type we have seen has either had plain Set as its type or has been a predicate with some type ending in Prop. The full generality of inductive definitions lets us integrate the expressivity of predicates directly into our normal programming.
The nat argument to ilist tells us the length of the list. The types of ilist's constructors tell us that a Nil list has length O and that a Cons list has length one greater than the length of its tail. We may apply ilist to any natural number, even natural numbers that are only known at runtime. It is this breaking of the phase distinction that characterizes ilist as dependently typed.
In expositions of list types, we usually see the length function defined first, but here that would not be a very productive function to code, since the length is already recorded in the type. Instead, let us implement list concatenation.
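The most direct definition of concatenation matches on the first list without any annotations. We call it app_noannot here (our name) so that it does not clash with app below.

```coq
(* No in or return annotation: current Coq versions infer one. *)
Fixpoint app_noannot {n1} (ls1 : ilist n1) {n2} (ls2 : ilist n2) : ilist (n1 + n2) :=
  match ls1 with
  | Nil => ls2
  | Cons x ls1' => Cons x (app_noannot ls1' ls2)
  end.
```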
Past Coq versions signalled an error for the most direct definition of concatenation, which simply matches on the first list with no annotations. That code is still invalid within Coq's core language, but current Coq versions automatically add annotations to the original program, producing a valid core program. These are the annotations on match discriminees that we began to study with subset types. We can rewrite app to give the annotations explicitly.
Fixpoint app {n1} (ls1 : ilist n1) {n2} (ls2 : ilist n2) : ilist (n1 + n2) :=
  match ls1 in (ilist n1) return (ilist (n1 + n2)) with
  | Nil => ls2
  | Cons x ls1' => Cons x (app ls1' ls2)
  end.
Using return alone allowed us to express a dependency of the match result type on the value of the discriminee. What in adds to our arsenal is a way of expressing a dependency on the type of the discriminee. Specifically, the n1 in the in clause above is a binding occurrence whose scope is the return clause.
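The same in/return combination works for other structural functions. Here is a sketch of a length-preserving map under the name imap (ours); since the section fixes a single element type A, the function argument has type A -> A. The in clause names the index k, so the Nil branch must produce an ilist O and the Cons branch an ilist (S n') for the tail's length n'.

```coq
Fixpoint imap {n} (f : A -> A) (ls : ilist n) : ilist n :=
  match ls in (ilist k) return (ilist k) with
  | Nil => Nil                          (* k is O here *)
  | Cons x ls' => Cons (f x) (imap f ls') (* k is S n' here *)
  end.
```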
We may use in clauses only to bind names for the arguments of an inductive type family. That is, each in clause must be an inductive type family name applied to a sequence of underscores and variable names of the proper length. The positions for parameters to the type family must all be underscores. Parameters are those arguments declared with section variables or with entries to the left of the first colon in an inductive definition. They cannot vary depending on which constructor was used to build the discriminee, so Coq prohibits pointless matches on them. It is those arguments defined in the type to the right of the colon that we may name with in clauses.
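The standard equality type eq is a convenient example of this rule: its type argument and its left-hand side are parameters, while its right-hand side is an index. The sketch below (cast_ilist is our name) therefore writes underscores for the two parameter positions and names only the index m.

```coq
(* Transport an ilist along an equality between lengths. *)
Definition cast_ilist {n1 n2 : nat} (H : n1 = n2) (ls : ilist n1) : ilist n2 :=
  match H in (_ = m) return (ilist m) with
  | eq_refl => ls (* here m is n1, so ls has the required type *)
  end.
```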
Here's a useful function with a surprisingly subtle type, where the return type depends on the value of the argument.
Fixpoint inject (ls : list A) : ilist (length ls) :=
  match ls with
  | nil => Nil
  | h :: t => Cons h (inject t)
  end.
We can define an inverse conversion and prove that it really is an inverse.
Fixpoint unject {n} (ls : ilist n) : list A :=
  match ls with
  | Nil => nil
  | Cons h t => h :: unject t
  end.

Goal forall ls : list A, unject (inject ls) = ls.
Proof.
  induction ls; simpl; congruence.
Qed.
Now let us attempt a function that is surprisingly tricky to write. In ML, the list head function raises an exception when passed an empty list. With length-indexed lists, we can rule out such invalid calls statically. A first attempt would match on the list with one case per constructor, writing _ in the Nil case for a term that we wish Coq would fill in for us, but we'll have no such luck.
It is not clear what to write for the Nil case, so we are stuck before we even turn our function over to the type checker. We could try omitting the Nil case.
Actually, these days, Coq is smart enough to make that definition work! However, it will be educational to look at how Coq elaborates this code into its core language, where, unlike in ML, all pattern matching must be exhaustive. We might try using an in clause somehow.
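Such an attempt might look like the following sketch, where hd_in is our name, chosen to avoid a clash with the hd defined later. The in clause uses the pattern S n rather than a plain variable.

```coq
(* The in clause restricts attention to lists whose index has the form S n. *)
Definition hd_in {n} (ls : ilist (S n)) : A :=
  match ls in (ilist (S n)) with
  | Cons h _ => h
  end.
```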
Due to some relatively new heuristics, Coq does accept an in clause such as ilist (S n) here, but in general it is not legal to write arbitrary patterns for the arguments of inductive types in in clauses. Only variables are permitted there, in Coq's core language. A completely general mechanism could only be supported with a solution to the problem of higher-order unification, which is undecidable.
Our final, working attempt at hd uses an auxiliary function and a surprising return annotation.
Definition hd' {n} (ls : ilist n) :=
  match ls in (ilist n) return (match n with O => unit | S _ => A end) with
  | Nil => tt
  | Cons h _ => h
  end.

Definition hd {n} (ls : ilist (S n)) : A := hd' ls.
We annotate our main match with a type that is itself a match. We write that the function hd' returns unit when the list is empty and returns the carried type A in all other cases. In the definition of hd, we just call hd'. Because the index of ls is known to be nonzero, the type checker reduces the match in the type of hd' to A.
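The following checks, again inside the section, show the two reductions of hd' described above and the static rejection of an empty list by hd. The example names are ours.

```coq
(* On an empty list, the return type of hd' reduces to unit. *)
Example hd'_Nil : hd' Nil = tt.
Proof. reflexivity. Qed.

(* On a nonempty list it reduces to A, and hd' returns the head. *)
Example hd'_Cons : forall (h : A) n (t : ilist n), hd' (Cons h t) = h.
Proof. intros; reflexivity. Qed.

(* hd demands an index of the form S n, so an empty list is a type error. *)
Fail Check (hd Nil).
```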
In fact, when we "got lucky" earlier with Coq accepting simpler definitions, under the hood it was desugaring almost to this one.
Definition easy_hd {n} (ls : ilist (S n)) : A :=
  match ls with
  | Cons h _ => h
  end.
End ilist.
Arguments ilist A n : clear implicits.
End ilist.
Functions on ilist can be extracted, and the extracted code is quite readable.
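To inspect the extracted code, the standard extraction commands should suffice. The sketch below asks for ilist.app together with everything it depends on, which includes the ilist type itself; the exact printed OCaml may vary with the Coq version.

```coq
(* Load the extraction plugin (required in recent Coq versions). *)
Require Extraction.

(* Print OCaml code for ilist.app and its dependencies, including ilist. *)
Recursive Extraction ilist.app.
```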
Looking at extracted definitions, one may wonder why we have to carry n (the length of the list) in each Cons at runtime. The answer is simple: n has type nat, and nat is not a Prop, so Coq does not erase it.
This is fortunate, because one might write the following function, which returns the index n itself and so needs its value at runtime:
Definition ilength {A n} (l: ilist.ilist A n): nat := n.
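For instance, applying ilength to a list built by ilist.inject recovers the length that was computed from an ordinary list. The explicit @ilist.inject bool only fixes the element type.

```coq
(* The index of this ilist is length (true :: false :: nil); ilength returns it. *)
Compute ilength (@ilist.inject bool (true :: false :: nil)).
(* evaluates to 2 *)
```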
The rest of this chapter will demonstrate a few other elegant applications of dependent types in Coq. Readers encountering such ideas for the first time often feel overwhelmed, concluding that there is some magic at work whereby Coq sometimes solves the halting problem for the programmer and sometimes does not, applying automated program understanding in a way far beyond what is found in conventional languages. The point of this section is to cut off that sort of thinking right now! Dependent type-checking in Coq follows just a few algorithmic rules, with just one for dependent pattern matching of the kind we met in the previous section.
A dependent pattern match is a match expression where the type of the overall match is a function of the value and/or the type of the discriminee, the value being matched on. In other words, the match type depends on the discriminee.
When exactly will Coq accept a dependent pattern match as well-typed? Some other dependently typed languages employ fancy decision procedures to determine when programs satisfy their very expressive types. The situation in Coq is just the opposite. Only very straightforward symbolic rules are applied. Such a design choice has its drawbacks, as it forces programmers to do more work to convince the type checker of program validity. However, the great advantage of a simple type checking algorithm is that its action on invalid programs is easier to understand!
We come now to the one rule of dependent pattern matching in Coq. A general dependent pattern match assumes this form (with unnecessary parentheses included to make the syntax easier to parse):
match E as y in (T x1 ... xn) return U with
| C z1 ... zm => B
| ...
end
The discriminee is a term E, a value in some inductive type family T, which takes n arguments. An as clause binds the name y to refer to the discriminee E. An in clause binds an explicit name xi for the ith argument passed to T in the type of E.
We bind these new variables y and xi so that they may be referred to in U, a type given in the return clause. The overall type of the match will be U, with E substituted for y, and with each xi substituted by the actual argument appearing in that position within E's type.
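As a small instance of the as clause, here is a variant of inject written outside the ilist module with every annotation explicit (inject_as is our name). The as clause binds l to the discriminee so that the return clause can mention length l. In the nil case, U becomes ilist.ilist A (length nil), which reduces to ilist.ilist A 0; in the h :: t case, it becomes ilist.ilist A (length (h :: t)), which reduces to ilist.ilist A (S (length t)). Since the discriminee is the variable ls, the as clause could also be omitted.

```coq
Fixpoint inject_as {A : Set} (ls : list A) : ilist.ilist A (length ls) :=
  match ls as l return ilist.ilist A (length l) with
  | nil => @ilist.Nil A                                (* U with l := nil *)
  | h :: t => @ilist.Cons A (length t) h (inject_as t) (* U with l := h :: t *)
  end.
```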
In general, each case of a match may have a pattern built up in several layers from the constructors of various inductive type families. To keep this exposition simple, we will focus on patterns that are just single applications of inductive type constructors to lists of variables. Coq actually compiles the more general kind of pattern matching into this more restricted kind automatically, so understanding the typing of match requires understanding the typing of matches lowered to match one constructor at a time.
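For example, a match with a nested pattern such as _ :: x :: _ behaves like a hand-lowered version that inspects one constructor at a time. The sketch below (second and second_lowered are our names) shows both; the term Coq actually compiles may differ in inessential details.

```coq
(* A nested pattern: the second element of a list, if any. *)
Definition second {A : Type} (l : list A) : option A :=
  match l with
  | _ :: x :: _ => Some x
  | _ => None
  end.

(* The same function, matching one constructor at a time. *)
Definition second_lowered {A : Type} (l : list A) : option A :=
  match l with
  | nil => None
  | _ :: t =>
    match t with
    | nil => None
    | x :: _ => Some x
    end
  end.
```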
The last piece of the typing rule tells how to type-check a match case. A generic constructor application C z1 ... zm has some type T x1' ... xn', an application of the type family used in E's type, probably with occurrences of the zi variables. From here, a simple recipe determines what type we will require for the case body B. The type of B should be U with the following two substitutions applied: we replace y (the as clause variable) with C z1 ... zm, and we replace each xi (the in clause variables) with xi'. In other words, we specialize the result type based on what we learn from which pattern has matched the discriminee.
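To see the recipe at work, here is hd' restated outside the ilist module with the in clause written out (hd_rule is our name). Because A is a parameter of ilist.ilist once the section is closed, its position in the in clause is an underscore. For the Nil pattern, whose type is ilist.ilist A O, the body must have type U with k replaced by O, which reduces to unit; for Cons h _, whose type is ilist.ilist A (S n') for some n', U reduces to A.

```coq
Definition hd_rule {A : Set} {n} (ls : ilist.ilist A n)
  : match n with O => unit | S _ => A end :=
  match ls in (ilist.ilist _ k) return (match k with O => unit | S _ => A end) with
  | ilist.Nil => tt     (* U with k := O reduces to unit *)
  | ilist.Cons h _ => h (* U with k := S n' reduces to A *)
  end.
```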
This is an exhaustive description of the ways to specify how to take advantage of which pattern has matched! No other mechanisms come into play. For instance, there is no way to specify that the types of certain free variables should be refined based on which pattern has matched.
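A common workaround, often called the convoy pattern, turns such a free variable into an argument of the match by mentioning its type in the return clause, so that the rule above refines it too. The sketch below (heads is our name) returns the first elements of two lists of the same length. Matching on l1 alone would not refine the type of l2, so the match instead returns a function expecting a list of length k and is then applied to l2.

```coq
Definition heads {A : Set} {n} (l1 l2 : ilist.ilist A n)
  : match n with O => unit | S _ => (A * A)%type end :=
  match l1 in (ilist.ilist _ k)
        return (ilist.ilist A k -> match k with O => unit | S _ => (A * A)%type end) with
  | ilist.Nil => fun _ => tt
  | ilist.Cons h1 _ => fun l2' =>
    (* l2' has type ilist.ilist A (S n'), so its own match yields an A *)
    (h1, match l2' in (ilist.ilist _ j) return (match j with O => unit | S _ => A end) with
         | ilist.Nil => tt
         | ilist.Cons h2 _ => h2
         end)
  end l2.
```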
A few details have been omitted above. Inductive type families may have both parameters and regular arguments. Within an in clause, a parameter position must have the wildcard _ written, instead of a variable. (In general, Coq uses wildcard _'s either to indicate pattern variables that will not be mentioned again or to indicate positions where we would like type inference to infer the appropriate terms.) Furthermore, recent Coq versions are adding more and more heuristics to infer dependent match annotations in certain conditions. The general annotation-inference problem is undecidable, so there will always be serious limitations on how much work these heuristics can do. When in doubt about why a particular dependent match is failing to type-check, add an explicit return annotation! At that point, the mechanical rule sketched in this section will provide a complete account of "what the type checker is thinking." Be sure to avoid the common pitfall of writing a return annotation that does not mention any variables bound by in or as; such a match will never refine typing requirements based on which pattern has matched. (One simple exception to this rule is that, when the discriminee is a variable, that same variable may be treated as if it were repeated as an as clause.)
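Both points can be checked directly. In the sketch below (hd_bad and zero_or_tt are our names), the first return clause mentions only the outer n, so neither branch learns anything and the definition is rejected; the second match mentions its variable discriminee b, which acts like an implicit as b.

```coq
(* Pitfall: the return clause mentions the outer n instead of the in-bound k,
   so the Nil branch must have type match n with O => unit | S _ => A end,
   which tt does not have. *)
Fail Definition hd_bad {A : Set} {n} (ls : ilist.ilist A n)
  : match n with O => unit | S _ => A end :=
  match ls in (ilist.ilist _ k) return (match n with O => unit | S _ => A end) with
  | ilist.Nil => tt
  | ilist.Cons h _ => h
  end.

(* Exception: b is a variable, so mentioning b in the return clause
   behaves like writing "as b". *)
Definition zero_or_tt (b : bool) : if b then nat else unit :=
  match b return (if b then nat else unit) with
  | true => 0
  | false => tt
  end.
```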