Lecture 4:  Polymorphic Datatypes

Review

Our overview of SML types can be summarized by the following BNF grammar definitions:

(base types) b ::= int | real | string | bool | char
(types)      t ::= b | t1 -> t2 | (t1 * ... * tn) | unit |
                   {id1:t1,...,idn:tn} | id

Recall that we read "::=" as "is defined to be" and "|" as "or".  So a base type is defined to be either int or real or string or bool or char.   A type is defined to be either a base type, a function type such as int -> real or (int * int) -> string, a tuple type such as (int * int) or (bool * real * string) or the distinguished empty-tuple type unit, a record type such as {name:string, age:int}, or a datatype name such as day or nat that we have defined with a datatype declaration.
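As a quick illustration of the grammar, the following sketch annotates one binding with each type form. The names n, half, pair, nothing, and person are made up for this example, and the constructors chosen for day are an assumption (the notes only mention the name day).

```sml
(* One binding per type form from the grammar above. *)
val n : int = 42                                        (* base type *)
val half : int -> real = (fn (x:int) => real x / 2.0)   (* function type *)
val pair : (int * bool) = (3, true)                     (* tuple type *)
val nothing : unit = ()                                 (* the empty tuple *)
val person : {name:string, age:int} = {name="Greg", age=120}  (* record type *)

(* Hypothetical datatype; its name can then be used as a type. *)
datatype day = Mon | Tue | Wed
val d : day = Tue
```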

The constants, expressions, declarations, and patterns of SML can be summarized by the following BNF definitions:

(const.) c ::= 0 | 1 | ~1 | ... | 3.1415 | "hello" | true | false | ...
(exprs.) e ::= id | id.id | c | (fn (p) => e) | e1(e2) | (e1,...,en) | 
               #i e | {id1=e1,...,idn=en} | #id e | 
               let d1 ... dn in e end | if e1 then e2 else e3 |
               (case e of p1 => e1 | ... | pn => en) | (e1;...;en)
(decls.) d ::= val p = e | fun id(p):t = e | 
               datatype id = id1 of t1 | ... | idn of tn
(pats.)  p ::= _ | id:t | c | id(p) | (p1,...,pn) | {id1=p1,...,idn=pn}

Expressions:

For each base type, we have a corresponding set of constants.  For instance, for the base type int, we have constants of the form 0, 1, ~1, 2, ~2, etc.

We can use identifiers such as x or foo or name in expressions as long as they are bound (i.e., given a value by a declaration, as a parameter to a function, or as a pattern variable).   We can use library functions with the "dot" notation, such as Int.toString or Math.sqrt.
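A minimal sketch of bound identifiers and the "dot" notation follows. The names x, label, and root are made up for this example, and the string concatenation operator ^ is used here even though it has not been introduced in the grammar above.

```sml
val x : int = 7                                   (* x is now bound *)
val label : string = "x = " ^ Int.toString(x)     (* library function Int.toString *)
val root : real = Math.sqrt(16.0)                 (* library function Math.sqrt *)
```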

We can create anonymous functions using the expression form (fn (p) => e) where p is a pattern (usually a variable or a tuple pattern), and e is the body of the function.  For instance, the anonymous function (fn (x:int) => x+1) takes in an integer (x) and returns x+1.   We can also declare functions either at the top-level (using the fun declaration) or within a let-expression.  Functions are used by applying them to an argument, written e1(e2) where e1 is the function and e2 is the argument.  Usually, e1 will be an identifier, but in general, it can be any expression that evaluates to a function.
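The following sketch shows the three ways of obtaining a function to apply: an identifier, an anonymous function written in place, and the result of another function call. The names succ and twice are hypothetical and chosen only for this example.

```sml
(* Bind an anonymous function to an identifier, then apply it. *)
val succ = (fn (x:int) => x+1)
val four = succ(3)

(* Apply an anonymous function directly, without naming it. *)
val five = (fn (x:int) => x+1)(4)

(* Here e1 is itself an application that evaluates to a function. *)
fun twice(f:int -> int):int -> int = (fn (x:int) => f(f(x)))
val seven = (twice(succ))(5)
```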

We create tuples using the form (e1,...,en) and extract the ith component of a tuple using #i e.  So, for instance, (3,true) creates a tuple of type (int * bool).  Evaluating #1 (3,true) returns the integer 3 whereas evaluating #2 (3,true) returns the boolean true. The empty tuple () has a special  type called unit.

We create records using the form {id1=e1,...,idn=en} and extract the component labeled with id using #id e.  So, for instance, {name="Greg", age=120} creates a record of type {name:string, age:int}.   Evaluating #name {name="Greg", age=120} yields the string "Greg" and evaluating #age {name="Greg", age=120} yields the integer 120.
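The tuple and record forms above can be tried out as follows. This is only a sketch; the identifiers t, three, yes, greg, who, and years are made up for the example.

```sml
(* Tuples: build with (e1,...,en), select with #i. *)
val t : (int * bool) = (3, true)
val three : int = #1 t
val yes : bool = #2 t

(* Records: build with {id1=e1,...,idn=en}, select with #id. *)
val greg : {name:string, age:int} = {name="Greg", age=120}
val who : string = #name greg
val years : int = #age greg
```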

We use expressions of the form let d1 ... dn in e end when we need local declarations d1,...,dn to evaluate some expression e.  For example, the following code declares pi as a local variable and square as a local function:

	let val pi = 3.14159
	    fun square(x:real):real = x * x
	in
	    square(pi) * 10.0
	end

We use if-expressions to test boolean values and return the result of one expression or another.
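For instance, here is a small sketch of if-expressions, including a nested one. The function names abs_val and sign are hypothetical.

```sml
(* Both branches must have the same type (int here). *)
fun abs_val(x:int):int = if x < 0 then ~x else x

(* if-expressions can be nested in the else branch. *)
fun sign(x:int):string =
    if x < 0 then "negative"
    else if x = 0 then "zero"
    else "positive"
```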

A case-expression generalizes if in that it allows pattern matching on other constants besides booleans (such as constant integers or strings), as well as a way to deconstruct datatype values.  We'll see lots of examples of case-expressions below.
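As a first small sketch, the following case-expressions match on integer constants and on booleans. The function names describe and is_zero are made up for this example; the second function shows how an if-expression can be read as a case on true and false.

```sml
(* Matching on integer constants, with a wild-card for everything else. *)
fun describe(n:int):string =
    (case n of
         0 => "zero"
       | 1 => "one"
       | _ => "many")

(* Behaves like: if n = 0 then "yes" else "no" *)
fun is_zero(n:int):string =
    (case (n = 0) of
         true => "yes"
       | false => "no")
```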

When we write (e1; ... ; en), SML evaluates e1 first, throws away its value, then evaluates the next expression, and so forth, finally returning the result of en as the value of the whole expression.  This expression form is useful for performing side effects such as printing.  For example, the following code takes in a record consisting of a first name, a last name, and an age and prints out the information:

	fun print_record(r:{first:string,last:string,age:int}):unit = 
	    (print (#first r);
	     print (" ");
	     print (#last r);
	     print ("'s age is ");
	     print (Int.toString(#age r));
	     print ("\n"))

Notice that print takes a string and produces unit (i.e., the empty tuple ()) as its result.  Since the last expression in the sequence is a call to print, the result of print_record is also unit.
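The last expression in a sequence need not be a call to print. In the following sketch (noisy_double is a hypothetical name), the sequence prints a message and then returns an integer, so the function has result type int rather than unit.

```sml
fun noisy_double(x:int):int =
    (print ("doubling ");
     print (Int.toString(x));
     print ("\n");
     x * 2)              (* the value of the whole sequence *)

val r : int = noisy_double(21)
```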

Declarations:

Declarations are either value declarations, function declarations, or datatype declarations.  Value declarations are used to bind values to identifiers.   For instance, the declaration val pi:real = 3.14159 binds the real number 3.14159 to the identifier pi.  In general, we can use a pattern in a value declaration to deconstruct a value.  For instance, the value declaration val (x:int,y:int) = (3,4) binds x to the first component of the tuple and y to the second component of the tuple.   This is the same as writing two declarations:  val x:int = #1 (3,4) followed by val y:int = #2 (3,4).  
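The value declarations described above can be written out as follows. The primed names x' and y' and the record binding at the end are additions for this sketch, so that both styles can sit side by side.

```sml
val pi:real = 3.14159

(* A tuple pattern binds both components at once. *)
val (x:int, y:int) = (3, 4)

(* The same bindings written as two separate declarations. *)
val x':int = #1 (3,4)
val y':int = #2 (3,4)

(* A record pattern works the same way. *)
val {name=n, age=a} = {name="Greg", age=120}
```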

Function declarations define new functions and bind them to an identifier.   We've seen lots of examples of function declarations and we'll see a lot more below.  Notice that in general, the argument to a function can be deconstructed with a pattern just as a value can be deconstructed in a value declaration.
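Here is a brief sketch of function declarations whose arguments are deconstructed with a tuple pattern and a record pattern. The function names add and full_name are hypothetical, and ^ is the string concatenation operator.

```sml
(* The argument is a tuple, deconstructed into x and y. *)
fun add(x:int, y:int):int = x + y

(* The argument is a record, deconstructed into f and l. *)
fun full_name({first=f:string, last=l:string}):string = f ^ " " ^ l

val seven = add(3, 4)
val who = full_name({first="Greg", last="Smith"})
```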

Datatype declarations define two things:  a new type and a set of data constructors.  The data constructors are not types; rather, you can think of them as functions that create values of the new type.  For example, the declaration:

	datatype foo = A of int | B of (string * real) | C | D

defines a new type foo.  It also defines four data constructors A, B, C, and D.  The only way to create values of type foo is to use these data constructors in an expression.  Furthermore, the constructors A and B require arguments in order to create values of type foo.  So, for instance, all of the following expressions have type foo:

	C
	D
	A(3)
	B("bzzz",2.178)

Notice that since neither the C nor the D data constructors has "of t" written after it, they take no arguments.
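The following sketch repeats the declaration of foo so that it stands on its own, and then takes foo values apart with a case-expression. The names mk and foo_to_string are made up for this example.

```sml
datatype foo = A of int | B of (string * real) | C | D

(* A constructor that takes an argument can be used like a function. *)
val mk : int -> foo = A

(* One case per data constructor. *)
fun foo_to_string(f:foo):string =
    (case f of
         A(n) => "A " ^ Int.toString(n)
       | B(s, _) => "B " ^ s
       | C => "C"
       | D => "D")
```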

Patterns:

Patterns are used in value declarations, in function arguments, and in case-expressions to deconstruct values.  The wild-card pattern (an underscore _) matches any value.   An identifier that is not a data constructor also matches any value, and in addition binds that value to the identifier.  An identifier that is a data constructor from some datatype definition matches only that data constructor.  A constant pattern c matches only that constant, and a pattern id(p) matches a value built with the data constructor id whose argument matches p.  A tuple pattern (p1,...,pn) matches any tuple value as long as the components of the tuple match the nested patterns.   Similarly, a record pattern {id1=p1,...,idn=pn} matches any record as long as the components of the record match the appropriate nested patterns.
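A short sketch of each pattern form follows. The function names classify, age_of, and is_small are hypothetical, and the foo datatype from the Declarations section is repeated so the example is self-contained.

```sml
(* Tuple patterns with constants, variables, and wild-cards. *)
fun classify(p:(int * string)):string =
    (case p of
         (0, _) => "zero"                        (* constant and wild-card *)
       | (n, "double") => Int.toString(n * 2)    (* variable and string constant *)
       | (n, _) => Int.toString(n))

(* Record pattern: ignore the name, bind the age to a. *)
fun age_of({name=_:string, age=a:int}):int = a

datatype foo = A of int | B of (string * real) | C | D

(* Data constructor patterns, with and without arguments. *)
fun is_small(f:foo):bool =
    (case f of
         A(0) => true
       | A(_) => false
       | B(_, _) => false
       | C => true
       | other => false)     (* variable pattern; only D reaches this case *)
```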

We'll see lots of examples of patterns in the code below.


Polymorphic Lists

The code for this section is in the file lec04.sml.