Recitation 1: Basic types and expressions

NOTES TO INSTRUCTOR:


Starting SML/NJ

From the command line (the Unix shell or Windows console), type sml. If you installed SML/NJ at home, make sure the execution path has been set up properly (this is automatic after a reboot with the self-installer on Windows, but not with the RPM under Linux).

Under Windows, in the start menu, click on /Programs/Standard ML of New Jersey/Standard ML of New Jersey (assuming standard install).

Either way, you get a banner identifying the compiler, and a prompt "-", which indicates that the compiler is ready to accept expressions, compile them and execute them. Understanding expressions is the focus of this lecture.

Knowing how to start the compiler, it is useful to also know how to exit the compiler. The easiest way under Unix is to press Ctrl-D at the prompt, and under Windows to press Ctrl-Z followed by Return.

TIP: The compiler is picky about some things when reading input, and will only accept a request to exit if it is sitting exactly at the "-" prompt, i.e. if nothing else has already been typed in. In general, whenever the input is behaving strangely (often because the compiler is trying to correct wrong syntax on the fly), pressing Ctrl-C should interrupt and return the prompt to a sane state.


Basic expressions and types

Standard ML (SML) is an expression-based language. Imagine a gigantic calculator. Every expression "evaluates" to a value, and everything is done through evaluating expressions. SML provides good ways to build larger and larger expressions that can still be understood and that are still correct.

For real applications, programming with expressions may seem like a stretch. After all, applications often do not "compute" anything; they allow you to "do" something. Later we introduce the notion of side-effects: evaluate this, and, by the way, do *that* while you're at it. The emphasis is on "do": imperative. For example, evaluate this expression, and as a side-effect put up this window on the screen. We will return to side-effects later in the course. For now, we concentrate on expressions.

Expressions evaluate to values. Values can be classified according to their type:

Values of type:

int:    0,1,2,3,4,~1,~2,~3,~4,... (notice the negative sign is ~)
bool:    true, false
real:    3.141592 (aka floating point numbers)
string:    "this is a string"
char:    #"a"

Let us start by looking at simple expressions, with the following syntax (expressed in BNF, which should be clear):

NOTE TO INSTRUCTOR: The BNF for expressions, and later for declarations, should be put on one side of the board, and kept there as we will be adding expression types and declaration types as the lecture goes.

e ::= c  |  unop e  |  e1 binop e2  |  if e1 then e2 else e3  |  (e)

where c are constants (the values described above), unop are unary operations, binop are binary operations.

Unary operations include:

~    (takes an int or a real and negates it)
not    (takes a bool and returns its negation)
size    (takes a string and returns its size)

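Binary operations include:

+,-,*    (take two ints or two reals and return the corresponding result)
div,mod    (take two ints and return the integer quotient and remainder)
>,>=,<,<=    (take two ints or two reals and compare them)
=    (takes two ints and tests them for equality; in SML '97 real is not an equality type, so use Real.== to compare reals)
^    (takes two strings and concatenates them into a new string)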
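As a quick illustration of the operations listed above, here is a small sketch with the expected value in a comment next to each line. The names on the left (negInt, len, and so on) are made up for this example. They use val to name each result, which is explained under Declarations.

```sml
(* Unary operations *)
val negInt  = ~ 5              (* ~5 : int *)
val negReal = ~ 2.5            (* ~2.5 : real *)
val flipped = not true         (* false : bool *)
val len     = size "hello"     (* 5 : int *)

(* Binary operations *)
val sum       = 3 + 4          (* 7 : int *)
val product   = 1.5 * 2.0      (* 3.0 : real *)
val quotient  = 17 div 5       (* 3 : int *)
val remainder = 17 mod 5       (* 2 : int *)
val smaller   = 3 < 4          (* true : bool *)
val joined    = "foo" ^ "bar"  (* "foobar" : string *)
```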

Evaluation rules specify how an expression is to be evaluated. Constants simply evaluate to their value. Unary and binary operations first evaluate their arguments to values, and then perform the operation. A conditional if e1 then e2 else e3 evaluates e1 to a value: if it is true, e2 is evaluated, if it is false, e3 is evaluated.
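The evaluation rules can be traced by hand on small examples. The sketch below writes the steps as comments, using --> for "evaluates to"; the names a, b and c are made up for illustration.

```sml
(* Arguments are evaluated first, then the operation is performed:
   (2 + 3) * 4  -->  5 * 4  -->  20 *)
val a = (2 + 3) * 4

(* The test is evaluated first, then one branch:
   if 3 < 4 then 10 else 20  -->  if true then 10 else 20  -->  10 *)
val b = if 3 < 4 then 10 else 20

(* Only the chosen branch is evaluated. Here 1 div 0 would fail
   if it were evaluated, but the test is true so it never runs. *)
val c = if true then 0 else 1 div 0
```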

Evaluation only makes sense if the types agree! The + operation is defined if both operands are integers or both are reals, but adding an integer to a boolean is senseless. Similarly, the condition of an if expression must be a boolean. Type checking is performed to ensure we are only doing sensible things. As we saw, every value has a type. We can extend this definition to expressions: every expression has a type, the type of the value the expression evaluates to. How do we figure out the type of an expression?

Every constant has a type (42 has type int, true has type bool, etc), so that's easy. Operations also have types (given informally above, formally here):

 ~    :  int -> int            or  real -> real
 not  :  bool -> bool
 size :  string -> int
 +    :  (int * int) -> int    or  (real * real) -> real
 >    :  (int * int) -> bool   or  (real * real) -> bool
 ^    :  (string * string) -> string

Every operation specifies the type of values it expects and the type of value it returns. Notice the * character when more than one argument is expected. Thus, for an expression unop e, if e has type t1 and the unop has type t1 -> t2, then unop e has type t2. For an expression e1 binop e2, if e1 has type t1, e2 has type t2 and the binop has type (t1 * t2) -> t3, then e1 binop e2 has type t3.

For a conditional if e1 then e2 else e3, we need to make sure that e1 is of type bool; then, if e2 and e3 both have the same type t, the conditional itself has type t. Why do we need e2 and e3 to have the same type? Recall that we figure out the types at compile time. That means we have to give a type to the conditional without executing it, that is, without executing the test to see which branch is taken. So either branch could be taken. This is only safe if both branches return the same type of value.

Notice that in the above description, we have "rules" of the form "if e has type t1 and the unop has type t1 -> t2, then unop e has type t2". What happens if the conditions are not satisfied? Then it is not possible to give a type to unop e: it does not type check, and the compiler rejects the program. In other words, there is a conflict; for example, we are trying to pass a boolean to ~, which negates integers or reals. Our rules prevent this from happening, which matters because such errors often cause run-time crashes.
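To make the typing rules concrete, here is a small sketch. The first three expressions type check, with their types in comments. The ill-typed ones are left inside a comment so that the file still loads. The names ok1, ok2 and ok3 are made up for this example.

```sml
(* 3 + 4 : int and ~ : int -> int, so the whole expression has type int *)
val ok1 = ~ (3 + 4)

(* "ab" ^ "cd" : string, size : string -> int, > : (int * int) -> bool *)
val ok2 = size ("ab" ^ "cd") > 3

(* The test has type bool and both branches have type string *)
val ok3 = if not false then "yes" else "no"

(* Rejected by the type checker:
     ~ true                      ~ expects an int or a real
     1 + true                    + expects two ints or two reals
     if 1 then 2 else 3          the test must have type bool
     if true then 1 else "one"   the branches have different types
*)
```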

Now that we know how to write expressions, type check and evaluate expressions, how do we tell the compiler to evaluate expressions at the prompt (also called top level)? You type it in (possibly on multiple lines), then you type ; and press Return. Note that the ; is *not* part of the expression to be evaluated. It is simply an indication to the compiler that you have finished typing in the expression and that you are ready to evaluate. Before evaluating the expression, the compiler will type check it to make sure the types agree, and then execute it.

TIP: When entering an expression on multiple lines, lines after the first get a "=" prompt. Sometimes the compiler gets confused. Press Ctrl-C to get back a "-" prompt.


Declarations

At top level, you can also "name" values. This is not a form of expression, but rather an indication to the compiler that you are defining something new, a declaration. Syntax for declarations:

d ::= val id = e

The idea is that val id = e indicates to the compiler: evaluate expression e to a value and bind the result to the name id. In subsequent expressions, id can be used to refer to that value. We therefore extend our syntax for expressions with:

e ::= ...  |  id

Evaluating id just means looking up the value it was bound to. The type of id is the type of the value it is bound to.
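For instance, here is a minimal sketch of a few val declarations, where later expressions refer to earlier names. The identifiers radius, pi, area and big are made up for illustration.

```sml
val radius = 2.0                  (* radius : real *)
val pi = 3.141592                 (* pi : real *)

(* Each id is looked up and replaced by the value it is bound to *)
val area = pi * radius * radius   (* area : real, about 12.57 *)
val big = area > 10.0             (* big : bool, here true *)
```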

TIP: Actually, even expressions typed at the top level are treated as declarations.
  - e;
is treated as
  - val it = e;
so that e gets evaluated and bound to the identifier it, so that you can always refer to the last expression evaluated by this identifier.

Bindings at top level are valid until another binding of the same name is done (which shadows the previous binding). We can also introduce local bindings valid for the evaluation of an expression.

e ::= ...  |  let d in e end

To evaluate a let expression, you evaluate and bind the expression in the declaration d, and evaluate e taking that binding into account.

QUESTION: what is the type of let d in e end?

Simply the type of e, since the type of an expression is the type of the result.
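A short sketch of let in action, under the rules just described. The names are invented for the example. The second part shows that a local binding only holds inside its let.

```sml
(* Evaluate 3 + 4, bind it to x, then evaluate x * x with that binding *)
val result =
  let
    val x = 3 + 4
  in
    x * x
  end                                  (* 49 : int *)

(* The local n is visible only between in and end *)
val n = 10
val m = let val n = 1 in n + 1 end     (* 2 : int *)
val stillTen = n                       (* 10 : int *)
```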

At this point, we can create huge expressions, but can't easily reuse expressions. Thus, we introduce functions. A function declaration is a new kind of declaration:

d ::= ...  |  fun id (x1:t1, ..., xn:tn):t = e

For example, fun square (x:int):int = x * x. (If entered at top level, remember to type ; to indicate to the compiler that you're done.)

Functions have types, similar to operation types:

square : int -> int

How do we use functions? We introduce a new kind of expression called "function application":

e ::= ...  |  id (e1,...,en)

How do we type-check a function application?

  1. If f has type (t1 * ... * tn) -> t,
  2. If the number of arguments supplied to the function is the same as the number of arguments the function expects, and
  3. If e1 has type t1, ..., en has type tn, then f (e1,...,en) has type t.

So square (10) type checks, but square (true) does not.

How do we evaluate a function application? Evaluate e1,...,en to values v1,...,vn, then evaluate e (the body of f) with x1 bound to v1, ..., xn bound to vn.

So square (10+10) --> square (20) --> 400.
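Here is the same idea as a small sketch, with a two-argument function added. The function sumOfSquares and the names r1 and r2 are made up for this example.

```sml
fun square (x:int):int = x * x

(* sumOfSquares : (int * int) -> int *)
fun sumOfSquares (a:int, b:int):int = square (a) + square (b)

(* square (10+10) --> square (20) --> 20 * 20 --> 400 *)
val r1 = square (10+10)

(* sumOfSquares (3, 4) --> square (3) + square (4) --> 9 + 16 --> 25 *)
val r2 = sumOfSquares (3, 4)
```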

You'll see all of this formally in a week. This is just to give you an intuition, at least in the simple cases.

Since function declarations are declarations, they can be declared within a let expression, giving local functions (like Pascal, unlike C).

For example,

fun fourth (y:int):int = 
  let 
    fun square (x:int):int = x * x
  in
    square (square (y))
  end

TIP: Even though in this course we will always annotate function arguments and function results with type information, it is not formally necessary. Most of the time, SML can infer the type of a function from its definition. For example, you can define the function fun inc (x) = x + 1, and the compiler infers the type inc : int -> int. Intuitively: since 1 is an integer and + takes two integers, x must be an integer, and thus x+1 must be an integer as well.

PITFALL: It is *very* good policy to always annotate functions. As you will quickly notice if you leave annotations off, it can be extremely hard to debug a program that does not type check while also trying to figure out why the compiler inferred a particular type for a particular expression. Annotations catch type errors more quickly.

PITFALL: Type inference and overloading don't mix well. The compiler trying to infer a type for fun add (x,y) = x + y will return the type (int * int) -> int while you may wish for it to be of type (real * real) -> real. Solution: use type annotations on functions!
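The two cases side by side, as a small sketch. The names addInt and addReal are made up for this example.

```sml
(* No annotations: the overloaded + defaults to int,
   so addInt : (int * int) -> int *)
fun addInt (x, y) = x + y

(* With annotations: addReal : (real * real) -> real *)
fun addReal (x:real, y:real):real = x + y

val i = addInt (1, 2)         (* 3 : int *)
val r = addReal (1.5, 2.5)    (* 4.0 : real *)

(* addInt (1.5, 2.5) would be rejected: addInt expects two ints *)
```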

TIP: It can be useful to put declarations into a file so that you do not have to enter them over and over again at the prompt. Use an editor to create the file, then load it into SML/NJ with the operation use "file", which behaves as though the declarations had been entered at the prompt. (For that reason, don't forget the semicolons that indicate where the expressions and declarations end!)

The big question is where the file should be stored. By default, use looks for files in the current working directory, which can differ across systems and installations. The "magic invocation":
  - OS.FileSys.getDir ();
will return the current working directory. To change it, use the "magic invocation":
  - OS.FileSys.chDir "path";
where path is the path where you want to go. Use the Unix convention (even on Windows): the path separator is "/".

Soon in the course, we will see a *much better* way of managing programs than use, which is really just a lazy person's way of using the top level prompt.
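As a rough sketch of what such a file might look like: the file name defs.sml and the names double and twenty are hypothetical, chosen only for this example. The comment at the end shows how it could be loaded, assuming the file sits in the current working directory.

```sml
(* Contents of a hypothetical file named defs.sml.
   Each declaration ends with a semicolon, as at the prompt. *)
fun double (x:int):int = x + x;
val twenty = double (10);

(* At the prompt, assuming defs.sml is in the current working directory:
     - use "defs.sml";
     - double (21);
   After the use, double and twenty are bound as if typed in by hand. *)
```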