We will use the Standard ML (SML) programming language this semester. SML is a functional language rather than a procedural language; the key difference between these two classes of languages is the execution model---the way in which programs are executed. Procedural (or imperative) languages, such as C and Java, are based on commands that change the state of the machine on which they execute. For example, the assignment statement (in both C and Java) explicitly changes the value stored in some memory location. Functional languages, in contrast, are based on evaluating expressions to produce values. SML provides good ways to build correct, understandable, and large expressions. The focus of this lecture is understanding expressions.
For real applications, programming with expressions may seem like a stretch. After all, applications often do not "compute" anything; instead, they allow you to "do" something. Later in the course, we will introduce the notion of side-effects: evaluate this, and do that while you're at it---e.g., evaluate an expression, and (as a side-effect) display a window on the screen. But for the moment, we will live in a pure world, one without side-effects.
To start learning SML, you should begin playing with the SML language by writing toy expressions. Like most programming languages, SML has a compiler that can be run on source files to produce object files. But to really understand expressions, there is a better way to interact with the SML compiler, called the SML interactive system. This system is something like a very fancy calculator---you interact with it by typing in an expression, and it responds by evaluating that expression and telling you the resulting value.
Starting the SML interactive system. There are several implementations of SML, and the one we use in CS312 is SML/NJ. Start an interactive session with SML/NJ by typing sml at the command line (in a Unix shell or Windows console), or by clicking on Start/Programs/SML of New Jersey (in Windows). You will get a top level prompt "-", which indicates that the compiler is ready to accept expressions and evaluate them. To exit the interactive system, enter the end-of-file character for your platform (Ctrl-D on Unix, Ctrl-Z + Return on Windows).
Using the SML interactive system. Type in an expression (possibly on multiple lines, which after the first line will have a secondary prompt "="), then type ; and press Return. Note that the ; is not part of the expression to be evaluated. It is simply an indication to the compiler that you have finished typing in the expression and that you are ready to evaluate. Before evaluating the expression, the compiler will type check it.
TIP: The interactive system is very picky about some things when reading input, and will accept a request to exit only if it is sitting exactly at the top level prompt--- i.e. if nothing else has been already typed in. When in doubt (especially if input seems to be behaving strangely), press Ctrl-C to interrupt and reset the prompt to a sane state.
TIP: It can be useful to put expressions into a file so that you're
not stuck entering them over and over again at the prompt. Just use an editor
to edit a file, and you can load it into SML with the operation use
"file"; which behaves as though the expressions have been entered
at the top level prompt (don't forget the semicolons!). The big question is where should the
file be stored. Operation use by default looks for files in the
current working directory. The "magic invocation":
will return the current working directory. To change it, use the "magic invocation":
where path is the path where you want to go to. Use the Unix convention (even on Windows): the path separator is "/".
TIP: Operation use is really just a lazy person's way of using the top level prompt. Soon in the course, we will see a much better way of managing programs.
Expressions evaluate to values. Values can be classified according to their type:
|int||0,1,2,~1,~2 (~ is the negative sign)|
Let us start looking at simple expressions, with the following syntax:
e ::= c | unop e | e1 binop e2 | if e1 then e2 else e3 | (e)
where c are constants (the values described above), unop are unary operations, binop are binary operations. We expressed this syntax using Backus-Naur Form (BNF), a common notation for the grammar of computer languages. NOTE TO INSTRUCTOR: The BNF for expressions, and later for declarations, should be put on one side of the board, and kept there as we will be adding expression types and declaration types as the lecture goes.
Unary operators include:
|~||take an int (or a real) and return its negation|
|not||take a bool and return its negation|
|size||take a string and return its size|
Binary operators include:
|+,-,*||take two ints (or two reals) and return their sum, difference, or product|
|div,mod||take two ints and return their integer quotient or remainder|
|>,>=,<,<=||take two ints (or two reals) and return their comparison|
|=||take two values and return whether they are equal|
|^||take two strings and return their concatenation into a new string|
Expressions are transformed into values by the application of evaluation rules. The evaluation rules for simple expressions are:
Evaluation of operators only makes sense if the types of the operands agree. For example, the + operator is defined if both operands are integers or reals, but adding an integer to a boolean is meaningless. So type checking is performed to ensure that expressions are meaningful, and this requires giving a type to every expression. How do we determine the type of an expression?
Every constant has a type (42 has type int, true has type bool, etc). Operators also have types (which we gave informally above); in SML syntax, these types are:
~ : int -> int real -> real not : bool -> bool size : string -> int + : (int * int) -> int (real * real) -> real > : (int * int) -> bool (real * real) -> bool ^ : (string * string) -> string
Every operator specifies the types of values it takes and the type of the value it returns. Notice the * when more than one argument is expected.
Now we can give rules for determining the types of expressions. The principle embodied in these rules is:
The type of an expression is always the type of its result.
If the conditions in these rules are not satisfied by an expression, then it is not possible to give a type to that expression. It does not type check, and the compiler rejects it as a program. This prevents the evaluation of meaningless expressions. It is important to do so, because such programming errors could otherwise cause run-time crashes.
Value can be given names. This is not a form of expression, but rather an indication to the compiler that you are declaring a new name.
d ::= val id = e
For example, val pi = 3.141592 is a declaration. A declaration indicates to the compiler to evaluate expression e, produce a value, and bind that value to the name id. In subsequent expressions, id can be used to refer to that value. We therefore extend our syntax for expressions with:
e ::= ... | id
Evaluating id means to lookup what value it is bound to. The type of id is the type of the value which it is bound to.
TIP: In the SML/NJ interactive system, expressions typed at the top level are treated as definitions.
is treated as
- val it = e;
so that e gets evaluated and bound to the identifier it, meaning you can always refer to the last expression evaluated.
Declarations last until another declaration of the same name is encountered (which shadows---i.e., replaces---the previous declaration). We can also introduce local declarations that last only for the evaluation of an expression.
e ::= ... | let d in e end
To evaluate a let expression, declaration d is evaluated, then e is evaluated, taking the binding from the declaration into account.
QUESTION: what is the type of let d in e end? Answer: The type of e, since the type of an expression is the type of the result.
At this point, we can write large expressions, but we can't easily reuse expressions. Thus, we introduce functions. A function declaration is a new form of declaration:
d ::= ... | fun id (x1:t1, ..., xn:tn):t = e
In this new form of declaration, e is called the body of function id. For example, fun square (x:int):int = x * x is a function definition, and its body is x * x. Functions have types that are similar to the types of operators. For example, the type of square is int -> int.
How do we use functions? We introduce a new expression called function application:
e ::= ... | id (e1,...,em)
How do we type check a function application f (e1,...,em)? As follows:
So square (10) type checks, but square (true) does not.
How is function application f (e1,...,em) evaluated? Evaluate e1,...,em to values v1,...,vm, then evaluate the body of f with x1 bound to v1, ..., xm bound to vm. So square (10+10) --> square (20) --> 20*20 --> 400. We'll see many more details on evaluation in a week. For now, we just want to give you an intuition about the simple cases.
Function bindings can be used within a let expression to obtain a local function (like Pascal, unlike C). For example:
fun fourth (y:int):int = let fun square (x:int):int = x * x in square (square (y)) end
TIP: It is not strictly necessary to annotate functions with types. Most of the time, SML can time infer the type of a function from its definition. For example, given function fun inc (x) = x + 1, the compiler can infer type inc : int -> int. The reasoning it uses, intuitively, is: since 1 is an integer, and + takes two integers, x must be an integer, and thus x+1 must be an integer as well.
PITFALL: As you will quickly notice, it can be extremely hard to debug a program that does not type check. It is very difficult to figure out why the compiler inferred this type for that expression. Solution: use type annotations to catch type errors!
PITFALL: Type inference and overloading don't mix well. The compiler, trying to infer a type for fun add (x,y) = x + y, will return type (int * int) -> int. But you may wish for it to be of type (real * real) -> real. Solution: use type annotations on functions!
ÜBER-TIP: It is very good practice always to annotate functions with types. We will require you to do this in programming assignments.