CS312 Lecture 6: Substitution Model - Overview

We will investigate the SML programming language, and programming languages in general, more deeply. We have talked informally about what the various constructs of SML mean and how they are evaluated. We can do better and provide a formal, precise way of explaining the meaning of SML programs, so that there is never any doubt about what a program means. This is known as defining a semantics for the programming language. The word "semantics" means "meaning". We will define the meaning of SML programs.

The semantics we define will be an operational semantics: a description of how a program is evaluated. As a first step, we will look at the substitution model of evaluation, in which we interpret SML programs as mathematical expressions. Thus, the substitution model has essentially the same evaluation rules that you learned for ordinary arithmetic, probably when you were in grade school! While this model has its limitations, it's a good starting point.

We will start off by looking at a subset of SML, consisting of

and expressions built out of these. Later we will add let.

Substitution model for ML expressions

When does the program stop? In arithmetic, it's when we reach a number, because there are no further steps to take. In general, we have some set of expressions in the programming language that can't be evaluated any further; we call these expressions values. Values are things that you can type at the SML prompt and get the same thing right back. For example, in SML, the following are values:

1
true
"hello"
(true, "5", 1)
fn(x:int) => x
5::4::nil         (* = [5,4] *)

The following expressions are not values, because an evaluation step can be performed on them:

1+2
true orelse false
(true, "5", 0+1)
(fn(x:int) => x) (3)
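
As a minimal sketch, the non-values above can be bound to names at the SML prompt and compared with the values they step to. The names a, b, c and d are assumptions introduced here for illustration only; the comments trace each evaluation step by hand.

```sml
(* Hypothetical names a..d. Each right-hand side is an expression,
   not a value, so at least one evaluation step is taken. *)
val a = 1 + 2                   (* 1+2 steps to 3 *)
val b = true orelse false       (* true orelse false steps to true *)
val c = (true, "5", 0 + 1)      (* (true, "5", 0+1) steps to (true, "5", 1) *)
val d = (fn (x:int) => x) 3     (* substitute 3 for x in the body x, giving 3 *)
```

In each case the result reported by the prompt is a value: typing it back in returns the same thing, with no further step to take.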

Rewrite rules:

ML Interpreter in ML

We can write a BNF grammar for values v, just as we did earlier for expressions:

c ::= integer_const
    | bool_const
    | string_const
    | real_const
    | char_const
        
v ::= c                     (* constants *)
    | (v1,...,vn)           (* tuples of values *)
    | (fn (id:t):t' => e)   (* anonymous functions *)
    | {id1=v1, ..., idn=vn} (* records of values *)
    | Id  | Id(v)           (* data constructors *)

Anything described by this grammar is a value and thus a legal result of an SML program. In other words, any tuple whose elements are values is itself a value; any record whose fields are bound to values is a value; any data constructor applied to a value is a value; and any anonymous function is a value, even if its body is an arbitrary expression e. That is, the body of a function is not evaluated at all until the function is applied to an argument.
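
The grammar for values can be sketched as a predicate over a small abstract syntax. This is an illustration under stated assumptions, not the interpreter from the lecture: the datatype expr, its constructor names, and the function isValue are all hypothetical, types on function parameters are omitted, and only integer, boolean and string constants, tuples, functions, addition and application are covered.

```sml
(* Hypothetical abstract syntax for a fragment of the language.
   Constructor names are illustrative assumptions. *)
datatype expr =
    IntConst of int
  | BoolConst of bool
  | StrConst of string
  | Tuple of expr list             (* (e1,...,en) *)
  | Fn of string * expr            (* fn id => e, type annotations omitted *)
  | Var of string
  | Plus of expr * expr            (* e1 + e2 *)
  | App of expr * expr             (* e1 e2 *)

(* isValue e is true exactly when e matches the value grammar
   restricted to this fragment. *)
fun isValue (IntConst _) = true
  | isValue (BoolConst _) = true
  | isValue (StrConst _) = true
  | isValue (Tuple es) = List.all isValue es   (* every component must be a value *)
  | isValue (Fn _) = true                      (* the body is never inspected *)
  | isValue _ = false                          (* Var, Plus, App can still step *)
```

For example, isValue (Fn ("x", Plus (Var "x", IntConst 1))) is true even though the body contains an addition, while isValue (Tuple [BoolConst true, StrConst "5", Plus (IntConst 0, IntConst 1)]) is false because the last component can still take a step.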

How do we know that a program will always reach a value? Actually, we don't. A program might go into an infinite loop. But no matter how long the program executes, as long as it hasn't reached a value there will always be a reduction to perform; evaluation never gets stuck. For example, we'll never have to apply a reduction to #i(v1,...,vn) where i > n, an expression that is not a value and yet has no rule that reduces it. The SML type checker ensures that this and other bad things will never happen. This is what it means to say that SML is type-safe.
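
A small sketch of the tuple-selection case, using the hypothetical names pair, first and third. The ill-typed line is left commented out because a conforming SML compiler rejects it before the program runs, so the stuck expression never arises during evaluation.

```sml
(* Hypothetical names; pair has type int * int. *)
val pair = (1, 2)
val first = #1 pair          (* well-typed: #1 (1, 2) steps to 1 *)

(* val third = #3 pair *)    (* rejected by the type checker:
                                int * int has no third component *)
```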