CS312 Lecture 6: Substitution Model - Overview

We will investigate the SML programming language, and programming languages in general, more deeply. We have talked informally about what the various constructs of SML mean and how they are evaluated. We can do better and provide a formal, precise way of explaining the meaning of SML programs, so that there is never any doubt about what a program means. This is known as defining a semantics for the programming language. The word "semantics" means "meaning". We will define the meaning of SML programs.

The semantics we define will be an operational semantics: a description of how a program is evaluated. As a first step, we will look at the substitution model of evaluation, in which we interpret SML programs as mathematical expressions. Thus, the substitution model has essentially the same evaluation rules that you learned for ordinary arithmetic, probably when you were in grade school! While this model has its limitations, it's a good starting point.

We will start off by looking at a subset of SML, consisting of

constants
arithmetic expressions +,*,=
if
fn

and expressions built out of these. Later we will add let.

Substitution model for ML expressions

Basic idea: apply a series of rewrite rules to eventually reduce an expression to a value
- Arithmetic examples are easy (like grade school)
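As a small illustration of rewriting, the comments below trace one possible reduction sequence for an arithmetic expression. The name result is chosen only for this sketch.

```sml
(* One possible sequence of rewrite steps:
     (1 + 2) * (3 + 4)
 --> 3 * (3 + 4)
 --> 3 * 7
 --> 21                  a value, so no further step applies *)
val result = (1 + 2) * (3 + 4)
```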

Expressions versus Values

When does the program stop? In arithmetic, it's when we reach a number, because there are no further steps to take. In general, we have some set of expressions in the programming language that can't be evaluated any further; we call these expressions values. Values are things that you can type at the SML prompt and get the same thing right back. For example, in SML, the following are values:

1
true
"hello"
(true, "5", 1)
fn(x:int) => x
5::4::nil         (=[5,4])

The following expressions are not values, because an evaluation step can be performed on them:

1+2
true orelse false
(true, "5", 0+1)
(fn(x:int) => x) (3)
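To make the distinction concrete, the sketch below binds each of the non-value expressions above to a name, so that SML evaluates it to a value. The names a, b, c and d are illustrative only.

```sml
val a = 1 + 2                  (* reduces to the value 3 *)
val b = true orelse false      (* reduces to the value true *)
val c = (true, "5", 0+1)       (* reduces to the value (true, "5", 1) *)
val d = (fn(x:int) => x) (3)   (* reduces to the value 3 *)
```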

Rewrite rules:

Self-evaluating expressions
Primitive applications (*,+,=)
- Evaluation rules
Conditionals (if)
Procedures (fn)
Combinations (substitution)
- Note that the eval rule requires that all args be evaluated (order not specified)
- Prelim #1 style example: fun if_not (test: bool, thenval: int, elseval: int) = if (not test) then thenval else elseval
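The following sketch shows why it matters that every argument is evaluated before the function body is entered. The function is written as if_not because a hyphen is not legal in an SML identifier. The exception handler is outside the subset of SML covered here and is used only to make the behavior observable; r1 and r2 are illustrative names.

```sml
fun if_not (test: bool, thenval: int, elseval: int) : int =
  if (not test) then thenval else elseval

(* All three arguments are already values, so substitution proceeds directly. *)
val r1 = if_not (false, 1, 2)

(* The second argument is never needed by the body, yet it is still
   evaluated before the call, so the division raises Div. *)
val r2 = if_not (true, 1 div 0, 2) handle Div => ~1
```

Here r1 is 1, while r2 is ~1 because the call never reaches the body of if_not.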
Let
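- Variable occurrences in an expression are free (used with no enclosing definition), binding (the defining occurrence), or bound (used within the scope of a definition). Example: in x + (let val x = 1 in x+3 end), the first x is free, the second is binding, and the third is bound.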
- Each binding occurrence has a scope.
- Note that we actually have to be careful when we substitute!
  - When we substitute for some variable x, we don't replace the binding or bound occurrences of x, because that variable is really a different variable despite having the same name.
  - This can be implemented via alpha-renaming.
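A brief sketch of the scoping point, assuming the free x stands for the value 10 (an arbitrary choice for illustration). The names original and renamed are hypothetical.

```sml
val x = 10   (* plays the role of the value substituted for the free x *)

(* Only the first x is replaced by 10. The x inside the let is a
   different variable that happens to share the name:
     10 + (let val x = 1 in x + 3 end)
 --> 10 + (1 + 3)
 --> 10 + 4
 --> 14 *)
val original = x + (let val x = 1 in x + 3 end)

(* Alpha-renaming the let-bound variable to x' makes the distinction
   explicit and does not change the result. *)
val renamed = x + (let val x' = 1 in x' + 3 end)
```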
Top-level big let
- Substitution at point of definition.
- Explains why you can't change a variable's value:
  - A new declaration just creates a new variable with the same name.
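The effect of substitution at the point of definition can be seen in a short sketch. The names addx and r are hypothetical.

```sml
val x = 1

(* At this point x has already been replaced by 1 in everything that
   follows, so the body of addx is effectively 1 + y. *)
fun addx (y: int) : int = x + y

(* This does not modify the earlier x; it introduces a new variable
   with the same name, visible only to later declarations. *)
val x = 100

val r = addx 5
```

Here r is 6, not 105, because addx refers to the first x.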
Evaluation order
- We can't perform reductions just anywhere. Each SML expression imposes some order on the evaluation of its subexpressions. For example, no reductions can be performed on the body of a let expression until all of its declarations have been evaluated and the results substituted into the body. Similarly, no evaluations are performed inside an fn, though substitutions are. Example: fun lose(y: int):real = 1.0 / 0.0. Defining lose performs no division; the body is evaluated only when lose is applied to an argument.
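A sketch of the rule that function bodies are not evaluated until application. It uses a hypothetical integer variant, loseInt, whose body raises Div when evaluated, so that the moment of evaluation is observable. The handler is outside the subset of SML discussed here.

```sml
(* Defining the function does not evaluate 1 div 0. *)
fun loseInt (y: int) : int = 1 div 0

(* Evaluation reaches this declaration, so the definition above was harmless. *)
val reached = true

(* Only applying the function evaluates its body, which raises Div. *)
val outcome = (loseInt 3; "returned") handle Div => "raised Div"
```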

ML Interpreter in ML

We can write a BNF grammar for values v, just as we did earlier for expressions:

c ::= integer_const
    | bool_const
    | string_const
    | real_const
    | char_const
        
v ::= c                     (* constants *)
    | (v1,...,vn)           (* tuples of values *)
    | (fn (id:t):t' => e)   (* anonymous functions *)
    | {id1=v1, ..., idn=vn} (* records of values *)
    | Id  | Id(v)           (* data constructors *)

Anything described by this grammar is a value and thus a legal result of an SML program. In other words, any tuple whose elements are values is itself a value; any record whose fields are bound to values is a value; any data constructor applied to a value is a value; and any anonymous function is a value, even if its body is an arbitrary expression e. That is, the body of a function is not evaluated at all until the function is applied to an argument.
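One way to make the grammar concrete is a datatype for a small fragment of expressions together with a predicate that recognizes values. This is a minimal sketch: the datatype exp, its constructor names, and isValue are assumptions made for illustration, and records, data constructors, and type annotations are omitted.

```sml
datatype exp =
    IntConst of int
  | BoolConst of bool
  | Var of string
  | Plus of exp * exp
  | Tuple of exp list
  | Fn of string * exp          (* fn id => e, with types omitted *)
  | App of exp * exp

(* Mirrors the grammar for v: constants, tuples of values, and
   anonymous functions are values; everything else is not. *)
fun isValue (IntConst _) = true
  | isValue (BoolConst _) = true
  | isValue (Tuple es) = List.all isValue es
  | isValue (Fn _) = true       (* the body is not inspected *)
  | isValue _ = false
```

For instance, isValue (Fn ("x", Plus (Var "x", IntConst 1))) is true even though the body is not a value, while isValue (Tuple [BoolConst true, Plus (IntConst 0, IntConst 1)]) is false.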

How do we know that a program will always reach a value? Actually, we don't. A program might go into an infinite loop. But no matter how long the program executes, as long as it hasn't reached a value there will always be a reduction to perform. For example, we'll never have to apply a reduction to #i(v1,...,vn) where i > n. The SML type checker ensures that this and other bad things will never happen. This is what it means to say that SML is type-safe.
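Pulling the rules of this lecture together, here is a minimal sketch of a substitution-based evaluator for a tiny fragment of the language. All names (exp, subst, eval, Stuck) are hypothetical. It assumes programs are closed, so the substituted value contains no free variables and variable capture cannot occur. The Stuck exception marks states with no applicable rule, which a type checker for this fragment would rule out.

```sml
datatype exp =
    IntConst of int
  | BoolConst of bool
  | Var of string
  | Plus of exp * exp
  | If of exp * exp * exp
  | Fn of string * exp          (* fn id => e, with types omitted *)
  | App of exp * exp

(* subst (v, x, e): replace the free occurrences of x in e by v.
   Occurrences under an fn that rebinds x are left alone. *)
fun subst (v, x, e) =
  case e of
      IntConst _ => e
    | BoolConst _ => e
    | Var y => if x = y then v else e
    | Plus (a, b) => Plus (subst (v, x, a), subst (v, x, b))
    | If (a, b, c) => If (subst (v, x, a), subst (v, x, b), subst (v, x, c))
    | Fn (y, body) => if x = y then e else Fn (y, subst (v, x, body))
    | App (f, a) => App (subst (v, x, f), subst (v, x, a))

exception Stuck

(* eval e: reduce e until a value is reached. *)
fun eval e =
  case e of
      IntConst _ => e                       (* self-evaluating *)
    | BoolConst _ => e
    | Fn _ => e                             (* body is not evaluated *)
    | Var _ => raise Stuck                  (* free variable *)
    | Plus (a, b) =>
        (case (eval a, eval b) of
            (IntConst m, IntConst n) => IntConst (m + n)
          | _ => raise Stuck)
    | If (c, t, f) =>
        (case eval c of
            BoolConst true => eval t        (* only one branch is evaluated *)
          | BoolConst false => eval f
          | _ => raise Stuck)
    | App (f, a) =>
        (case eval f of
            Fn (x, body) => eval (subst (eval a, x, body))
          | _ => raise Stuck)
```

For example, eval (App (Fn ("x", Plus (Var "x", IntConst 1)), IntConst 3)) yields IntConst 4.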