Recitation 16:
Substitution model and refs

The substitution model that we've seen so far explains how computation works as long as no imperative features of ML are used. This model describes computation as a sequence of rewrite steps in which a program subexpression is replaced by another until no further rewrites are possible. However, imperative features introduce the possibility of state : an executing ML program is accompanied by a current memory state that also changes as computation proceeds.

We don't want to get into the details of how memory heaps work yet, so we will use a simple abstract model of state. A memory M is a collection of memory cells each with its own unique name. We will call these names locations; a location is an abstract version of a memory address at the hardware level. Given a location, we can look up in the memory what value is stored at that location. As the program executes, the contents of some memory locations may change.

One way to visualize the execution is the memory consists of a large (actually, infinite) number of boxes, each of which can contain a single value. At any given point during execution, some boxes are in use and others are empty. Each box has a unique name (its location) and this location can be used to find the single box with that name. Given a memory, we can always find a box  that is unused.

Ref operations

There are three principal operations on references: creation using the ref operator, deferencing using !, and update using :=. Each of these operations has an associated reduction that is used when evaluating it. In order to explain what these operations do, a new kind of expression is needed, representing a location. We will write the syntactic metavariable loc to represent a location. For the purposes of explaining how to evaluate SML, we assume that there is an infinitely large set of locations (called Loc) available for use when evaluating programs, even though the actual memory is infinite. We don't care what the elements of Loc actually are. We can think of them as memory addresses, as integers, or even as strings. All that matters is that we can tell two different elements of Loc apart.

ref

The ref operation creates a new location. It is reduced once its argument is a value, creating a new location.

ref v  -->  loc

The new location loc is one that is unused in the current memory. This evaluation step also has a side effect: the memory cell named  loc is made to contain the value v.

This rule introduces a loc expression into the running program. This is a bit different from all the evaluation rules that we have seen till this point, because a loc expression cannot occur in the original SML program. This isn't a problem; we have to remember that our models of evaluation are useful fictions. As long as the model gives the right answer for what happens when the program runs, we are satisfied. In SML, if a program evaluates to a location, it is printed as a ref expression. However, note that two locations are not equal even if they have the same contents, because they are different locations:

- ref (2+2);
it = ref 4 : int ref
- it = ref 4;
val it = false : bool

!

The dereference (!) operation finds the value stored at a given location:

! loc  -->  v

Of course, the value v that replaces the subexpression !loc is the value found in the memory cell named loc.

:=

The update (:=) operation updates the value stored at a given location:

! loc := v   -->  ()

It evaluates to the unit value, but has the side effect of updating the memory location named  loc to contain v.

Example

Consider the following SML example:

let val x = ref 0
    val y = x
in
    x := 1;
    !y
end
 What does this evaluate to? We can use the model about to figure it out:
let val x = ref 0
    val y = x
in x := 1; !y end
Memory: (empty)
-->
let val x = loc1      (loc1 is some new location)
    val y = x
in x := 1; !y end
Memory: (loc1 = 0)
-->   (substitute loc1 for x)
let val y = loc1      
in loc1 := 1; !y end
Memory: (loc1 = 0)
-->   (substitute loc1 for y)
loc1 := 1; !loc1
Memory: (loc1 = 0)
--> !loc1
Memory: (loc1 = 1)
--> 1
Memory: (loc1 = 1)

So we see that the update to x is visible when we dereference y. This happens because x and y are aliases : two different names for the same location (loc1).