Before we introduced mutable data, the substitution model was a good way of reasoning about the values produced by SML programs. The substitution model is based on substituting equivalent expressions for one another, much like simplifying algebraic expressions. With mutable data, however, we can no longer rely on a given expression always having the same value. For instance, f(3) might return a different value each time it is called, if f keeps local state, adds its argument to an internal state variable, and returns the result. We therefore introduce the environment model to represent the evaluation of expressions that involve mutable data.
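Here is a minimal SML sketch of the kind of function just described. It assumes a hypothetical helper named makeAccumulator. The returned function keeps a running total in a private ref cell, so the same expression f 3 evaluates to different values on successive calls.

```sml
(* makeAccumulator is a hypothetical helper for illustration.
   The function it returns reads and updates the private ref cell `total`. *)
fun makeAccumulator () =
  let
    val total = ref 0
  in
    fn (n : int) => (total := !total + n; !total)
  end

val f = makeAccumulator ()
val first = f 3     (* total becomes 3, so this is 3 *)
val second = f 3    (* total becomes 6: same expression, different value *)
```

Substituting the value of the first f 3 for the second would give the wrong answer. This is exactly why the substitution model breaks down once refs are involved.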
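There are three key constructs in the environment model, all of which have to do with determining the values of variable names (identifiers):
- A binding associates an identifier with a value.
- A frame is a collection of bindings, together with a pointer to a parent frame; every frame except TOP has a parent.
- An environment is a sequence of frames, starting with some frame and following parent pointers up to TOP.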
In modeling the execution of an SML program, the evaluation of each expression is done with respect to some particular environment, which governs the bindings of the identifiers in that expression. The binding of an identifier in a given environment is determined by finding the first frame in the environment (sequence of frames) that specifies a binding for that identifier. That is, starting with a frame, if it has no binding for the name then the parent frame is checked, and so on up to TOP. As soon as a binding is found its value is the value of the identifier. If no frame contains a binding then the variable is unbound.
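One way to make the lookup rule concrete is a toy model in SML. It is an illustration only, not how an SML implementation actually stores environments. The names frame, lookup, Unbound, top, outer and inner are hypothetical, and values are restricted to int. TOP is modeled as the frame with no parent, holding a single placeholder binding y.

```sml
(* Toy model of frames (hypothetical names; values restricted to int). *)
datatype frame = Frame of { bindings : (string * int) list,
                            parent   : frame option }   (* NONE only for TOP *)

exception Unbound of string

(* Search this frame first; if the name is absent, try the parent, up to TOP. *)
fun lookup (Frame {bindings, parent}) name =
  case List.find (fn (n, _) => n = name) bindings of
      SOME (_, v) => v
    | NONE =>
        (case parent of
             SOME p => lookup p name
           | NONE => raise Unbound name)

val top   = Frame {bindings = [("y", 1)], parent = NONE}
val outer = Frame {bindings = [("x", 3)], parent = SOME top}
val inner = Frame {bindings = [("x", 5)], parent = SOME outer}
```

The three lookups behave as follows:
- lookup inner "x" yields 5, because the first frame that binds x wins.
- lookup inner "y" reaches TOP and yields 1.
- lookup inner "z" raises Unbound "z".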
There is always exactly one current (or active) environment: the one in which the expression currently being evaluated is evaluated. In general there are many environments, of which the current environment is just one. These environments form a tree, with the frame TOP at its root. TOP contains the bindings for all the built-in names that are accessible in the top-level read-eval-print loop (such as foldl, +, and so on).
A new frame is created whenever a function is applied (called) and whenever a let expression is evaluated. We will first consider let expressions. We limit ourselves to let expressions that bind a single identifier. A let expression binding more than one identifier is expanded into a nested set of let expressions, one per identifier, as discussed in recitation.
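To evaluate the expression let val x = e1 in e2 end:
1. Evaluate e1 in the current environment, producing a value v1.
2. Create a new frame that binds x to v1 and whose parent is the current environment.
3. Evaluate e2 with the environment that begins at this new frame as the current environment.
4. The value of e2 is the value of the whole let expression; the current environment then reverts to the parent of the new frame.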
Consider the following simple example:
let val x = (4, ref 3) in #1 x end
In environment diagrams we denote the TOP-level environment by a double box. Since this expression is evaluated at top level, the new frame created in step 2 has TOP as its parent. That frame binds x to the pair consisting of 4 and a reference to a cell containing 3. When e2 (here #1 x) is evaluated, the current environment is the one that begins at this new frame: the frame binding x to (4, ref 3), whose parent is TOP.
Thus the value of #1 x in this environment is 4. Once that expression has been evaluated, the current environment reverts to the parent of the new frame (TOP in this case).
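The let rule can be sketched as a tiny evaluator in SML. It works over a hypothetical mini-language: the datatype exp and the functions lookup and eval are invented for illustration. An environment is represented as a list of frames, innermost first, and the empty list stands in for TOP.

```sml
(* Hypothetical mini-language: integers, variables, addition, single-binding let. *)
datatype exp = Int of int
             | Var of string
             | Add of exp * exp
             | Let of string * exp * exp     (* let val x = e1 in e2 end *)

type frame = (string * int) list             (* the bindings of one frame *)
type env = frame list                        (* innermost frame first; [] plays TOP *)

exception Unbound of string

fun lookup [] x = raise Unbound x
  | lookup (fr :: parents) x =
      (case List.find (fn (n, _) => n = x) fr of
           SOME (_, v) => v
         | NONE => lookup parents x)

fun eval (env : env) (e : exp) : int =
  case e of
      Int n => n
    | Var x => lookup env x
    | Add (a, b) => eval env a + eval env b
    | Let (x, e1, e2) =>
        let
          val v1 = eval env e1              (* evaluate e1 in the current env *)
          val newEnv = [(x, v1)] :: env     (* new frame whose parent is the current env *)
        in
          eval newEnv e2                    (* evaluate e2 in the extended env *)
        end
```

Only the recursive call for e2 receives the extended environment, so the caller's env is untouched when that call returns. This models the current environment reverting to the parent frame.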
Applying (calling) a function also creates a new environment, again by adding a frame to an existing environment. However, the two constructs extend different environments:
- A let expression extends the environment in which the let expression is evaluated (the current environment).
- Function application extends the environment in which the function was defined, not the current environment where the call takes place.

We therefore need to represent a function value in a way that keeps track of the environment where the function was defined. Such a representation is commonly called a closure. It consists of the function text (the parameter and the body) together with a pointer to the environment that was current when the function was defined.
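A small SML example of closures, using a hypothetical function makeAdder. Each call to makeAdder creates a frame binding n, and the closure it returns pairs the code fn y => n + y with a pointer to that frame.

```sml
(* makeAdder is a hypothetical example function. *)
fun makeAdder (n : int) = fn (y : int) => n + y

val add2  = makeAdder 2    (* closure whose environment binds n to 2 *)
val add10 = makeAdder 10   (* same code, different closure: its environment binds n to 10 *)

(* A later binding of n in the calling environment does not affect either closure. *)
val n = 1000
val results = (add2 1, add10 1)   (* (3, 11) *)
```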
Consider the following simple example:
let val x = 3 val f = fn y: int => x in f end
As noted above, this is equivalent to two nested let expressions:
let val x = 3 in let val f = fn y: int => x in f end end
We know from above that each of these let expressions creates a frame that extends the current environment. Evaluating the fn expression for f creates a closure whose environment pointer refers to the environment where the function was defined. That is the environment in which e1 of the inner let was evaluated: TOP extended by a single frame that binds x to 3. The resulting structure has two frames:
- Frame F1 binds x to 3 and has parent TOP.
- Frame F2 binds f to the closure (parameter y, body x, environment F1) and has parent F1.
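A quick check in SML binds the result of the example to a hypothetical top-level name g. The closure keeps the frame binding x to 3 reachable after the let expression has finished, so g returns 3 whatever its argument.

```sml
(* The value of the nested let is the closure f; g names it at top level. *)
val g = let val x = 3 in let val f = fn y : int => x in f end end

val a = g 0     (* 3 *)
val b = g 42    (* 3: y is ignored, and x is found in the captured frame *)
```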
It is the application of a function, not its definition, that creates a new environment. The rule for function application is:
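To evaluate a function application e1(e2):
1. Evaluate e1 in the current environment; its value must be a closure.
2. Evaluate e2 in the current environment, producing the argument value v2.
3. Create a new frame that binds the function's parameter to v2.
4. Make the parent of this new frame the environment pointed to by the closure (the environment where the function was defined).
5. Make the environment that begins at the new frame the current environment.
6. Evaluate the body of the function in this environment.
7. The value of the body is the value of the application; the current environment then reverts to the one that was in effect before the call.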
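The application rule can be added to a toy evaluator as a sketch. The mini-language and all names in it (exp, value, Closure, eval, prog) are hypothetical. Fn merely packages its parameter and body with the current environment. App builds the new frame on top of the closure's environment rather than the caller's.

```sml
(* Hypothetical mini-language with first-class functions. *)
datatype exp = Int of int
             | Var of string
             | Add of exp * exp
             | Let of string * exp * exp     (* let val x = e1 in e2 end *)
             | Fn  of string * exp           (* fn x => body *)
             | App of exp * exp              (* e1 e2 *)

(* A closure pairs the function text with the environment current at definition. *)
datatype value = IntV of int
               | Closure of string * exp * env
withtype env = (string * value) list list   (* innermost frame first; [] plays TOP *)

exception Unbound of string
exception TypeError

fun lookup [] x = raise Unbound x
  | lookup (fr :: parents) x =
      (case List.find (fn (n, _) => n = x) fr of
           SOME (_, v) => v
         | NONE => lookup parents x)

fun eval (env : env) (e : exp) : value =
  case e of
      Int n => IntV n
    | Var x => lookup env x
    | Add (a, b) =>
        (case (eval env a, eval env b) of
             (IntV m, IntV n) => IntV (m + n)
           | _ => raise TypeError)
    | Let (x, e1, e2) => eval ([(x, eval env e1)] :: env) e2
    | Fn (x, body) => Closure (x, body, env)   (* definition: no new frame *)
    | App (e1, e2) =>
        (case eval env e1 of                    (* e1 must evaluate to a closure *)
             Closure (param, body, defEnv) =>
               let
                 val arg = eval env e2           (* argument evaluated in the caller's env *)
                 (* new frame for the parameter; its parent is the closure's env *)
                 val callEnv = [(param, arg)] :: defEnv
               in
                 eval callEnv body               (* body evaluated in the new env *)
               end
           | IntV _ => raise TypeError)

(* let val x = 3 val f = fn y => x val x = 5 in f x end, as nested lets *)
val prog =
  Let ("x", Int 3,
    Let ("f", Fn ("y", Var "x"),
      Let ("x", Int 5,
        App (Var "f", Var "x"))))

val result = eval [] prog   (* IntV 3 *)
```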
Let's turn to a simple function application example. It slightly extends the previous let example by binding one additional variable, a second x, and then applies the function f to x. There are now two variables named x. The environment-model lookup rule causes the later declaration of x to shadow, or hide, the earlier one in the body of the let, while the body of f still sees the x bound to 3.
let val x = 3 val f = fn y: int => x val x = 5 in f x end
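At the point where the function body is evaluated (step 6), the environment is a chain of three frames:
- a new frame binding y to 5;
- its parent, the frame binding x to 3 (the environment stored in f's closure);
- and TOP.

The frames binding f and the second x are not part of this environment. Looking up x in the body of f therefore finds 3, and f x evaluates to 3.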
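Running the example in SML shows both lookups side by side. Here the let body is changed to return a pair, and fromF and fromBody are names chosen for illustration.

```sml
val (fromF, fromBody) =
  let
    val x = 3
    val f = fn y : int => x    (* closure points to the frame binding x to 3 *)
    val x = 5                  (* new frame; shadows the first x in the let body *)
  in
    (f x, x)                   (* f is applied to 5, but its body finds x = 3 *)
  end
```

fromF is 3 and fromBody is 5.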
Recursive functions are handled a bit differently. In the example above, the identifier f is not bound in the environment where the body of the function is evaluated; only y and the binding of x to 3 are part of that environment. Thus if the body of f tried to make a recursive call to f, looking up the identifier f would fail. Recursive functions are therefore defined using fun (or val rec). In this case the closure points to the frame where the new identifier is being bound, rather than to the parent of that frame.
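A small SML illustration of the difference follows. When an outer f happens to be in scope, the non-recursive val version does not fail. Its body silently calls the old f, because that is the binding found in the closure's environment. With val rec (or fun), the closure's environment contains the new binding itself.

```sml
(* Without rec, the f inside the body is looked up in the environment where the
   fn was evaluated, which contains only the earlier binding of f. *)
val f = fn (n : int) => 0
val f = fn (n : int) => if n = 0 then 1 else n * f (n - 1)   (* inner f is the old f *)

(* With val rec, the closure points to the frame that binds fact itself. *)
val rec fact = fn (n : int) => if n = 0 then 1 else n * fact (n - 1)
```

Here f 3 is 3 * 0 = 0, while fact 3 is 6.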
Consider the following definition of the recursive function fact:
let fun fact(n:int) = if n = 0 then 1 else n * fact(n-1) in fact 3 end
This creates a frame that binds the name fact to a closure, where the closure points to that same frame. For the anonymous functions considered above, the closure instead points to the parent frame where the function is defined. Then the body of the let expression is evaluated.

Each recursive call to fact creates a new frame binding n. Every one of these frames has as its parent the frame where the identifier fact is bound, not the frame of the calling invocation, because a frame's parent is the environment where the function was defined.

Control flow is a separate matter: each call must still return its value to the caller that is waiting to multiply it by n. One way to keep track of this is to draw dotted arrows showing which environment becomes current again when a function application returns a value.
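The control flow that the dotted arrows record can be made visible with a version of fact instrumented with print calls (the messages are illustrative). The let inside the body creates extra frames of its own. Those frames do not change the fact that each call's frame points to the frame binding fact.

```sml
(* fact instrumented to show when each call starts and when it returns. *)
fun fact (n : int) : int =
  let
    val () = print ("enter fact, n = " ^ Int.toString n ^ "\n")
    val result = if n = 0 then 1 else n * fact (n - 1)
    val () = print ("return " ^ Int.toString result ^ " for n = " ^ Int.toString n ^ "\n")
  in
    result
  end

val answer = fact 3
(* Printed output:
   enter fact, n = 3
   enter fact, n = 2
   enter fact, n = 1
   enter fact, n = 0
   return 1 for n = 0
   return 1 for n = 1
   return 2 for n = 2
   return 6 for n = 3 *)
```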
Contrast the above use of true recursion with the following recursive function, which uses a ref to reach the appropriate closure (function object):
let
  val x = ref (fn x: int => 1)
  val fact = fn (n:int) => if n = 0 then 1 else n * (!x)(n-1)
  val () = x := fact
in
  fact 3
end
The third val declaration in the let is used only for the side effect of the := operator, not for its value (which is just ()). The two closures (function objects) created here come from anonymous functions. As in the first function example above, each closure therefore points to the parent frame rather than to the frame where the name is bound. In particular, fact finds itself only indirectly, by dereferencing x each time it is called. Work through this example in the environment model to check that you understand how it computes the same result as the standard definition of factorial above.
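The same declarations can be moved to top level so that x can be reassigned afterwards (beforeUpdate and afterUpdate are illustrative names). Because fact dereferences x each time it is called, changing the contents of x changes what fact 3 evaluates to. This is exactly the kind of behavior the substitution model cannot account for.

```sml
val x = ref (fn (_ : int) => 1)
val fact = fn (n : int) => if n = 0 then 1 else n * (!x) (n - 1)
val () = x := fact                 (* tie the knot: !x is now fact itself *)

val beforeUpdate = fact 3          (* 6 *)

(* Reassigning x changes the behavior of the existing closure fact. *)
val () = x := (fn (_ : int) => 100)
val afterUpdate = fact 3           (* 3 * 100 = 300 *)
```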