Environment Model Diagrams

This handout alone will not teach you how to draw environment model diagrams. However, this should be a helpful reference after you've seen the examples we do in lecture and section.


Terminology

identifiers, expressions, and values
Just like in the substitution model, an "identifier" is a string representing a variable, an "expression" is a valid SML program snippet, and a "value" is the result of evaluating an expression.

We use notation like
  id, id1, id2, id3, ...
  e, e1, e2, e3, ...
  v, v1, v2, v3, ...
to represent identifiers, expressions, and values.


frames, environments, and extending
A "frame" is a set of variables and their values (that is, it is like a record; each value has a name). The term "activation record" is also often used. A frame is drawn as a box with bindings like max:42 and x:false inside. Every frame has a "parent pointer" which is drawn as an arrow from the frame's top pointing to some other environment. An "environment" is linked list of frames ending with the global environment. (We sometimes blur the distinction between frames and environments, because a frame is always the start of an environment.) When we say something like "extend an environment" we mean that we are creating a new frame which has that environment as its parent.


global environment or null environment
An environment is a linked list; every one terminates with the "null environment" (equivalent to nil for SML linked lists). It is also called the "global environment" because it implicitly contains bindings for everything visible globally (for example, all of the builtin and library functions).

The global environment is drawn either as the word "nil" or as two boxes of almost the same size (one inside the other).


boxed and unboxed
Values of a type are "boxed" if they are stored as a pointer to data stored elsewhere. For example, tuples (and therefore lists, if you recall the datatype definition) and records, refs and arrays, and functions are boxed values. Values of other types are "unboxed." For example, these include booleans, integers, and reals.

When drawing frames (see above), unboxed values go directly in the frame (x: true). Boxed values are drawn as an arrow starting in the frame and pointing to wherever (in the diagram) the value is actually stored.

Hint: when we talk about a "value" below, we really mean either an unboxed value or an arrow pointing to a boxed value. (That is, we often will draw a new arrow pointing to an existing boxed value. We rarely duplicate the entire boxed value. The evaluation rules below say explicitly when a new boxed value should be drawn, and we never do so otherwise.)


closures and environment pointers
A "closure" is essentially a function which hasn't been applied to its arguments yet. It has three parts: arguments, body, and "environment pointer." The first two are what you would expect when looking at a function, and the enviroment pointer is an arrow pointing to some frame. (We will discuss this more below; for now you don't need to know where the environment pointer comes from.)

There are a few ways to draw closures; any variation is fine as long as you are clear about the three parts listed above. One way is as a long horizontal box with a vertical bar splitting it in two pieces; some code like fn x => e goes in the left piece, and the arrow representing the environment pointer starts in the right piece. Another way is as two circles (like the number "8"), with the arguments listed beside the upper circle, the body beside the lower circle, and the environment pointer starting inside the lower circle.

Starting a new diagram

Every environment model diagram starts out with the gobal environment.

In addition, there is a "current environment" (which initially points to the global environment) and an "environment stack" which we use to keep track of environments, which is initially empty. (Often we use terminology like "save the current environment" to mean "push" and "return to the previous environment" to mean "pop".) The current environment is usually indicated using your finger, and the environment stack is usually not shown, but feel free to actually use paper if you have trouble keeping track of the environments.


Evaluating an expression in an environment

This is the core of the environment model. Initially we are evaluating some (large) expression in the global environment, but recursively we will end up evaluating some of the subparts in different environments.

Simple expressions

Simple expressions (those with no rule below) are evaluated using substitution model rules (which generally just means returning the result). Uses of builtin and library functions generally also fall into this category.

Tuple and record expressions

To evaluate a tuple expression (e1, e2, ..., en), first evaluate e1 in the current environment to get a value (call it v1), then evaluate e2 to get v2, etc. Then draw a horizontal box divided into n pieces; put each v in the appropriate spot (if v is an unboxed value, draw it inside the box; if it is a boxed value draw an arrow starting in the box). Finally, return an arrow pointing to the tuple. (Hint: evaluating an expression like the above is the only thing that causes a new tuple box to be drawn.)

Record expressions are evaluated just like tuple expressions, except we have to label the fields. (We hardly ever use records in environment model examples, so you don't need to memorize the exact way we draw them. Just know that they're like tuples with labels on the fields.)

List expressions

Recall that SML's builtin lists are just tuples:

  datatype 'a list = nil | :: of ('a * 'a list)
Therefore we draw lists as 2-tuples (called "cons cells") with the first item being some value (using the rules for boxed/unboxed values from the tuple section above) and the second item being either nil (sometimes drawn as an X in the box) or an arrow pointing to the rest of the list. (Hint: new cons cells only appear when using the :: operator, the [e1, e2, ...] special syntax, or library functions like map.)

Variables

To find the value of a variable, look it up in the current environment. That means look for a binding of that identifier to a value in the current frame. If none is found, look in its parent frame, and then its grandparent, etc. If you make it all the way to the global environment and there's still no binding, then an error should occur. (Of course, the SML typechecker prevents this sort of error from occurring at runtime, but when drawing environment model diagrams we often will just check for type errors as we go.)

Let expressions with one declaration (val declarations only)

For the time being, we will only consider val declarations in our let expressions. To evaluate an expression like:

  let
    val id1 = e1
  in
    e
  end
  1. Evaluate e1 in the current environment to get a value, call it v1.
  2. Extend the current environment with a new frame binding id1 to v1. (Hint: do not draw the frame until after you have done the evaluation in step one. This will prevent many many errors.)
  3. Evaluate e in the new environment to get a value, call it v.
  4. Restore the original environment.
  5. Return v.

Let expressions with multiple declaration

To evaluate a let expression with multiple declarations, intuitively, consider sequential "nested lets" (where the outermost let has the first declarations, and all of the others occur within the body of the outermost let). This can be seen by example:

  let
    val id1 = e1
    val id2 = e2
    val id3 = e3
  in
    e
  end
Treat it just like:
  let
    val id1 = e1
  in
    let
      val id2 = e2
    in
      let
        val id3 = e3
      in
        e
      end
    end
  end
In other words, to evaluate an expression like:
  let
    val id1 = e1
    ...
    val idn = en
  in
    e
  end
  1. Start with the first declaration.
  2. Evaluate the right-hand side of the current declaration in the current environment to get a value.
  3. Extend the current environment with a new frame binding the identifier of the current declaration to the value from step 2.
  4. If there are more declarations, skip to step 2 with the next declaration and the new environment.
  5. Evaluate e in the environment created by the last declaration to get a value, call it v.
  6. Restore the original environment.
  7. Return v.

Anonymous function expressions

To evaluate an expression like:

  fn (id1, id2, ...) => e
Simply create a closure with the specified parameters and body. The environment pointer of the closure points to the current environment. (Hint: be careful when thinking about what the "current environment" is. For example, see the hint for let expressions above, and think about how this affects anonymous functions in declarations.)

Function applications

To evaluate an expression like:

  e1(e2)
  1. Evaluate e1 to get a value, call it v1.
  2. Evaluate e2 to get a value, call it v2.
  3. If the code typechecks, then v1 must be an arrow pointing to a closure. Call the argument in that closure id and call the body in that closure e.
  4. Follow the environment pointer in that closure, and extend with a new frame binding id to v2.
  5. Evaluate e in the new environment (Hint: this environment having the correct parent pointer is absolutely critical!) to get a value, call it v.
  6. Restore the original environment.
  7. Return v.

Let expressions with recursive functions (fun declarations)

Now we return to let expressions to add recursive functions. To evaluate an expression like:

  let
    fun id(id1, id2, ...) = e1
  in
    e
  end
  1. Extend the current environment with a new, empty frame.
  2. Evaluate fn (id1, id2, ...) => e1 in this new frame to get a closure. The closure, according to the rules in the closure section above, has parameters id1, id2, etc., body e1, and environment pointer pointing to this new frame. (Hint: that environment pointer is the key to recursive functions.)
  3. Add a binding to the empty frame, with id bound to an arrow pointing to the closure.
  4. Evaluate e in the new environment to get a value, call it v.
  5. Restore the original environment.
  6. Return v.

Let expressions with multiple fun and val declarations are handled in the way we described earlier for multiple val declarations; simply do them sequentially, getting one new frame for each declaration, using the details from the val or fun section as appropriate.

We don't often use mutually recursive functions in environment model examples, but when evaluating something like:

  let
    fun f(x) = e1
    and g(y) = e2
  in
    e
  end
the idea is to make a new empty frame, create two closures (each with environment pointer pointing to the new frame) and then add bindings for both f and g to that new frame. Thus both e1 and e2 have both f and g in scope.

Ref expressions

To evaluate an expression like:

  ref e
  1. Evaluate e to get a value, call it v.
  2. Draw a new ref cell containing v. If v is unboxed, it goes directly in the ref cell; otherwise the ref cell contains an arrow pointing to the boxed value. (Hint: new ref cells are drawn only when the ref function is applied to an expression.)
  3. Return an arrow pointing to the new ref cell.

To evaluate an expression like:

  !e
  1. Evaluate e to get a value, call it v.
  2. If the code typechecks, v must be an arrow pointing to a ref cell. Return the value that ref cell contains.

To evaluate an expression like:

  e1 := e2
  1. Evaluate e1 to get a value, call it v1.
  2. Evaluate e2 to get a value, call it v2.
  3. If the code typechecks, v1 must be an arrow pointing to a ref cell. Replace the contents of that ref cell with v2. (Remember, if v2 is unboxed, it goes directly in the ref cell; otherwise the ref cell contains an arrow pointing to the boxed value.)

You should be able to extrapolate how arrays work from this (remember that refs are essentially one-element arrays), although it is unlikely we will ever do an array environment model example.


Pattern matching and multiple arguments

Recall that in SML, functions only take one argument (which might be a tuple). You may notice that when we do environment model diagrams we sometimes ignore this and act as if functions have multiple arguments. Similarly, we didn't introduce a formalism for when pattern matching is done in a let expression. In both cases, we simply make a frame with several bindings in it.

We also won't use case expressions often in the environment model, but should one come up, the idea is to first evaluate the expression at the top in the current environment, then find the case that matches and extend the current environment with a frame containing the bindings for whatever variables the pattern binds.


By Jeffrey M. Vinocur, March 2002