CS312 Lecture 17: More About the Evaluator

Administrivia

From: Laurie Buck 
Sent: Mon 10/28/2002 12:41 PM 
Subject: important announcement to make

[...]	
COM S/ECE 314 will have two lectures in Spring 03: TR 10:10-11:25 and TR
11:40-12:55.   the second lecture has not been put into the system yet.

we advise you to not lock in your pre-enrollment until you are able to add
the lecture of your choice.  keeping your schedule unlocked does not affect
your placement in your other spring courses.  you will not be cut from the
other engineering courses you have entered on your schedule.

also, note that COM S 212 has been dropped as a prerequisite for COM S 314.

The Evaluator

The evaluator is now available on the web - download it and play with it. Make sure to check for updates, since we are going to extend and modify it as our discussion progresses. Please report bugs to Tibor.

The evaluator can print debug information (set variable debug to true in evaluator.sml). There are several commands one can use when running the evaluator (:e, :p, :q, :h). Use :h to remind yourself of them.

We will not explain in detail how the input character string is transformed into an AST. We will only worry about how to evaluate ASTs.

The correctness of a Mini-SML expression depends on the computation's context. For example, x + y is a correct expression if, say, x = 2 and y = 3. The same expression is incorrect, however, if x = 1 and y = "alpha".

The context of a computation is encapsulated in its associated environment. In the simplest case, an environment is a list of (variable or function name, value, type) triples. We will call such triples bindings. We will look up values using their name, and we'll start the search from the head of the list. Such an environment allows for the shadowing of names.
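To make the idea concrete, here is a minimal sketch of such an environment. The types below are simplified stand-ins, and the names lookup, en0, and en1 are our own; the definitions in evaluator.sml may well differ. The argument order of insertBinding is only assumed from the way it is called later in this lecture.

```sml
(* Hypothetical, simplified types; not the evaluator's actual definitions. *)
datatype typ = Int_t | String_t
datatype value = Int_v of int | String_v of string

type binding = string * value * typ   (* (name, value, type) triple *)
type env = binding list

(* New bindings are added at the head of the list. *)
fun insertBinding (name: string, (v, t): value * typ, en: env): env =
  (name, v, t) :: en

(* The search starts at the head, so the most recent binding of a name wins. *)
fun lookup (name: string, en: env): (value * typ) option =
  case en of
    [] => NONE
  | (n, v, t) :: rest =>
      if n = name then SOME (v, t) else lookup (name, rest)

val en0 = insertBinding ("x", (Int_v 1, Int_t), [])
val en1 = insertBinding ("x", (String_v "alpha", String_t), en0)
```

In this sketch, looking up x in en1 finds the string binding, which shadows the older integer binding; en0 itself is unchanged, so the old binding is still visible to any computation that holds en0.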

The rules that associate environments with computations fundamentally influence the semantics of the language. We will discuss these issues in another lecture.

The interpreter evaluates both declarations and expressions. Declarations evaluated at the top level result in bindings that are added to the top level environment. Thus these declarations are "sticky."

The evaluator recognizes three types of expressions:

Expressions that consist of numbers, characters, strings, unary and binary operators, built-in forms (the projection operator, if, let, val statements), and user-defined Mini-SML functions. The main loop of the interpreter exerts full control over the evaluation process: all expressions are evaluated, the number and types of the arguments are checked, and, in the case of user-defined functions, the return value's type is also checked.
Predefined functions (e.g. hd, print). The interpreter collects and evaluates all of the function's arguments before handing them over to the code that implements the function. No type checking is done on the arguments in the main evaluation function; rather, it is the responsibility of the function's implementation to detect and handle possible type errors. Not even the number of arguments is checked in evaluate. This is an example of a design decision where we chose to implement a mechanism that provides more flexibility than SML does. In this setting, one can write predefined functions that take a variable number of arguments. Of course, we could have decided to check the number and types of the arguments more carefully in the evaluate function. Think of how you would implement a more stringent policy. Think of the advantages and disadvantages of your solution.
Special forms (e.g. if3, lazylet). These are handled similarly to predefined functions, except that arguments are not even evaluated before being passed to the code that implements the special form. Thus a special form has full control over its arguments: it can decide which, if any, arguments it will evaluate, and when.

Predefined functions and special forms are added to the top level environment, i.e. their definition is known even before the user enters the first declaration or expression.

Let us take a look at special form if3, which you touched upon before:

>> if3(true, 7, fail)
7
>> :p if3(true, 7, fail)
Exp_t(
.   Apply_e(
.   .   Id_e(if3)
.   .   Tuple_e(
.   .   .   Bool_c(true)
.   .   .   Int_c(7)
.   .   .   Id_e(fail)
.   .   )
.   )
)

Could if3 be a predefined function? No, because that would mean that all three arguments would be evaluated before calling the function, which is a completely different semantics.

>> let fun if3(test: bool, v1: int, v2: int): int = if test then v1 else v2 in if3(true, 7, fail) end
RUNTIME ERROR: argument types don't match in function call.
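The difference between the two calling conventions can be sketched with a toy evaluator. Everything below is a hypothetical miniature and not the course evaluator: Fail_e stands in for an expression whose evaluation fails, eagerIf3 is an invented predefined function, and function names are plain strings rather than Id_e nodes.

```sml
datatype exp =
    Int_c of int
  | Bool_c of bool
  | Fail_e                        (* evaluating this always fails *)
  | Tuple_e of exp list
  | Apply_e of string * exp

datatype value = Int_v of int | Bool_v of bool | Tuple_v of value list

exception RuntimeError of string

fun evaluate (e: exp): value =
  case e of
    Int_c n => Int_v n
  | Bool_c b => Bool_v b
  | Fail_e => raise RuntimeError "fail"
  | Tuple_e es => Tuple_v (map evaluate es)
    (* special form: the argument is passed on unevaluated *)
  | Apply_e ("if3", arg) => specialForm ("if3", arg)
    (* predefined function: the argument is evaluated first *)
  | Apply_e (name, arg) => predefined (name, evaluate arg)

and specialForm (name: string, arg: exp): value =
  case (name, arg) of
    ("if3", Tuple_e [test, e1, e2]) =>
      (case evaluate test of
         Bool_v true => evaluate e1      (* e2 is never touched *)
       | Bool_v false => evaluate e2     (* e1 is never touched *)
       | _ => raise RuntimeError "if3: test must be a bool")
  | _ => raise RuntimeError ("bad special form call: " ^ name)

and predefined (name: string, arg: value): value =
  case (name, arg) of
    ("eagerIf3", Tuple_v [Bool_v b, v1, v2]) => if b then v1 else v2
  | _ => raise RuntimeError ("bad predefined call: " ^ name)

val args = Tuple_e [Bool_c true, Int_c 7, Fail_e]
val r1 = evaluate (Apply_e ("if3", args))
val r2 = (evaluate (Apply_e ("eagerIf3", args)); "returned")
         handle RuntimeError _ => "failed before the call"
```

In this sketch r1 is Int_v 7, because if3 never evaluates its third argument, while r2 is "failed before the call": the failing argument is evaluated before eagerIf3 is ever entered.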

What about lazylet? Here is a possible implementation:

fun specialForm (name: string, expr: exp, en: env): value * typ =
(
  if debug then
    print("(\n special form: " ^ name ^ "\n"
        ^ " unevaluated argument = \n"
        ^ printExp(expr, 4)
        ^ " environment = \n"
        ^ printEnv(en, 4)
        ^ ")\n\n")
  else ();

  case name of
    ...
  | "lazylet" =>
       (case expr of
          Tuple_e([var, e1, e2]) =>
            (case evaluate(var, en) of
               (String_v(name), String_t) =>
                  evaluate(e2, insertBinding(name, evaluate(e1, en), en))
             | _ => err "first argument of 'lazylet' must be a name")
        | _ => err "incorrect number of arguments for 'lazylet'; should be 3")
    ...
)

Note the presence of the debugging code. You might want to include debugging functionality in the code that you will add to the evaluator as well.

SML's pattern matching mechanism is used to require that lazylet be given three arguments. The first argument is evaluated, and if it evaluates to a string (the variable name), then this name and the value of the (evaluated) second argument are used to add a binding to the current environment. The third argument is then evaluated in the context of this extended environment.

>> lazylet("x", "long", "x could represent a " ^ x ^ ", " ^ x ^ " computation.")
"x could represent a long, long computation.": string

Note that we can use lazylet to factor out the computation of more than one value:

>> lazylet("x", 1, lazylet("y", 2, x + y))
3: int

We mentioned the possibility of implementing functions and special forms that take a variable number of arguments. Let us write a function ncat that takes zero, one, or more string arguments and returns their concatenation. Here is an implementation of ncat as a predefined function:

  fun predefined (name: string, (arg, argt): value * typ): value * typ =
    ...
    case (name, arg, argt) of
    ...
    | ("ncat",  Tuple_v(sl), Tuple_t(tl)  )  =>
        if List.all (fn t => case t of String_t => true | _ => false) tl
        then
          (String_v (foldl (fn (sv, cs) => case sv of
                                             String_v(s) => cs ^ s
                                           | _ => err "internal error [10]")
                           ""
                           sl),
           String_t)
    ...

>> ncat
predefined_function(ncat): undefined -> string
>> ncat()
"": string
>> ncat("a", "b", "c", "d", "e")
"abcde": string

Now, let's implement it as a special form:

  fun specialForm (name: string, expr: exp, en: env): value * typ =
    ...
    case name of
    ...
    | "ncat2" =>
        (String_v (foldl (fn (e, cs) =>
                            case evaluate(e, en) of
                              (String_v(s), String_t) => cs ^ s
                            | _ => err "'ncat2' takes only string arguments")
                   ""
                   (case expr of
                      Tuple_e(elst) => elst
                    | _             => [expr])),
         String_t)
    ...

>> ncat2;
special form(ncat2): undefined -> string
>> ncat2();
"": string
>> ncat2("a", "b", "c", "d", "e")
"abcde": string

Since ncat must evaluate all of its arguments, from left to right, it is wasteful to implement it as a special form (and evaluate the arguments "by hand"). Instead, we should rely on the built-in mechanism for predefined functions.

The design decisions that we made allow us, in effect, to define functions that take a variable number of arguments, and the arguments can have any combination of types. We are free to do whatever we want, but such flexibility can be risky. In programming, too much flexibility is harmful, as it increases the chance of errors. People have invested a lot of time and energy in introducing meaningful restrictions into programming languages. Often, these restrictions attempt to make automated program error detection easier. Think, for example, of the visibility rules of class variables in Java, or transparent/opaque signature ascription in SML. A good way to introduce restrictions that enhance program correctness is to use types.

Types in Mini-SML

From our perspective, types are a mechanism that enforces a certain programming discipline, consequently decreasing the likelihood of (certain categories of) programming errors. There is a vast theory dealing with types, but we will ignore most of it, and stick to this basic view.

There are many type systems that one can define, and they greatly differ in expressiveness. Mini-SML has a very simple type system. We will see below that the type system in Mini-SML is too weak to express types that are common in SML.

As an exercise, can you determine the type of function ncat above? Can you express its type in SML?

What does it mean for a program to type check? Simply put, it means that if the program terminates, then all operators and function calls will operate on the "right" data (e.g. there will be no attempts to add a string to an integer). A program that does type check is not guaranteed to finish, and it is not guaranteed to produce a correct result. Unfortunately, it turns out that we can't do better.

Note that a program that never returns a value can still type check:

fun strange(x: int): int = raise Fail "did you expect this?"

Mini-SML has the following basic types: bool, int, real, char, string. We can compose types to define lists, tuples, and functions. Also, there is an undefined type that is used internally to refer to as-of-yet-unknown types. Later, when talking about side effects, we will also introduce reference types.

SML does static type checking: it examines the code before executing it. This has the advantage that once a program is type checked, execution can proceed at full speed, as all operations will be performed on the right type of data. One could argue that this approach is sometimes too conservative. Take a look at the code below:

- let
  fun f(x: int): int = if x = 1 then 5 else "one"
in
  f(1) + 1
end
stdIn:154.24-154.50 Error: types of rules don't agree [literal]
  earlier rule(s): bool -> int
  this rule: bool -> string
  in rule:
    false => "one"

This piece of code will not be accepted by SML, even though for the given value of the function argument the else branch will not be evaluated. A static approach must be conservative, as it is not possible in general to predict the execution path of a program. Thus SML requires that both branches of the if return values of the same type.

In contrast, Mini-SML uses dynamic type checking, which means that type checking is interleaved with execution. Expressions that are never evaluated are not type checked. Thus the code above will execute without any problem:

>> let fun f(x: int): int = if x = 1 then 5 else "one" in f(1) + 1 end
6: int

Dynamic type checking never permits an operation on data of the wrong type. Because it is interleaved with execution, however, it can afford to be less conservative: rather than requiring that all execution paths type check, it only makes sure that the specific execution path the program takes type checks. The disadvantage of dynamic type checking is its run-time cost, since every check is repeated each time the code runs. For example, if a function is called n times with the same arguments, dynamic type checking will be performed n times on that function. We gain flexibility, but we lose efficiency.
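The following toy evaluator sketches what checking only the executed path means. It is a hypothetical miniature written for this note, with invented constructors such as Plus_e and If_e; it is not the course evaluator, although it follows the same convention of returning a value together with its type.

```sml
datatype typ = Int_t | Bool_t | String_t
datatype value = Int_v of int | Bool_v of bool | String_v of string

datatype exp =
    Int_c of int
  | Bool_c of bool
  | String_c of string
  | Plus_e of exp * exp
  | If_e of exp * exp * exp

exception RuntimeError of string

fun evaluate (e: exp): value * typ =
  case e of
    Int_c n => (Int_v n, Int_t)
  | Bool_c b => (Bool_v b, Bool_t)
  | String_c s => (String_v s, String_t)
  | Plus_e (e1, e2) =>
      (* the operand types are checked here, at run time *)
      (case (evaluate e1, evaluate e2) of
         ((Int_v a, Int_t), (Int_v b, Int_t)) => (Int_v (a + b), Int_t)
       | _ => raise RuntimeError "'+' expects two ints")
  | If_e (test, e1, e2) =>
      (* only the branch that is taken gets evaluated, and hence checked *)
      (case evaluate test of
         (Bool_v true, Bool_t) => evaluate e1
       | (Bool_v false, Bool_t) => evaluate e2
       | _ => raise RuntimeError "'if' expects a bool test")

fun branchy (b: bool): exp =
  Plus_e (If_e (Bool_c b, Int_c 5, String_c "one"), Int_c 1)

val ok = evaluate (branchy true)
val bad = (evaluate (branchy false); "no error")
          handle RuntimeError msg => msg
```

Here ok is (Int_v 6, Int_t): the string branch is never evaluated, so it is never checked. With branchy false the addition receives a string, and the check in Plus_e raises the error at run time.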

Both static and dynamic type checking are legitimate type checking methods, and both have followers.

Our type system is too weak to express types common in SML. For example, Mini-SML does not have polymorphism. This prevents us from correctly defining the type of the predefined function hd: being unable to represent the polymorphic type 'a list -> 'a, we declare hd to be of type undefined list -> undefined. The information that the returned value is of the same type as the base type of the list is lost.

The undefined type is only used internally, and is compatible with ("equal to") any other type. We'll talk more about this shortly. If we allowed the use of the undefined type at the user level, the Mini-SML evaluator would exhibit behaviors somewhat analogous to polymorphism. Here is an example:

(* Modified Mini-SML with undefined type accessible at user level. *)
>> let fun len(x: undefined list): int = if null(x) then 0 else 1 + len(tl x) in len([1, 2, 3]) + len(["four", "five"]) end
5: int

Here is another example, which shows the effect of dynamic type checking:

>> let fun f(x: int): undefined = if x = 0 then 0 else "zero." in if true then print("The value of your investment is " ^ f(1)) else f(0) end
The value of your investment is zero.(): unit

>> let fun f(x: int): undefined = if x = 0 then 0 else "zero." in if false then print("The value of your investment is " ^ f(1)) else f(0) end
0: int
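One way to read "compatible with any other type" is as a relaxed equality test on types. The sketch below is an assumption about how such a test could look, not the evaluator's actual code; the typ datatype and the name compatible are invented for illustration.

```sml
(* Hypothetical type representation; the evaluator's own typ may differ. *)
datatype typ =
    Int_t
  | String_t
  | Undefined_t
  | List_t of typ
  | Fn_t of typ * typ

(* Undefined_t matches anything; all other types are compared structurally. *)
fun compatible (t1: typ, t2: typ): bool =
  case (t1, t2) of
    (Undefined_t, _) => true
  | (_, Undefined_t) => true
  | (List_t a, List_t b) => compatible (a, b)
  | (Fn_t (a1, r1), Fn_t (a2, r2)) =>
      compatible (a1, a2) andalso compatible (r1, r2)
  | _ => t1 = t2

val c1 = compatible (List_t Undefined_t, List_t Int_t)
val c2 = compatible (List_t Undefined_t, List_t String_t)
val c3 = compatible (List_t Int_t, List_t String_t)
```

Under this reading c1 and c2 are both true, which is why a parameter declared as undefined list accepts both [1, 2, 3] and ["four", "five"], while c3 is false. Note that such a relation is not transitive, so it is weaker than true type equality.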

In Mini-SML, type declarations are mandatory for all function arguments, function return values, and variables declared in val statements. It is not possible, however, to declare the type of an arbitrary expression. SML allows such annotations, which are useful, for example, to give a type to the empty list ([]:int list). In Mini-SML, [] has type undefined list.

CS312 home © 2002 Cornell University Computer Science