Lecture 20: Side Effects, Arrays, Memory

Administrivia

Evaluator for Prelim is now on the web. Today's lecture (and all lectures afterwards) will not be on the 2nd prelim. But they have a good chance of showing up on the final…

Total aside: here is some really bad C code:

#define o define
#o ___o write
#o ooo (unsigned)
#o o_o_ 1
#o _o_ char
#o _oo goto
#o _oo_ read
#o o_o for
#o o_ main
#o o__ if
#o oo_ 0
#o _o(_,__,___)(void)___o(_,__,ooo(___))
#o __o (o_o_<<((o_o_<<(o_o_<<o_o_))+(o_o_<<o_o_)))+(o_o_<<(o_o_<<(o_o_<<o_o_)))
o_(){_o_ _=oo_,__,___,____[__o];_oo ______;_____:___=__o-o_o_; _______:
_o(o_o_,____,__=(_-o_o_<___?_-o_o_:___));o_o(;__;_o(o_o_,"\b",o_o_),__--);
_o(o_o_," ",o_o_);o__(--___)_oo _______;_o(o_o_,"\n",o_o_);______:o__(_=_oo_(
oo_,____,__o))_oo _____;}
 

Why we need circular data structures

Notation we will use for environment diagrams

To simplify matters we will get rid of the notion of boxed variables and always draw bindings as pointing to their values. (Among other things, this gives us a consistent semantics for :=).

We will draw ref cells (where the pointer can be changed) with a double box, kind of like TOP. We will draw tuples like [a|b|c] . For other data structures we will subscript the box with their type, using :: for cons cells (this will make more sense later on in lecture).

 

let val f = fn(x) => f(x+1) in E end
let fun f(x) = f(x+1) in E end
let fun f(x) = g(x+1) and g(y) = f(y-1) in E end
 
(* let val f = fn(x) => f(x+1) *)
 
let
  val cl = makeclosure(x,f(x+1),TOP)
  val b = makebinding("f", cl)
  val env = addbinding(b,TOP)
in
  env
end
 
 
(* let fun f(x) = f(x+1) *)
 
let
  val b = makebinding("f", JUNK)
  val env = addbinding(b,TOP)
  val cl = makeclosure(x,f(x+1),env)
in
  bash(b, cl);
  env
end
 
 
(* let fun f(x) = g(x+1) and g(y) = f(y-1) *)
 
let
  val b1 = makebinding("f", JUNK)
  val b2 = makebinding("g", JUNK)
  val env = addbinding(b1,addbinding(b2,TOP))
  val c1 = makeclosure(x,g(x+1),env)
  val c2 = makeclosure(y,f(y-1),env)
in
  bash(b1, c1);
  bash(b2, c2);
  env
end
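
The "make a JUNK binding, build the closure, then bash the binding" recipe can be sketched in C, where the cycle is easy to see because everything is an explicit pointer. This is only an illustration under assumptions: the struct layouts, the lookup helper and make_rec_env are hypothetical names, NULL stands in for both JUNK and TOP, and the closure body is kept as text and never evaluated.

```c
#include <stddef.h>
#include <string.h>

struct Closure;

typedef struct Binding {
    const char *name;
    struct Closure *value;   /* NULL plays the role of JUNK */
} Binding;

typedef struct Env {
    Binding *binding;
    struct Env *parent;      /* NULL plays the role of TOP */
} Env;

typedef struct Closure {
    const char *param;
    const char *body;        /* body kept as text; never evaluated here */
    Env *env;                /* environment the closure was created in */
} Closure;

/* bash: overwrite the value stored in an existing binding. */
void bash(Binding *b, Closure *cl) { b->value = cl; }

/* Look a name up by walking the environment chain. */
Closure *lookup(const Env *env, const char *name) {
    for (; env != NULL; env = env->parent)
        if (strcmp(env->binding->name, name) == 0)
            return env->binding->value;
    return NULL;
}

/* Environment for: let fun f(x) = f(x+1).
   The caller supplies the storage, so nothing has to be freed. */
Env *make_rec_env(Binding *b, Env *env, Closure *cl) {
    b->name = "f";
    b->value = NULL;         /* step 1: binding holds JUNK */
    env->binding = b;
    env->parent = NULL;      /* step 2: env = addbinding(b, TOP) */
    cl->param = "x";
    cl->body = "f(x+1)";
    cl->env = env;           /* step 3: closure captures env */
    bash(b, cl);             /* step 4: patch the binding, closing the cycle */
    return env;
}
```

After make_rec_env returns, looking up "f" in the environment yields the closure, and the closure's env field points back at that same environment. That loop is exactly the circular structure that a plain val binding cannot build.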

How computers actually work

Memory and pointers and indirection

Byte versus word addressing (Pentium is 32 bit, byte addressed, little-endian (the name comes from Swift's Gulliver's Travels)); alignment
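
A small C sketch of these three ideas, assuming 4-byte words. The function names are made up for this example, and addresses are modeled as plain 32-bit integers rather than real pointers.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if this machine stores the low-order byte of a word first
   (little-endian), 0 otherwise. */
int is_little_endian(void) {
    uint32_t word = 0x01020304u;
    unsigned char bytes[sizeof word];
    memcpy(bytes, &word, sizeof word);   /* view the word byte by byte */
    return bytes[0] == 0x04;
}

/* The byte a little-endian machine stores at (base + offset) for a
   32-bit word; offset must be 0..3, and offset 0 is the low-order byte. */
unsigned char little_endian_byte(uint32_t word, unsigned offset) {
    return (unsigned char)(word >> (8u * offset));
}

/* On a byte-addressed machine with 4-byte words, word number i
   starts at byte address 4*i. */
uint32_t word_to_byte_address(uint32_t word_index) {
    return word_index * 4u;
}

/* A byte address is word-aligned when it is a multiple of 4,
   i.e. when its two low-order bits are zero. */
int is_word_aligned(uint32_t byte_address) {
    return (byte_address & 3u) == 0;
}
```

On a Pentium, is_little_endian() should return 1. The alignment test matters for the tagging trick discussed in these notes: aligned addresses leave the two low-order bits free.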

The memory hierarchy

In general we need some way to flag a value as holding a pointer or a direct value. SML makes values 32-bit aligned, so a pointer to them has its 2 low-order bits equal to 0. To encode VAL as a direct value we use 2*VAL+1, which is always odd and so can never be mistaken for a pointer (so for example false = 1, true = 3) (if RDZ did it, true would be, of course, 42…)
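
The 2*VAL+1 encoding can be sketched in C as follows. The function names are invented for illustration, words are modeled as 32-bit integers instead of real machine pointers, and the direct value is assumed to fit in 31 bits so that the doubling cannot overflow.

```c
#include <stdint.h>

typedef int32_t word;   /* one 32-bit machine word */

/* Encode a direct value as 2*v + 1; v must fit in 31 bits. */
word encode_direct(int32_t v) {
    return 2 * v + 1;
}

/* Undo the encoding. w - 1 is even, so the division is exact,
   for negative values as well. */
int32_t decode_direct(word w) {
    return (w - 1) / 2;
}

/* Direct values are odd: the low-order bit is 1. */
int is_direct(word w) {
    return ((uint32_t)w & 1u) == 1u;
}

/* Pointers to 32-bit-aligned data have both low-order bits 0. */
int is_pointer(word w) {
    return ((uint32_t)w & 3u) == 0u;
}
```

With this encoding encode_direct(0) is 1 and encode_direct(1) is 3, matching false = 1 and true = 3 above. The price is one bit of range: only 31 bits are left for the value itself.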

Note that we will assume that given a pointer it is easy to determine the type of the object it points to. How can we do this? The easy way is to use the high-order bits (why not the low order?)

In SML the types are pretty much known at compile time, which is why programs that compile tend to run (this is a very useful property). Compare, for example, the int 0 and the bool false: under the encoding above both are the same word, so only the type tells them apart. In reality, most variables would also somehow store their associated types. But we will ignore this issue for simplicity and just assume that the types of all variables are known and represented “somewhere”. We will also ignore parameterized types; they can be handled, but they introduce additional complexity.

Strings usually use 1 byte per character and are null-terminated.

How lists, refs and tuples might be implemented

The basic data structure is [Type tag,length,elt1,elt2,…]. In the actual implementation, tag and length are compressed into a single 32-bit word.
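
One way the tag and length might share a single 32-bit word is sketched below. The 8-bit tag and 24-bit length split is an assumption made for this example, not the layout of any particular SML implementation, and the function names are likewise hypothetical.

```c
#include <stdint.h>

#define TAG_BITS 8u                        /* assumed width of the type tag */
#define TAG_MASK ((1u << TAG_BITS) - 1u)   /* low 8 bits */

/* Pack tag and length into one header word.
   length must fit in the remaining 24 bits. */
uint32_t make_header(uint32_t tag, uint32_t length) {
    return (length << TAG_BITS) | (tag & TAG_MASK);
}

/* Extract the type tag from the low bits. */
uint32_t header_tag(uint32_t header) {
    return header & TAG_MASK;
}

/* Extract the element count from the high bits. */
uint32_t header_length(uint32_t header) {
    return header >> TAG_BITS;
}
```

For example, make_header(3, 5) gives 0x503: the length 5 sits in the upper bits and the tag 3 in the low byte.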

The list [6,9,42] will be [CONS,6,ptr] [CONS,9,ptr] [CONS,42,ptr], where each ptr points to the next cell and the last one denotes nil. Note that this allows you to easily create new lists that share structure with old lists (for example, consider what x::l does…)
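
Structure sharing can be made concrete with a C sketch of cons cells. It makes some simplifying assumptions: elements are plain ints, NULL stands in for nil, and the CONS tag is left implicit in the struct type. The names cons and list_length are made up for this example.

```c
#include <stddef.h>

typedef struct Cons {
    int head;                  /* element stored in this cell */
    const struct Cons *tail;   /* next cell, or NULL for nil */
} Cons;

/* x :: l fills in one new cell; l itself is neither copied nor modified. */
Cons cons(int x, const Cons *l) {
    Cons cell = { x, l };
    return cell;
}

/* Count the cells reachable from l. */
size_t list_length(const Cons *l) {
    size_t n = 0;
    for (; l != NULL; l = l->tail)
        n++;
    return n;
}
```

If l is the three-cell list [6,9,42], then cons(1, l) and cons(2, l) are two four-element lists whose tails are the very same three cells. This is safe because list cells are never modified after they are built.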

A ref is just a memory location holding a pointer.
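
In C terms a ref cell could look like the sketch below. The struct and function names are hypothetical, and the contents are restricted to ints to keep things short.

```c
/* A ref cell: one memory location whose pointer can be changed. */
typedef struct Ref {
    const int *contents;   /* pointer to the current value */
} Ref;

/* r := v  overwrites the pointer stored in the cell. */
void assign(Ref *r, const int *v) {
    r->contents = v;
}

/* !r  follows the pointer to get the current value. */
int deref(const Ref *r) {
    return *r->contents;
}
```

Because every holder of the ref points at the same cell, an assignment made through one alias is seen by all of them. The values themselves are never overwritten; only the pointer in the cell changes.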

A tuple is a sequence of memory locations (there is some freedom about how to represent a nested tuple).

How SML environments might be implemented

An environment might be an array of known size, where each element is a string pointer (to the name) and a value (which may in turn be a pointer). An actual compiler will get rid of the names and replace them with offsets.
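
The difference between looking a variable up by name and by offset can be sketched in C. Some assumptions are made here: values are simplified to ints, later slots are taken to shadow earlier ones, and the names Slot, find_slot and value_at are made up for this example.

```c
#include <string.h>

typedef struct Slot {
    const char *name;   /* pointer to the variable's name */
    int value;          /* the value, simplified to an int */
} Slot;

/* Interpreter-style lookup: scan the array for the name.
   Returns the slot's offset, or -1 if the name is unbound.
   Scanning from the end lets later bindings shadow earlier ones. */
int find_slot(const Slot *env, int size, const char *name) {
    for (int i = size - 1; i >= 0; i--)
        if (strcmp(env[i].name, name) == 0)
            return i;
    return -1;
}

/* Compiler-style access: the offset was worked out ahead of time,
   so no names are needed at run time. */
int value_at(const Slot *env, int offset) {
    return env[offset].value;
}
```

A compiler would in effect run find_slot once, at compile time, and emit only the value_at step, which is why the names can be dropped from the run-time environment.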

Next topic: memory management