All of the data structures we have covered in this course so far are functional. You can access old versions of these data structure, even when you update them. For example, you can cons an element onto a list without affecting references to the original list.
We have seen efficient functional implementations for stacks, queues, and binary trees. Can we come up with an efficient implementation of functional arrays, too? It turns out that we can, but we will need to use imperative language features. Our array will be implemented using an OCaml array for efficiency, but we need to be careful that updates to the array do not affect previous versions.
Let's define a signature:
module type FUN_ARRAY = sig
type 'a t
(* make n x creates an array of length n such that each element is x *)
val make : int -> 'a -> 'a t
(* get a i returns element i in the array a.
Requires: 0 <= i < length a *)
val get : 'a t -> int -> 'a
(* set a i x is the array a with x inserted at index i
Requires: 0 <= i < length a *)
val set : 'a t -> int -> 'a -> 'a t
end
Let's define the set function so that it makes a copy
of the array it was given, then updates the copy.
module FunArray : FUN_ARRAY = struct
type 'a t = 'a array
let make = Array.make
let get = Array.get
let set (a : 'a t) (i : int) (x : 'a) : 'a t =
let a' = Array.copy a in
a'.(i) <- x;
a'
end
This array is purely functional, but it's slow, because we make a new
array every time call set.
Say we execute FunArray.set a i x. Let's imperatively
set a.(i) <- x. Then, let's update all of the old
references to a so that they point to the old version of
a.
How are we going to update the old references to a?
One idea is to make 'a t a ref. Every time
FunArray.set is called, we change the value stored at
a so that it has the same mappings as the original
array. Thus the old references to a will still refer to
the old version of a, even though the underlying array
is different.
type 'a t = 'a data ref and 'a data = Arr of 'a array | Diff of int * 'a * 'a t
An array can either be an Arr(arr), which has the
same mappings as the OCaml array arr, or it can be a
Diff(i, x, a'), which has all of the same mappings as the
functional array a', exception with the mapping i -> x.
We then define the type 'a t so that it is a
ref to a value of type data.
Getting a value from such an array is simple.
let rec get (a : 'a t) (i : int) : 'a =
match !a with
Arr arr -> arr.(i)
| Diff(j, x, a') -> if i = j then x else get a' i
If the array points to an Arr(arr), then we just look up
the given index in arr. If it points to a Diff(j,
x, a'), then we see if j is the index we are
searching for. If it is, then we return x, the value it
maps to.
Setting a value is more complicated
let set (a : 'a t) (i : int) (x : 'a) : 'a t =
match !a with
Arr arr ->
let a' = ref (Arr arr) in
a := Diff(i, arr.(i), a');
arr.(i) <- x;
a'
| Diff _ -> ref (Diff(i, x, a))
If a points to a Diff, then we return a
reference to a new Diff which maps i to
x.
If a points to an Arr(arr), then we
imperatively set arr.(i) <- x. This operation invalidates
the old references to a. These old references to
a ought to refer to the old version of a
before the update. To get around this problem, we add a level of
indirection. We update the value stored at a so that it
contains a Diff preserving the old mapping of
i.
Diff nodes before
you reach the Arr. There will be one of these
Diff nodes per update.
We can improve performance when a user stops using the newest
version of the array and starts using an old version instead (for
example, in a backtracking algorithm). Every time a user sets or gets
an element of an array a, we reverse the list of
Diff nodes on the path to the Arr so that
a points directly to the Arr. This operation
is known as rerooting.
(* Effects: reverses the list of Diff nodes along the path to the Arr *)
let rec reroot (a : 'a t) : unit =
match !a with
Arr _ -> ()
| Diff(i, x, a') ->
reroot a';
match !a' with
Diff _ -> failwith "impossible"
| Arr(arr) ->
a := Arr(arr);
a' := Diff(i, arr.(i), a);
arr.(i) <- x
Every time the user calls get or set, we
reroot the tree. Note that this implementation is still inefficient
for some use cases. If a user frequently accesses many different
versions of an array, then rerooting will occur every time, and
performance will suffer. Another problem is that our definition of
reroot is not tail recursive (it is possible to fix this
problem).
For more details, please see this paper on which this recitation is based. They use functional arrays to implement a functional disjoint set, a data structure that we will cover later in the course.
Here is our final implementation.
module FunArray : FUN_ARRAY = struct
type 'a t = 'a data ref
and 'a data = Arr of 'a array | Diff of int * 'a * 'a t
let make (n : int) (x : 'a) : 'a t =
ref (Arr (Array.make n x))
(* Effects: reverses the list of Diff nodes along the path to the Arr *)
let rec reroot (a : 'a t) : unit =
match !a with
Arr _ -> ()
| Diff(i, x, a') ->
reroot a';
match !a' with
Diff _ -> failwith "impossible"
| Arr(arr) ->
a := Arr(arr);
a' := Diff(i, arr.(i), a);
arr.(i) <- x
let rec get (a : 'a t) (i : int) : 'a =
reroot a;
match !a with
Diff(j, x, a') -> failwith "impossible"
| Arr arr -> arr.(i)
let rec set (a : 'a t) (i : int) (x : 'a) : 'a t =
reroot a;
match !a with
Arr arr ->
let a' = ref (Arr arr) in
a := Diff(i, arr.(i), a');
arr.(i) <- x;
a'
| Diff _ -> ref (Diff(i, x, a))
end