# Functional Arrays

All of the data structures we have covered in this course so far are functional. You can access old versions of these data structure, even when you update them. For example, you can cons an element onto a list without affecting references to the original list.

We have seen efficient functional implementations for stacks, queues, and binary trees. Can we come up with an efficient implementation of functional arrays, too? It turns out that we can, but we will need to use imperative language features. Our array will be implemented using an OCaml array for efficiency, but we need to be careful that updates to the array do not affect previous versions.

## First attempt

Let's define a signature:

```module type FUN_ARRAY = sig
type 'a t

(* make n x creates an array of length n such that each element is x *)
val make : int -> 'a -> 'a t
(* get a i returns element i in the array a.
Requires: 0 <= i < length a *)
val get : 'a t -> int -> 'a

(* set a i x is the array a with x inserted at index i
Requires: 0 <= i < length a *)
val set : 'a t -> int -> 'a -> 'a t
end
```

Let's define the `set` function so that it makes a copy of the array it was given, then updates the copy.

```module FunArray : FUN_ARRAY = struct
type 'a t = 'a array
let make = Array.make
let get = Array.get
let set (a : 'a t) (i : int) (x : 'a) : 'a t =
let a' = Array.copy a in
a'.(i) <- x;
a'
end
```
This array is purely functional, but it's slow, because we make a new array every time call `set`.

## Second attempt

Say we execute `FunArray.set a i x`. Let's imperatively set `a.(i) <- x`. Then, let's update all of the old references to `a` so that they point to the old version of `a`.

How are we going to update the old references to `a`? One idea is to make `'a t` a `ref`. Every time `FunArray.set` is called, we change the value stored at `a` so that it has the same mappings as the original array. Thus the old references to `a` will still refer to the old version of `a`, even though the underlying array is different.

```type 'a t = 'a data ref
and 'a data = Arr of 'a array | Diff of int * 'a * 'a t
```

An array can either be an `Arr(arr)`, which has the same mappings as the OCaml array `arr`, or it can be a `Diff(i, x, a')`, which has all of the same mappings as the functional array a', exception with the mapping `i -> x`. We then define the type `'a t` so that it is a `ref` to a value of type `data`.

Getting a value from such an array is simple.

```let rec get (a : 'a t) (i : int) : 'a =
match !a with
Arr arr -> arr.(i)
| Diff(j, x, a') -> if i = j then x else get a' i
```
If the array points to an `Arr(arr)`, then we just look up the given index in `arr`. If it points to a ```Diff(j, x, a')```, then we see if `j` is the index we are searching for. If it is, then we return `x`, the value it maps to.

Setting a value is more complicated

```let set (a : 'a t) (i : int) (x : 'a) : 'a t =
match !a with
Arr arr ->
let a' = ref (Arr arr) in
a := Diff(i, arr.(i), a');
arr.(i) <- x;
a'
| Diff _ -> ref (Diff(i, x, a))
```

If `a` points to a `Diff`, then we return a reference to a new `Diff` which maps `i` to `x`.

If `a` points to an `Arr(arr)`, then we imperatively set `arr.(i) <- x`. This operation invalidates the old references to `a`. These old references to `a` ought to refer to the old version of `a` before the update. To get around this problem, we add a level of indirection. We update the value stored at `a` so that it contains a `Diff` preserving the old mapping of `i`.

## Rerooting

Our implementation is efficient for accessing the most recent version of an array. However, accessing old versions of the array are slow, because you have to traverse a list of `Diff` nodes before you reach the `Arr`. There will be one of these `Diff` nodes per update.

We can improve performance when a user stops using the newest version of the array and starts using an old version instead (for example, in a backtracking algorithm). Every time a user sets or gets an element of an array `a`, we reverse the list of `Diff` nodes on the path to the `Arr` so that `a` points directly to the `Arr`. This operation is known as rerooting.

```  (* Effects: reverses the list of Diff nodes along the path to the Arr *)
let rec reroot (a : 'a t) : unit =
match !a with
Arr _ -> ()
| Diff(i, x, a') ->
reroot a';
match !a' with
Diff _ -> failwith "impossible"
| Arr(arr) ->
a := Arr(arr);
a' := Diff(i, arr.(i), a);
arr.(i) <- x
```

Every time the user calls `get` or `set`, we reroot the tree. Note that this implementation is still inefficient for some use cases. If a user frequently accesses many different versions of an array, then rerooting will occur every time, and performance will suffer. Another problem is that our definition of `reroot` is not tail recursive (it is possible to fix this problem).

For more details, please see this paper on which this recitation is based. They use functional arrays to implement a functional disjoint set, a data structure that we will cover later in the course.

Here is our final implementation.

```module FunArray : FUN_ARRAY = struct
type 'a t = 'a data ref
and 'a data = Arr of 'a array | Diff of int * 'a * 'a t

let make (n : int) (x : 'a) : 'a t =
ref (Arr (Array.make n x))

(* Effects: reverses the list of Diff nodes along the path to the Arr *)
let rec reroot (a : 'a t) : unit =
match !a with
Arr _ -> ()
| Diff(i, x, a') ->
reroot a';
match !a' with
Diff _ -> failwith "impossible"
| Arr(arr) ->
a := Arr(arr);
a' := Diff(i, arr.(i), a);
arr.(i) <- x

let rec get (a : 'a t) (i : int) : 'a =
reroot a;
match !a with
Diff(j, x, a') -> failwith "impossible"
| Arr arr -> arr.(i)

let rec set (a : 'a t) (i : int) (x : 'a) : 'a t =
reroot a;
match !a with
Arr arr ->
let a' = ref (Arr arr) in
a := Diff(i, arr.(i), a');
arr.(i) <- x;
a'
| Diff _ -> ref (Diff(i, x, a))
end
```