Problem Set 2

Due April 13, 11:59:59 pm.

In this problem set, we will investigate the use of Hidden Markov Models (HMMs) to recover the most likely sequence of hidden states, given that the output and transition probabilities are known. More specifically, consider K coins with probabilities p₀, p₂, ..., p_K-1 of coming up heads when flipped. You know each of these probabilities and you also know that there is a probability q that the coin being flipped will change at any time step. The coin that starts the sequence can be any coin with uniform probability. If the coin being flipped changes, it does so to one of the other coins with uniform probability. The task is to recover the most likely sequence of hidden states (coins being flipped). For example, if there were two coins, with probabilities p₁ = 0.99 and p₂ = 0.01 and q = 0.1 and you observe the sequence heads, heads, tails, tails, then the most likely sequence of hidden states is 0, 0, 1, 1.

Part 0

Download the structure and partial HMM Code.

Part 1

Use the Viterbi algorithm to find the most likely sequence of states. Put this code in the structure HMM_DP (don't change the function signature). The parameters to your function are the list of heads probabilities, the list of observed flip outcomes (true for heads, false for tails), and the probability of not changing to a different coin at each step (1-q). You need not worry about cases where probabilties are outside the range [0,1] (but be careful about the case where they are exactly 0 or 1). For instance, to recover the states for the case given in the introduction, you would call recoverCoins([0.99,0.01],[true,true,false,false],0.9), which should return [0,0,1,1].

Part 2

To avoid overflow in part 1, you should have added negative logarithms instead of multiplying probabilities (if not you may want to go back). This suggests an alternative approach to solving the problem. We can think of every pair (position in flip sequence, current coin) as a node in a graph. There is an edge from a node (pos,x) to node (pos+1,y) whose weight is equal to -log(probability of coin y coming up as it did in position pos) - log(probability of going from coin x to coin y). Thinking about things this way, recovering the state sequence is the same as finding the shortest path from some node (0,x) to some other node (N,y) where N is the number of coin flips. With this in mind, use Dijkstra's shortest path algorithm to find the state sequence.

To help you in this part, we have provided code for some of the key operations. The following functions are already written:

    (*Takes the length of the flip sequence and number of coins as input*)
    empty:int*int->hmmheap 

    (* Removes and return the element on top of the heap 
     * (the next node to expand)*)
    removeFirst: hmmheap->int*int*real

    (* This does everything else.  Given a hmmheap, a position, a state, and a
     * distance, this function inserts the (position,state) node into the heap
     * with the correct distance if the node has never been inserted.  If it is
     * already in the heap, the function updates its distance to the minimum of
     * the old value and new value.  Elements which have already been inserted
     * and removed are ignored in calls to reduce. The function returns true 
     * if the call changed the priority queue, and false otherwise*)
    reduce:hmmheap*int*int*real->hmmheap

With the code provided, you should be able to implement Dijkstras by a sequence of removeFirst and reduce operations: at each step you removeFirst and then reduce all the removed node's neighbors. Since you are trying to find a shortest path from any node (0,x) to any node (N,y), you should start by inserting (by calling reduce) each node (0,x) with distance 0. As soon as you find some node (N,y), you should stop and recover the sequence of states along the shortest path. By using the return value of reduce, you can store (in a 2D array) the previous nodes along the shortest path, and use these for backtracking once you reach a node (N,y). It is important that you stop as soon as a removeFirst call gives you (N,y), so you don't do any extra work. It is important that you wait until removeFirst gives you (N,y) -- don't stop as soon as you insert (N,y).

Part 3

When you run the above two implementations (which should give the same answer in most cases -- they may be different in the event of ties) which one is faster? Does the speed depend on the parameters? Explain which one you think is faster and why, or explain when one is faster and when the other is faster. This does not need to be particularly formal. You will need to use fairly long sequences to get accurate timing.

Part 4

Consider the following poorly documented function foo:

fun foo(l,rr,f,g) =
  case (l,rr) of
       (h1::t1,h2::t2) =>
       if(f(h2,h1)) then foo(g(h2,h1)::t1,t2,f,g)
       else foo(t1,h1::h2::t2,f,g)
     | (h::t,[]) => foo(t,[h],f,g)
     | _ => List.rev(rr)

Assuming that the functions f and g take constant time, what is the worst case runtime of calling foo(ls,[],f,g) in terms of N, the length of ls. Give your answer in Big-O notation, along with an analysis proving you are correct.

Part 5

Consider the following code:

     let 
       val x = 5
       fun f(y) = 
       let 
         fun f(x) = x
       in 
         f(f) 
       end
     in
       f 5
     end

Using the environment model, draw the state of the environment when the number of frames is maximal. That is, in the environment model we typically create frames for some things, and get rid of them for something else. Show the state of the environment at the point right before any frames are removed. Indicate where the final return of f 5 comes from in your diagram. This part may be written by hand and turned in in class.