Consider the following C code:
for (i = 0; i < n; i++)
for (j = 0; j < n; j++)
f (i, j);
How might we write an equivalent function in SML? Perhaps as follows:
fun applyf (f, n) = let
fun lp_i i = if i < n
then let
fun lp_j j = if j < n
then (f (i, j); lp_j (j + 1))
else ()
in (lp_j 0; lp_i (i + 1)) end
else ()
in lp_i 0 end
While the two loop functions (lp_i and lp_j) are tail-recursive, the call lp_j 0 from the outer loop to the inner loop is not tail-recursive. This non-tail call forces lp_i and lp_j to be mapped into different clusters, which occupy separate procedures with separate stack frames.
What happens when we CPS convert the function? We get the following:
fun applyf (f, n, kt) = let
fun lp_i (i, ki) = if i < n
then let
fun lp_j (j, kj) = if j < n
then let
fun kf' () = lp_j (j + 1, kj)
in f (i, j, kf') end
else kj ()
fun kj' () = lp_i (i + 1, ki)
in lp_j (0, kj') end
else ki ()
in lp_i (0, kt) end
A simple control-flow analysis shows that the return continuation of lp_j is always the known function kj', which enables compiling the nested loops into a single machine procedure.