An intriguing feature of divide-and-conquer algorithms is that in principle they should run well on memory hierarchies with multiple levels of memory without the need for blocking or tiling. Each successive "divide" step in the divide-and-conquer process generates subproblems that touch successively smaller portions of data. For any level of the memory hierarchy, there is a level of division below which all the data touched by the subproblem will fit into that level of the memory hierarchy. Therefore, a divide-and-conquer algorithm can be viewed as an algorithm that is blocked for all levels of the memory hierarchy.
Let us look at a few divide-and-conquer algorithms to understand this. Matrix multiplication is usually expressed iteratively as follows:
for i = 1, N for j = 1, N for k = 1, N C(i,j) = C(i,j) + A(i,k)*B(k,j)It is possible to write the same code in a divide-and-conquer style by partitioning all the arrays into 4 blocks and expressing the computation to be performed as 8 smaller matrix multiplications:
|C11 C12| = |A11 A12|* |B11 B12| |C21 C22| |A21 A22| |B21 B22| so C11 = A11*B11 + A12*B21 C12 = A11*B12 + A12*B22 C21 = A21*B11 + A22*B21 C22 = A21*B12 + A22*B22Each of the 8 smaller matrix multiplications can themselves be performed in a divide-and-conquer style and so on till you get 1x1 matrices which can be multiplied directly since that corresponds to scalar multiplication. For obvious reasons, this algorithm is also called a block-recursive algorithm for matrix multiplication.
Similarly, triangular solve can be written in a block-recursive manner in terms of two smaller triangular solves and a matrix-vector product as follows:
|L11 0 | |x1|= |b1| |L21 L22| |x2| |b2| so L11*x1 = b1 L22*x2 = b2 - L21*x1Similar block-recursive codes can be written for other matrix computations such as Cholesky factorization and LU factorization with pivoting which are usually expressed iteratively.
Of course, divide-and-conquer algorithms are known for non-numeric problems as well. For example, sorting on a machine with a memory hierarchy can be performed efficiently using a divide-and-conquer algorithm like merge-sort.
In this project, you will extend Homework #3 solution to implement a block-recursive transformation in place of a tiling transformation. This involves replacing the tile_transform function with a new block_recursive_transform. Demonstrate the performance of your resulting code for programs like matrix-multiply and Cholesky.
Here are some issues that you will have to address in this project.