Prof David Bindel
Two flavors: dense and sparse
Common structures, no complicated indexing
Stuff not stored in dense form!
Level 1 BLAS: 15 ops (mostly) on vectors
Level 2 BLAS: 25 ops (mostly) on matrix/vector pairs
Level 3 BLAS: 9 ops (mostly) on matrix/matrix
Efficient cache utilization!
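To make the vector, matrix/vector, and matrix/matrix distinction concrete, here is a minimal pure-Python sketch of one representative operation of each kind, plus a rough count of flops per data item touched. The function names borrow the BLAS routine names axpy, gemv, and gemm, but these are simplified illustrations (no scaling factors, no strides), not calls into a BLAS library, and `flops_per_word` is a hypothetical helper using a deliberately crude data-movement model.

```python
def axpy(alpha, x, y):
    # Vector operation: returns alpha*x + y.
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def gemv(A, x):
    # Matrix/vector operation: returns A*x for a row-major list-of-lists A.
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def gemm(A, B):
    # Matrix/matrix operation: returns A*B.
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def flops_per_word(n):
    # Crude model (an assumption): every operand is read once and every
    # result is written once, for n-vectors and n-by-n matrices.
    return {
        "axpy": (2 * n) / (3 * n),              # read x, read y, write y
        "gemv": (2 * n * n) / (n * n + 2 * n),  # read A, read x, write y
        "gemm": (2 * n ** 3) / (3 * n * n),     # read A, read B, write C
    }
```

Under this model the vector and matrix/vector operations do a bounded amount of arithmetic per data item, while the matrix/matrix product does work proportional to n per item. That reuse is what lets a blocked implementation keep data in cache.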
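LU for 2×2:
\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
=
\begin{bmatrix} 1 & 0 \\ c/a & 1 \end{bmatrix}
\begin{bmatrix} a & b \\ 0 & d - bc/a \end{bmatrix}
\]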
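As a quick check of the 2×2 formula, here is a worked instance with the illustrative values a = 4, b = 3, c = 6, d = 3 (chosen only for this example):

```latex
\[
\begin{bmatrix} 4 & 3 \\ 6 & 3 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 \\ 6/4 & 1 \end{bmatrix}
\begin{bmatrix} 4 & 3 \\ 0 & 3 - (3)(6)/4 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 \\ 1.5 & 1 \end{bmatrix}
\begin{bmatrix} 4 & 3 \\ 0 & -1.5 \end{bmatrix}
\]
```

Multiplying back, the second row is 1.5·4 = 6 and 1.5·3 + (−1.5) = 3, as required. The formula assumes a ≠ 0; otherwise the rows must be swapped first (pivoting).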
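Block elimination:
\[
\begin{bmatrix} A & B \\ C & D \end{bmatrix}
=
\begin{bmatrix} I & 0 \\ C A^{-1} & I \end{bmatrix}
\begin{bmatrix} A & B \\ 0 & D - C A^{-1} B \end{bmatrix}
\]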
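Block LU:
\[
\begin{bmatrix} A & B \\ C & D \end{bmatrix}
=
\begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix}
\begin{bmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{bmatrix}
=
\begin{bmatrix} L_{11} U_{11} & L_{11} U_{12} \\ L_{21} U_{11} & L_{21} U_{12} + L_{22} U_{22} \end{bmatrix}
\]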
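Matching the four blocks of the block LU identity gives one equation per block, and solving them in order yields the steps of the blocked algorithm. A short derivation, writing L_{21} for the lower-left block of L and assuming A is nonsingular:

```latex
\[
\begin{aligned}
A &= L_{11} U_{11} && \text{(factor the leading block)} \\
B &= L_{11} U_{12} &&\Rightarrow\; U_{12} = L_{11}^{-1} B \\
C &= L_{21} U_{11} &&\Rightarrow\; L_{21} = C U_{11}^{-1} \\
D &= L_{21} U_{12} + L_{22} U_{22} &&\Rightarrow\; L_{22} U_{22} = D - L_{21} U_{12} = D - C A^{-1} B
\end{aligned}
\]
```

The last line uses L_{21} U_{12} = C U_{11}^{-1} L_{11}^{-1} B = C A^{-1} B, so the matrix left to factor is exactly the Schur complement that appears in block elimination.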
Think of A as k×k, k moderate:
[L11,U11] = small_lu(A);   % Small block LU of the k-by-k block A
U12 = L11\B;               % Triangular solve
L21 = C/U11;               % Triangular solve
S = D-L21*U12;             % Rank k update (Schur complement)
[L22,U22] = lu(S);         % Finish factoring
Three level-3 BLAS calls: two triangular solves and one matrix multiply!
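The five steps above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not a production routine: it does no pivoting (so every leading block must be nonsingular), it applies the same step recursively to finish factoring S, and the helpers `matmul`, `solve_lower_left`, `solve_upper_right`, and `block_lu` are hypothetical names standing in for what would be BLAS and LAPACK calls in real code. The name `small_lu` follows the pseudocode, but its body here is an assumption.

```python
def small_lu(A):
    # Unpivoted LU of a small square block: A = L*U with unit lower-triangular L.
    n = len(A)
    U = [row[:] for row in A]
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    for j in range(n):
        for i in range(j + 1, n):
            L[i][j] = U[i][j] / U[j][j]  # assumes a nonzero pivot
            U[i][j] = 0.0
            for c in range(j + 1, n):
                U[i][c] -= L[i][j] * U[j][c]
    return L, U


def matmul(A, B):
    # Dense product A*B (stands in for a level-3 matrix multiply).
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def solve_lower_left(L, B):
    # Solve L*X = B for X with L lower triangular (the L11\B step).
    X = [row[:] for row in B]
    for i in range(len(L)):
        for j in range(len(B[0])):
            s = X[i][j] - sum(L[i][p] * X[p][j] for p in range(i))
            X[i][j] = s / L[i][i]
    return X


def solve_upper_right(C, U):
    # Solve X*U = C for X with U upper triangular (the C/U11 step).
    X = [row[:] for row in C]
    for r in range(len(C)):
        for j in range(len(U)):
            s = X[r][j] - sum(X[r][p] * U[p][j] for p in range(j))
            X[r][j] = s / U[j][j]
    return X


def block_lu(M, k):
    # Blocked, unpivoted LU of square M using k-by-k leading blocks.
    n = len(M)
    if n <= k:
        return small_lu(M)
    A = [row[:k] for row in M[:k]]
    B = [row[k:] for row in M[:k]]
    C = [row[:k] for row in M[k:]]
    D = [row[k:] for row in M[k:]]
    L11, U11 = small_lu(A)            # small block LU
    U12 = solve_lower_left(L11, B)    # triangular solve
    L21 = solve_upper_right(C, U11)   # triangular solve
    P = matmul(L21, U12)
    S = [[d - p for d, p in zip(dr, pr)] for dr, pr in zip(D, P)]  # rank k update
    L22, U22 = block_lu(S, k)         # finish factoring
    m = n - k
    L = [r + [0.0] * m for r in L11] + [a + b for a, b in zip(L21, L22)]
    U = [a + b for a, b in zip(U11, U12)] + [[0.0] * k + r for r in U22]
    return L, U
```

For example, `block_lu(M, 2)` on a 4×4 strictly diagonally dominant matrix M returns factors whose product reproduces M up to rounding. For large matrices almost all of the arithmetic lands in the two triangular solves and the matrix multiply, which is why the blocked formulation maps onto level-3 BLAS.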
PLASMA: Parallel Linear Algebra Software for Multicore Architectures
MAGMA: Matrix Algebra on GPU and Multicore Architectures
SLATE???
Much is housed at UTK ICL