BLAS and basic operations
2026-01-21
Focus on:
Add concern with:
… all of which are basis-dependent!
Build on Basic Linear Algebra Subroutines (BLAS):
Arithmetic costs are \(O(n^1)\), \(O(n^2)\), and \(O(n^3)\), respectively.
Worry about two costs:
Data movement time varies a lot, depending on
Memory (linearly addressed) contains 1D floating point array:
Interpretation:
\[ x = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} \mbox{ or } A = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \mbox{ or ...} \]
\[
A = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}
\] (\(a_{ij}\) at data[i+j*m])
Fortran, MATLAB, Julia, R, NumPy / SciPy, Eigen and Arbmadillo (C++)
\[ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \]
(\(a_{ij}\) at data[j+i*n])
C and C++ (built in), Pascal
Why do you need to know this?
Can be row-oriented (ij) or column-oriented (ji)
\[ \begin{bmatrix} \color{purple}{y_{1}} \\ \color{lightgray}{y_{2}} \\ \color{lightgray}{y_{3}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{blue}{a_{12}} & \color{blue}{a_{13}} \\ \color{lightgray}{a_{21}} & \color{lightgray}{a_{22}} & \color{lightgray}{a_{23}} \\ \color{lightgray}{a_{31}} & \color{lightgray}{a_{32}} & \color{lightgray}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{x_{1}} \\ \color{red}{x_{2}} \\ \color{red}{x_{3}} \end{bmatrix} \] \[ \begin{bmatrix} \color{purple}{y_{1}} \\ \color{purple}{y_{2}} \\ \color{purple}{y_{3}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{lightgray}{a_{12}} & \color{lightgray}{a_{13}} \\ \color{blue}{a_{21}} & \color{lightgray}{a_{22}} & \color{lightgray}{a_{23}} \\ \color{blue}{a_{31}} & \color{lightgray}{a_{32}} & \color{lightgray}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{x_{1}} \\ \color{lightgray}{x_{2}} \\ \color{lightgray}{x_{3}} \end{bmatrix} \]
matvec1_row takes about 118 msmatvec1_col takes about 14 msA*x takes about 3 msCache locality takes advantage of locality of reference
Numbers from one core on my laptop:
A takes about 120 MB
matvec1_col accesses A with unit stride
\[c_{ij} = \sum_k a_{ik} b_{kj}\]
ij(k) or ji(k): Each entry of \(C\) is a dot product of a row of \(A\) and a column of \(B\).
\[ \begin{bmatrix} \color{purple}{c_{11}} & \color{lightgray}{c_{12}} & \color{lightgray}{c_{13}} \\ \color{lightgray}{c_{21}} & \color{lightgray}{c_{22}} & \color{lightgray}{c_{23}} \\ \color{lightgray}{c_{31}} & \color{lightgray}{c_{32}} & \color{lightgray}{c_{33}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{blue}{a_{12}} & \color{blue}{a_{13}} \\ \color{lightgray}{a_{21}} & \color{lightgray}{a_{22}} & \color{lightgray}{a_{23}} \\ \color{lightgray}{a_{31}} & \color{lightgray}{a_{32}} & \color{lightgray}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{b_{11}} & \color{lightgray}{b_{12}} & \color{lightgray}{b_{13}} \\ \color{red}{b_{21}} & \color{lightgray}{b_{22}} & \color{lightgray}{b_{23}} \\ \color{red}{b_{31}} & \color{lightgray}{b_{32}} & \color{lightgray}{b_{33}} \end{bmatrix} \]
k(ij) or k(ji): \(C\) is a sum of outer products of column \(k\) of \(A\) and row \(k\) of \(B\).
\[ \begin{bmatrix} \color{purple}{c_{11}} & \color{purple}{c_{12}} & \color{purple}{c_{13}} \\ \color{purple}{c_{21}} & \color{purple}{c_{22}} & \color{purple}{c_{23}} \\ \color{purple}{c_{31}} & \color{purple}{c_{32}} & \color{purple}{c_{33}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{lightgray}{a_{12}} & \color{lightgray}{a_{13}} \\ \color{blue}{a_{21}} & \color{lightgray}{a_{22}} & \color{lightgray}{a_{23}} \\ \color{blue}{a_{31}} & \color{lightgray}{a_{32}} & \color{lightgray}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{b_{11}} & \color{red}{b_{12}} & \color{red}{b_{13}} \\ \color{lightgray}{b_{21}} & \color{lightgray}{b_{22}} & \color{lightgray}{b_{23}} \\ \color{lightgray}{b_{31}} & \color{lightgray}{b_{32}} & \color{lightgray}{b_{33}} \end{bmatrix} \]
i(jk) or i(kj): Each row of \(C\) is a row of \(A\) multiplied by \(B\)
\[ \begin{bmatrix} \color{purple}{c_{11}} & \color{purple}{c_{12}} & \color{purple}{c_{13}} \\ \color{lightgray}{c_{21}} & \color{lightgray}{c_{22}} & \color{lightgray}{c_{23}} \\ \color{lightgray}{c_{31}} & \color{lightgray}{c_{32}} & \color{lightgray}{c_{33}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{blue}{a_{12}} & \color{blue}{a_{13}} \\ \color{lightgray}{a_{21}} & \color{lightgray}{a_{22}} & \color{lightgray}{a_{23}} \\ \color{lightgray}{a_{31}} & \color{lightgray}{a_{32}} & \color{lightgray}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{b_{11}} & \color{red}{b_{12}} & \color{red}{b_{13}} \\ \color{red}{b_{21}} & \color{red}{b_{22}} & \color{red}{b_{23}} \\ \color{red}{b_{31}} & \color{red}{b_{32}} & \color{red}{b_{33}} \end{bmatrix} \]
j(ik) or j(ki): Each column of \(C\) is \(A\) times a column of \(B\)
\[ \begin{bmatrix} \color{purple}{c_{11}} & \color{lightgray}{c_{12}} & \color{lightgray}{c_{13}} \\ \color{purple}{c_{21}} & \color{lightgray}{c_{22}} & \color{lightgray}{c_{23}} \\ \color{purple}{c_{31}} & \color{lightgray}{c_{32}} & \color{lightgray}{c_{33}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{blue}{a_{12}} & \color{blue}{a_{13}} \\ \color{blue}{a_{21}} & \color{blue}{a_{22}} & \color{blue}{a_{23}} \\ \color{blue}{a_{31}} & \color{blue}{a_{32}} & \color{blue}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{b_{11}} & \color{lightgray}{b_{12}} & \color{lightgray}{b_{13}} \\ \color{red}{b_{21}} & \color{lightgray}{b_{22}} & \color{lightgray}{b_{23}} \\ \color{red}{b_{31}} & \color{lightgray}{b_{32}} & \color{lightgray}{b_{33}} \end{bmatrix} \]
\[ \begin{bmatrix} \color{purple}{C_{11}} & \color{blue}{C_{12}} \\ \color{red}{C_{21}} & \color{gray}{C_{22}} \end{bmatrix} = \begin{bmatrix} \color{purple}{c_{11}} & \color{purple}{c_{12}} & \color{blue}{c_{13}} & \color{blue}{c_{14}} \\ \color{purple}{c_{21}} & \color{purple}{c_{22}} & \color{blue}{c_{23}} & \color{blue}{c_{24}} \\ \color{red}{c_{31}} & \color{red}{c_{32}} & \color{gray}{c_{33}} & \color{gray}{c_{34}} \\ \color{red}{c_{41}} & \color{red}{c_{42}} & \color{gray}{c_{43}} & \color{gray}{c_{44}} \end{bmatrix} \]
Q: Why is this a useful idea?
For \(\mathcal{A} \in L(\mathcal{V}_1 \oplus \mathcal{V}_2, \mathcal{W}_1 \oplus \mathcal{W}_2)\): \[ w = \mathcal{A} v \equiv \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} \mathcal{A}_{11} & \mathcal{A}_{12} \\ \mathcal{A}_{21} & \mathcal{A}_{22} \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} \] where \[ v_i = \Pi_{\mathcal{V}_i} v, \quad w_j = \Pi_{\mathcal{W}_j} w, \quad \mathcal{A}_{ij} = \Pi_{\mathcal{W}_i} \mathcal{A} |_{\mathcal{V}_j} \] Given bases \(V_1\), \(V_2\), and \(V = \begin{bmatrix} V_1 & V_2 \end{bmatrix}\) for \(\mathcal{V}_1\), \(\mathcal{V}_2\) and \(\mathcal{V}= \mathcal{V}_1 \oplus \mathcal{V}_2\) (and similarly for \(\mathcal{W}\)), get matrix representation with same block structure as above.
How LAPACK does it: