{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Lecture 2: Linear algebra done efficiently\n", "\n", "## CS4787 — Principles of Large-Scale Machine Learning Systems\n", "\n", "$\\newcommand{\\R}{\\mathbb{R}}$" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import numpy\n", "import scipy\n", "import matplotlib\n", "import time\n", "\n", "def hide_code_in_slideshow(): \n", " from IPython import display\n", " import binascii\n", " import os\n", " uid = binascii.hexlify(os.urandom(8)).decode() \n", " html = \"\"\"
\n", " \"\"\" % (uid, uid)\n", " display.display_html(html, raw=True)" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "center", "slideshow": { "slide_type": "slide" } }, "source": [ "Recall our first principle from last lecture...\n", "\n", "**Principle #1: Write your learning task as an optimization problem, and solve it via fast algorithms\n", "that update the model iteratively with easy-to-compute steps using numerical linear algebra.**" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "center", "slideshow": { "slide_type": "slide" } }, "source": [ "Recall our first principle from last lecture...\n", "\n", "**Principle #1: Write your learning task as an optimization problem, and solve it via fast algorithms\n", "that update the model iteratively with easy-to-compute steps using numerical linear algebra.**" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "fragment" } }, "source": [ "A simple example: we can represent the properties of an object using a **feature vector** (or embedding) in $\\R^d$. Say we wanted to predict something about a group of people including this guy (Svante Myrick)\n", " \n", "using the fact that he is 33, graduated in 2009, started his current job in 2012, and makes $58,561 a year." ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "fragment" } }, "source": [ "One way to represent this is as a vector in$4$dimensional space.\n", "\n", "$$x = \\begin{bmatrix}33 \\\\ 2009 \\\\ 2012 \\\\ 58561\\end{bmatrix}.$$\n", "\n", "Representing the information as a vector makes it easier for us to express ML models with it. We can then represent other objects we want to make predictions about with their own vectors, e.g.\n", "\n", "$$x = \\begin{bmatrix}78 \\\\ 1965 \\\\ 2021 \\\\ 400000\\end{bmatrix}.$$" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "center", "slideshow": { "slide_type": "slide" } }, "source": [ "### Linear Algebra: A Review\n", "\n", "Before we start in on how to compute with vectors, matrices, et cetera, we should make sure we're all on the same page about what these objects are.\n", "\n", "A vector (represented on a computer) is an array of numbers (usually floating point numbers). We say that the **dimension** (or length) of the vector is the size of the array, i.e. the number of numbers it contains." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "A vector (in mathematics) is an element of a **vector space**. Recall: a vector space over the real numbers is a set$V$together with two binary operations$+$(mapping$V \\times V$to$V$) and$\\cdot$(mapping$\\R \\times V$to$V$) satisfying the following axioms for any$x, y, z \\in V$and$a, b \\in \\R$\n", "\n", "*$x + y \\in V$and$a \\cdot x = ax \\in V$_(closure)_\n", "*$(x + y) + z = x + (y + z)$_(associativity of addition)_\n", "*$x + y = y + x$_(transitivity of addition)_\n", "* there exists a$(-x)$such that$x + (-x) = 0$_(negation)_\n", "*$0 \\in V$such that$0 + x = x + 0 = x$_(zero element)_\n", "*$a(bx) = b(ax) = (ab)x$_(associativity of scalar multiplication)_\n", "*$1v = v$_(multiplication by one)_\n", "*$a(x+y) = ax + ay$and$(a+b)x = ax + bx$_(distributivity)_" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can treat our CS-style array of numbers as modeling a mathematical vector by letting$+$add the two vectors elementwise and$\\cdot$multiply each element of the vector by the same scalar." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Again from the maths perspective, we say that a set of vectors$x_1, x_2, \\ldots, x_d$is **linearly independent** when no vector can be written as a linear combination of the others. That is,\n", "\n", "$$\\alpha_1 x_1 + \\alpha_2 x_2 + \\cdots + \\alpha_d x_d = 0 \\;\\leftrightarrow \\alpha_1 = \\alpha_2 = \\cdots = \\alpha_d = 0.$$\n", "\n", "We say the **span** of some vectors$x_1, x_2, \\ldots, x_d$is the set of vectors that can be written as a linear combination of those vectors\n", "\n", "$$\\operatorname{span}(x_1,x_2,\\ldots,x_d) = \\{\\alpha_1 x_1 + \\alpha_2 x_2 + \\cdots + \\alpha_d x_d \\mid \\alpha_i \\in \\R \\}.$$\n", "\n", "Finally, a set of vectors is a **basis** for the vector space$V$if it is linearly independent and if its span is the whole space$V$.\n", "\n", "* Equivalently, a set of vectors is a basis if any vector$v \\in V$can be written uniquely as a linear combination of vectors in the basis.\n", "\n", "We say the **dimension** of the space is$d$if it has a basis of size$d$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## What does this have to do with our computer-science definition of a vector?\n", "\n", "If any vector$v$in the space can be written uniquely as\n", "\n", "$$v = \\alpha_1 x_1 + \\alpha_2 x_2 + \\cdots + \\alpha_d x_d$$\n", "\n", "for some real numbers$\\alpha_1, \\alpha_2, \\ldots$, then to represent$v$on a computer, it suffices to store$\\alpha_1$,$\\alpha_2$,$\\ldots$, and$\\alpha_d$. We may as well store them in an array...and this gets us back to our CS-style notion of what a vector is.\n", "\n", "* Importantly, this only works for finite-dimensional vector spaces!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Typically, when we work with a$d$-dimensonal vector space, we call it$\\R^d$, and we use the **standard basis**, which I denote$e_1, \\ldots, e_d$. E.g. in 3 dimensions this is defined as\n", "\n", "$$e_1 = \\begin{bmatrix} 1 \\\\ 0 \\\\ 0 \\end{bmatrix}, \\;\n", "e_2 = \\begin{bmatrix} 0 \\\\ 1 \\\\ 0 \\end{bmatrix}, \\;\n", "e_3 = \\begin{bmatrix} 0 \\\\ 0 \\\\ 1 \\end{bmatrix},$$\n", "\n", "and more generally$e_i$has a$1$in the$i$th entry of the vector and$0$otherwise. In this case, if$x_i$denotes the$i$th entry of a vector$x \\in \\R^d$, then\n", "\n", "$$x = x_1 e_1 + x_2 e_2 + \\cdots + x_d e_d = \\sum_{i=1}^d x_i e_i.$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### In Python\n", "\n", "In Python, we use the library **numpy** to compute using vectors." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "u = [1. 2. 3.]\n", "v = [4. 5. 6.]\n", "u + v = [5. 7. 9.]\n", "2 * u = [2. 4. 6.]\n" ] } ], "source": [ "import numpy\n", "\n", "u = numpy.array([1.0,2.0,3.0])\n", "v = numpy.array([4.0,5.0,6.0])\n", "\n", "print('u = {}'.format(u))\n", "print('v = {}'.format(v))\n", "print('u + v = {}'.format(u + v))\n", "print('2 * u = {}'.format(2 * u))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "We can see that the standard vector operations are both supported easily!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ " Question: What have you seen represented as a vector in your previous experience with machine learning? " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Answers:\n", "\n", "* words/test\n", "* images\n", "* gene sequences\n", "* speech & video\n", "* feature vectors\n", "* graphs\n", "* houses\n", "* stock prices!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Linear Maps\n", "\n", "We say a function$F$from a vector space$U$to a vector space$V$is a **linear map** if for any$x, y \\in U$and any$a \\in \\R$,\n", "\n", "$$F(ax + y) = a F(x) + F(y).$$\n", "\n", "* Notice that if we know$F(e_i)$for all the basis elements$e_i$of$U$, then this uniquely determines$F$(why?). \n", "* So, if we want to represent$F$on a computer and$U$and$V$are finite-dimensional vector spaces of dimensions$m$and$n$respectively, it suffices to store$F(e_1), F(e_2), \\ldots, F(e_m)$.\n", "* Each$F(e_i)$is itself an element of$V$, which we can represent on a computer as an array of$n$numbers (since$V$is$n$-dimensional).\n", "* So, we can represent$F$as an array of$m$arrays of$n$numbers...or equivalently as a **two-dimensional array**.\n", " * Sadly, this overloads the meaning of the term \"dimension\"...but usually the meaning is clear from context." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Matrices\n", "\n", "We call this two-dimensional-array representation of a linear map a **matrix**. Here is an example of a matrix in$\\R^{3 \\times 3}$\n", "\n", "$$A = \\begin{bmatrix}1 & 2 & 3 \\\\ 4 & 5 & 6 \\\\ 7 & 8 & 9 \\end{bmatrix}.$$\n", "\n", "We use multiplication to denote the effect of a matrix operating on a vector (this is equivalent to applying a multilinear map as a function). E.g. if$F$is the multilinear map corresponding to matrix$A$(really they are the same object, but I'm using different letters here to keep the notation clear), then\n", "\n", "$$y = F(x) \\;\\equiv\\; y = Ax.$$\n", "\n", "We can add two matrices, and scale a matrix by a scalar.\n", "\n", "* Note that this means that the set of matrices$\\R^{n \\times m}$**is itself a vector space**." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Matrix Multiply\n", "\n", "If$A \\in \\R^{n \\times m}$is the matrix that corresponds to the linear map$F$, and$A_{ij}$denotes the$(i,j)$th entry of the matrix, then by our construction\n", "\n", "$$F(e_j) = \\sum_{i=1}^n A_{ij} e_i$$\n", "\n", "and so for any$x \\in \\R^m$\n", "\n", "$$F(x) = F\\left( \\sum_{j=1}^m x_j e_j \\right) = \\sum_{j=1}^m x_j F( e_j ) \n", "= \\sum_{j=1}^m x_j \\sum_{i=1}^n A_{ij} e_i = \\sum_{i=1}^n \\left( \\sum_{j=1}^m A_{ij} x_j \\right) e_i.$$\n", "\n", "So, this means that the$i$th entry of$F(x)$will be\n", "\n", "$$(F(x))_i = \\sum_{j=1}^m A_{ij} x_j.$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Matrices in Python\n", "\n", "A direct implementation of our matrix multiply formula:\n", "$$(F(x))_i = \\sum_{j=1}^m A_{ij} x_j.$$" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x = [1. 2. 3.]\n", "A = [[1. 2. 3.]\n", " [4. 5. 6.]]\n", "Ax = [14. 32.]\n" ] } ], "source": [ "x = numpy.array([1.0,2.0,3.0])\n", "A = numpy.array([[1.0,2,3],[4,5,6]])\n", "\n", "def matrix_multiply(A, x):\n", " (n,m) = A.shape\n", " assert(m == x.size)\n", " y = numpy.zeros(n)\n", " for i in range(n):\n", " for j in range(m):\n", " y[i] += A[i,j] * x[j]\n", " return y\n", "\n", "print('x = {}'.format(x))\n", "print('A = {}'.format(A))\n", "print('Ax = {}'.format(matrix_multiply(A,x)))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Ax = [14. 32.]\n" ] } ], "source": [ "# numpy has its own built-in support for matrix multiply\n", "print('Ax = {}'.format(A @ x)) # numpy uses @ to mean matrix multiply" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Using numpy buys us performance!\n", "\n", "Comparing numpy matrix multiplies with my naive for-loop matrix multiply, one is much faster than the other." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "my matrix multiply: 3.1413068771362305 seconds\n", "numpy matmul: 0.0011799335479736328 seconds\n", "numpy was 2662x faster\n" ] } ], "source": [ "# generate some random data\n", "x = numpy.random.randn(1024)\n", "A = numpy.random.randn(1024,1024)\n", "\n", "t = time.time()\n", "for trial in range(5):\n", " B = matrix_multiply(A,x)\n", "my_time = time.time() - t\n", "print('my matrix multiply: {} seconds'.format(my_time))\n", "\n", "t = time.time()\n", "for trial in range(5):\n", " B = A @ x\n", "np_time = time.time() - t\n", "print('numpy matmul: {} seconds'.format(np_time))\n", "\n", "print('numpy was {:.0f}x faster'.format(my_time/np_time))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ " Question: What have you seen represented as a matrix in your previous experience with machine learning? " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Answers:\n", "\n", "* the weights in a layer of a neural network\n", "* parameters\n", "* tables\n", "* a whole dataset\n", "* image\n", "* geometric transformation (physics, computer graphics)\n", "* PCA\n", "* covariance matrices" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Multiplying Two Matrices\n", "\n", "\n", "\n", "We can also multiply two matrices, which corresponds to function composition of linear maps.\n", "\n", "* Of course, this only makes sense if the dimensions match!\n", "* For example, if$A \\in \\R^{n \\times m}$and$B \\in \\R^{q \\times p}$, then it only makes sense to write$AB$if$m = q$.\n", "* In this context, we often want to think of a vector$x \\in \\R^d$as a$d \\times 1$matrix.\n", "\n", "One special matrix is the **identity matrix**$I$, which has the property that$Ix = x$for any$x$." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "u = [1. 2. 3.]\n", "A = [[ 2.49235899 -0.79169829 -1.15330389 ... 1.0745381 -0.4371076\n", " 0.05276692]\n", " [-1.22523154 -0.63926429 1.39791807 ... 0.42955508 0.44796394\n", " 0.03866751]\n", " [-1.62111808 -0.91904186 0.71170734 ... 2.82345617 -0.17558477\n", " -0.08574408]\n", " ...\n", " [ 2.45025228 -0.66925759 0.88846616 ... 0.54732669 -0.39968432\n", " 1.22050626]\n", " [-0.61742792 0.05118947 0.65439684 ... -0.37513861 0.82803545\n", " -0.97951277]\n", " [ 0.20119055 -0.76758025 0.17296537 ... -1.52676426 0.30153432\n", " -0.69490939]]\n", "B = [[ 3. 8. 1.]\n", " [-7. 2. -1.]\n", " [ 0. 2. -2.]]\n", "I = [[1. 0. 0.]\n", " [0. 1. 0.]\n", " [0. 0. 1.]]\n", "A.shape = (1024, 1024)\n", "B.shape = (3, 3)\n", "Iu = [1. 2. 3.]\n" ] }, { "ename": "ValueError", "evalue": "matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 3 is different from 1024)", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'B.shape = {}'\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mB\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Iu = {}'\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mI\u001b[0m \u001b[0;34m@\u001b[0m \u001b[0mu\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# numpy uses @ to mean matrix multiply\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 11\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Au = {}'\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mA\u001b[0m \u001b[0;34m@\u001b[0m \u001b[0mu\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 12\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'AB = {}'\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mA\u001b[0m \u001b[0;34m@\u001b[0m \u001b[0mB\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 13\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'BA = {}'\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mB\u001b[0m \u001b[0;34m@\u001b[0m \u001b[0mA\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# should cause an error!\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mValueError\u001b[0m: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 3 is different from 1024)" ] } ], "source": [ "B = numpy.array([[3.0,8,1],[-7,2,-1],[0,2,-2]])\n", "I = numpy.eye(3) # identity matrix\n", "\n", "print('u = {}'.format(u))\n", "print('A = {}'.format(A))\n", "print('B = {}'.format(B))\n", "print('I = {}'.format(I))\n", "print('A.shape = {}'.format(A.shape))\n", "print('B.shape = {}'.format(B.shape))\n", "print('Iu = {}'.format(I @ u)) # numpy uses @ to mean matrix multiply\n", "print('Au = {}'.format(A @ u))\n", "print('AB = {}'.format(A @ B))\n", "print('BA = {}'.format(B @ A)) # should cause an error!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Transposition\n", "\n", "Transposition takes a$n \\times m$matrix and **swaps the rows and columns** to produce an$m \\times n$matrix. Formally,\n", "\n", "$$(A^T)_{ij} = A_{ji}.$$\n", "\n", "A matrix that is its own transpose (i.e.$A = A^T$) is called a **symmetric matrix**.\n", "\n", "We can also transpose a vector. Transposing a vector$x \\in \\R^d$gives a matrix in$\\R^{1 \\times d}$, also known as a **row vector**. This gives us a handy way of defining the **dot product** which maps a pair of vectors to a scalar.\n", "\n", "$$x^T y = y^T x = \\langle x, y \\rangle = \\sum_{i=1}^d x_i y_i$$\n", "\n", "* This is very useful in machine learning to express similarities, make predictions, compute norms, etc.\n", "* It also gives us a handy way of grabbing the$i$th element of a vector, since$x_i = e_i^T x$(and$A_{ij} = e_i^T A e_j$).\n", "\n", "* A very useful identity: in$\\R^d$,$\\sum_{i=1}^d e_i e_i^T = I$." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A = [[1 2 3]\n", " [4 5 6]]\n", "A.T = [[1 4]\n", " [2 5]\n", " [3 6]]\n", "u = [1. 2. 3.]\n", "u.T * u = 14.0\n" ] } ], "source": [ "A = numpy.array([[1,2,3],[4,5,6]])\n", "print('A = {}'.format(A))\n", "print('A.T = {}'.format(A.T))\n", "print('u = {}'.format(u))\n", "print('u.T * u = {}'.format(u.T @ u))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Elementwise Operations\n", "\n", "Often, we want to express some mathematics that goes beyond the addition and scalar multiplication operations in a vector space. Sometimes, to do this we use **elementwise operations** which operate on a vector/matrix (or pair of vectors/matrices) on a per-element basis. E.g. if\n", "\n", "$$x = \\begin{bmatrix}1 \\\\ 4 \\\\ 9 \\\\ 16\\end{bmatrix},$$\n", "\n", "then if$\\operatorname{sqrt}$operates elementwise,\n", "\n", "$$\\operatorname{sqrt}(x) = \\begin{bmatrix}1 \\\\ 2 \\\\ 3 \\\\ 4\\end{bmatrix}.$$\n", "\n", "We can also do this with matrices and with binary operations." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Elementwise Operations in Python\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x = [1. 4. 9.]\n", "y = [2 5 3]\n", "z = [2 3 7 8]\n", "sqrt(x) = [1. 2. 3.]\n", "x * y = [ 2. 20. 27.]\n", "x / y = [0.5 0.8 3. ]\n" ] }, { "ename": "ValueError", "evalue": "operands could not be broadcast together with shapes (3,) (4,) ", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'x * y = {}'\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# simple numerical operations are elementwise by default in numpy\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'x / y = {}'\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 11\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'x * z = {}'\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mz\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# should cause error\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mValueError\u001b[0m: operands could not be broadcast together with shapes (3,) (4,) " ] } ], "source": [ "x = numpy.array([1.0,4,9])\n", "y = numpy.array([2,5,3])\n", "z = numpy.array([2,3,7,8])\n", "\n", "print('x = {}'.format(x))\n", "print('y = {}'.format(y))\n", "print('z = {}'.format(z))\n", "print('sqrt(x) = {}'.format(numpy.sqrt(x)))\n", "print('x * y = {}'.format(x * y)) # simple numerical operations are elementwise by default in numpy\n", "print('x / y = {}'.format(x / y))\n", "print('x * z = {}'.format(x * z)) # should cause error" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## The Power of Broadcasting\n", "\n", "We just saw that we can't use elementwise operations on pairs of vectors/matrices if they are not the same size. **Broadcasting** allows us to be more expressive by automatically expanding a vector/matrix along an axis of dimension 1." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x = [2. 3.]\n", "A = [[1. 2.]\n", " [3. 4.]]\n", "A + x = [[3. 5.]\n", " [5. 7.]]\n", "A * x = [[ 2. 6.]\n", " [ 6. 12.]]\n" ] }, { "data": { "text/plain": [ "13.0" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = numpy.array([2.0,3])\n", "A = numpy.array([[1.,2],[3,4]])\n", "\n", "print('x = {}'.format(x))\n", "print('A = {}'.format(A))\n", "print('A + x = {}'.format(A + x)) # adds 2 to the first column of A and 3 to the second\n", "print('A * x = {}'.format(A * x)) # DO NOT MIX THIS UP WITH MATRIX MULTIPLY!\n", "\n", "numpy.dot(x,x)\n", "numpy.matmul(x,x)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Tensors\n", "\n", "We say that a matrix is stored as a 2-dimensional array.\n", "A tensor generalizes this to a matrix of whatever dimension you want.\n", "\n", "From a mathematical perspective, a tensor is a **multilinear map** in the same way that a matrix is a linear map. That is, it's equivalent to a function\n", "\n", "$$F(x_1, x_2, \\ldots, x_n) \\in \\R$$\n", "\n", "where$F$is linear in each of the inputs$x_i \\in \\R^{d_i}$taken individually (i.e. with all the other inputs fixed).\n", "\n", "$$F\\left(\\begin{bmatrix} x_1 \\\\ y_1 \\end{bmatrix}, \\begin{bmatrix} x_2 \\\\ y_2 \\end{bmatrix}, \\begin{bmatrix} x_3 \\\\ y_3 \\end{bmatrix} \\right) = x_1 y_2 x_3.$$\n", "\n", "_We'll come back to this later when we discuss tensors in ML frameworks._" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# An Illustrative Example\n", "\n", "Suppose that we have$n$websites, and we have collected a matrix$A \\in \\R^{n \\times n}$, where$A_{ij}$counts the number of links from website$i$to website$j$.\n", "\n", "We want to produce a new matrix$B \\in \\R^{n \\times n}$such that$B_{ij}$measures the _fraction_ of links from website$i$that go to website$j$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "_How do we compute this?_\n", "\n", "$$B_{ij} = \\frac{A_{ij}}{\\sum_{k=1}^n A_{ik} 1}$$" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0.]]\n", "[[0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 0.]]\n" ] } ], "source": [ "# generate some random data to work with\n", "n = 6\n", "A = numpy.random.randint(0,6,(n,n))**2 + numpy.random.randint(0,5,(n,n))\n", "\n", "B_for = numpy.zeros((n,n))\n", "for i in range(n):\n", " for j in range(n):\n", " acc = 0\n", " for k in range(n):\n", " acc += A[i,k]\n", " B_for[i,j] = A[i,j] / acc\n", "\n", "B_for\n", "\n", "print(B_for - (A / numpy.sum(A, axis=1, keepdims=True)))\n", "\n", "sumAik = A @ numpy.ones((n,1))\n", "print(B_for - (A / sumAik))\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Gradients\n", "\n", "Many, if not most, machine learning training algorithms use gradients to optimize a function.\n", "\n", "_What is a gradient?_\n", "\n", "Suppose I have a function$f$from$\\R^d$to$\\R$. The gradient,$\\nabla f$, is a function from$\\R^d$to$\\R^d$such that\n", "\n", "$$\\left(\\nabla f(w) \\right)_i = \\frac{\\partial}{\\partial w_i} f(w) = \\lim_{\\delta \\rightarrow 0} \\frac{f(w + \\delta e_i) - f(w)}{\\delta},$$\n", "\n", "that is, it is the **vector of partial derivatives of the function**.\n", "Another, perhaps cleaner (and basis-independent), definition is that$f(w)^T$is the linear map such that for any$u \\in \\R^d$\n", "\n", "$$f(w)^T u = \\lim_{\\delta \\rightarrow 0} \\frac{f(w + \\delta u) - f(w)}{\\delta}.$$\n", "\n", "More informally, it is the unique vector such that$f(w) \\approx f(w_0) + (w - w_0)^T \\nabla f(w_0)$for$w$nearby$w_0$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Let's derive some gradients!\n", "\n", "$f(x) = x^T A x$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "$f(x) = \\| x \\|_2^2 = \\sum_{i=1}^d x_i^2$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "$f(x) = \\| x \\|_1 = \\sum_{i=1}^d | x_i |$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "$f(x) = \\| x \\|_{\\infty} = \\max(|x_1|, |x_2|, \\ldots, |x_d|)\$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Takeaway: numpy gives us powerful capabilities to express numerical linear algebra...\n", "\n", "**...and you should become skilled in mapping from mathematical expressions to numpy and back.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" }, "rise": { "scroll": true, "transition": "none" } }, "nbformat": 4, "nbformat_minor": 2 }