http://www.cs.cornell.edu/~asampson//Adrian Sampson2017-02-23T19:03:43+00:00Adrian Sampsonhttp://www.cs.cornell.edu/~asampson/http://www.cs.cornell.edu/~asampson/blog/statsmistakes.htmlStatistical Mistakes and How to Avoid Them2016-11-23T00:00:00+00:00<p>Computer scientists in systemsy fields, myself included, aren’t great at using statistics. Maybe it’s because there are so many other potential problems with empirical evaluations that solid statistical reasoning doesn’t seem that important. Other subfields, like HCI and machine learning, have much higher standards for data analysis. Let’s learn from their example.</p>
<p>Here are three kinds of avoidable statistics mistakes that I notice in published papers.</p>
<h3 id="no-statistics-at-all">No Statistics at All</h3>
<p>The most common blunder is not using statistics at all when your paper clearly uses statistical data. If your paper uses the phrase “we report the average time over 20 runs of the algorithm,” for example, you should probably use statistics.</p>
<p>Here are two easy things that every paper should do when it deals with performance data or anything else that can randomly vary:</p>
<p>First, plot the error bars. In every figure that represents an average, compute the <a href="https://www.r-bloggers.com/standard-deviation-vs-standard-error/">standard error of the mean</a> or just the plain old standard deviation and add little whiskers to each bar. Explain what the error bars mean in the caption.</p>
<p><img src="/~asampson/media/errorbars.svg" alt="(a) Just noise. (b) Meaningful results. (c) Who knows???" class="img-responsive" style="width: 100%;" /></p>
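<p>Here’s a minimal sketch of the numbers behind an error bar, using only Python’s standard library. The running times are made up for illustration, and the helper name <code class="highlighter-rouge">mean_and_error_bars</code> is hypothetical.</p>

```python
import math
import statistics

def mean_and_error_bars(samples):
    """Return (mean, standard deviation, standard error of the mean)."""
    n = len(samples)
    mean = statistics.mean(samples)
    # Sample standard deviation (divides by n - 1).
    stdev = statistics.stdev(samples)
    # The standard error shrinks as you collect more runs.
    sem = stdev / math.sqrt(n)
    return mean, stdev, sem

# Hypothetical running times in seconds, made up for illustration.
runs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, stdev, sem = mean_and_error_bars(runs)
print(f"mean = {mean:.2f} s, stdev = {stdev:.2f} s, SEM = {sem:.2f} s")
```

<p>Whichever of the two you draw as whiskers, say so in the caption: the standard deviation describes the spread of individual runs, while the standard error describes the uncertainty in the average.</p>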
<p>Second, do a simple statistical test. If you ever say “our system’s average running time is X seconds, which is less than the baseline running time of Y seconds,” you need to show that the difference is <a href="https://en.wikipedia.org/wiki/Statistical_significance">statistically significant</a>. Statistical significance tells the reader that the difference you found was more than just “in the noise.”</p>
<p>For most CS papers I read, a really basic test will work: <a href="http://stattrek.com/hypothesis-test/difference-in-means.aspx?Tutorial=AP">Student’s $t$-test</a> checks that two averages that look different actually are different. The process is easy. Collect some $N$ samples from the two conditions, compute the mean $\overline{X}$ and the standard deviation $s$ for each, and plug them into this formula:</p>
<p>\[
t =
\frac{ \overline{X}_1 - \overline{X}_2 }
{ \sqrt{ \frac{s_1^2}{N_1} +
\frac{s_2^2}{N_2} } }
\]</p>
<p>then plug that $t$ into <a href="https://en.m.wikipedia.org/wiki/Student%27s_t-distribution">the cumulative distribution function of the $t$-distribution</a> to get a $p$-value. If your $p$-value is below a threshold $\alpha$ that you chose ahead of time (0.05 or 0.01, say), then you have a statistically significant difference. Your favorite numerical library probably already has <a href="http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html">an implementation</a> that does all the work for you.</p>
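<p>As a rough sketch of the whole process, here is the formula above in standard-library Python. The formula matches the unequal-variance (Welch) form of the test, so the sketch also uses the Welch–Satterthwaite approximation for the degrees of freedom; that choice, the numerical integration of the $t$-distribution, and the sample data are all assumptions made for illustration. In practice, SciPy’s <code class="highlighter-rouge">scipy.stats.ttest_ind(a, b, equal_var=False)</code> should give the same answer with less code.</p>

```python
import math
import statistics

def welch_t(a, b):
    """Return (t, degrees of freedom) for two independent samples."""
    va = statistics.variance(a) / len(a)   # s_1^2 / N_1
    vb = statistics.variance(b) / len(b)   # s_2^2 / N_2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the probability mass in [-|t|, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    # Simpson's rule on [0, |t|]; steps must be even.
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

# Made-up running times for illustration only.
ours = [1.0, 2.0, 3.0, 4.0, 5.0]
baseline = [2.0, 3.0, 4.0, 5.0, 6.0]
t, df = welch_t(ours, baseline)
p = two_sided_p(t, df)
alpha = 0.05
print(f"t = {t:.3f}, df = {df:.1f}, p = {p:.3f}, significant: {p < alpha}")
```

<p>With only five noisy samples per condition, a one-second gap in the averages is not enough to clear $\alpha = 0.05$ here, which is exactly the kind of thing a bar chart without error bars would hide.</p>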
<p>If you’ve taken even an intro stats course, you know all this already! But you might be surprised to learn how many computer scientists don’t. Program committees don’t require that papers use solid statistics, so the literature is full of statistics-free but otherwise-good papers, so standards remain low, and Prof. Ouroboros keeps drawing figures without error bars. Other fields are <a href="http://www.nature.com/news/psychology-journal-bans-p-values-1.17001">moving <em>beyond</em> the $p$-value</a>, and CS isn’t even there yet.</p>
<h3 id="failure-to-reject--confirmation">Failure to Reject = Confirmation</h3>
<p>When you do use a statistical test in a paper, you need to interpret its results correctly. When your test produces a $p$-value, here are the correct interpretations:</p>
<ul>
<li>If $p < \alpha$: The difference between our average running time and the baseline’s average running time is statistically significant. Pedantically, we <em>reject the null hypothesis</em> that says that the averages might be the same.</li>
<li>Otherwise, if $p \ge \alpha$: We conclude nothing at all. Pedantically, we <em>fail to reject</em> that null hypothesis.</li>
</ul>
<p>It’s tempting to think, when $p \ge \alpha$, that you’ve found the opposite thing from the $p < \alpha$ case: that you get to conclude that there is <em>no statistically significant difference</em> between the two averages. Don’t do that!</p>
<p>Simple statistical tests like the $t$-test only tell you when averages are different; they can’t tell you when they’re the same. When they fail to find a difference, there are two possible explanations: either there is no difference or you haven’t collected enough data yet. So when a test fails, it could be your fault: if you had run a slightly larger experiment with a slightly larger $N$, the test might have successfully found the difference. It’s always wrong to conclude that the difference does not exist.</p>
<p>If you want to claim that two means are <em>equal</em>, you’ll need to use a different test where the null hypothesis says that they differ by at least a certain amount. For example, an appropriate <a href="http://stattrek.com/hypothesis-test/difference-in-means.aspx?Tutorial=AP">one-tailed $t$-test</a> will do.</p>
<h3 id="the-multiple-comparisons-problem">The Multiple Comparisons Problem</h3>
<p>In most ordinary evaluation sections, it’s probably enough to use only a handful of statistical tests to draw one or two bottom-line conclusions. But you might find yourself automatically running an unbounded number of comparisons. Perhaps you have $n$ benchmarks, and you want to compare the running time <em>on each one</em> to a corresponding baseline with a separate statistical test. Or maybe your system works in a feedback loop: it tries one strategy, performs a statistical test to check whether the strategy worked, and starts over with a new strategy otherwise.</p>
<p>Repeated statistical tests can get you into trouble. The problem is that every statistical test has a probability of lying to you. The probability that any <em>single</em> test is wrong is small, but if you do lots of tests, the probability amplifies quickly.</p>
<p>For example, say you choose $\alpha = 0.05$ and run one $t$-test. When the test succeeds—when it finds a significant difference—it’s telling you that there’s at most an $\alpha$ chance that the difference arose from random chance. In 95 out of 100 parallel universes, your paper found a difference that actually exists. I’d take that bet.</p>
<p>Now, say you run a series of $n$ tests in the scope of one paper. Then every test has an $\alpha$ chance of going wrong. The chance that your paper has more than $k$ errors in it is given by the binomial distribution:</p>
<p>\[
1 - \sum_{i=0}^{k} {n \choose i} \alpha^i (1-\alpha)^{n-i}
\]</p>
<p>which rapidly approaches 1 as the number of tests, $n$, grows. If you use just 10 tests with $\alpha = 0.05$, for example, your chance of having at least one test go wrong grows to 40%. If you do 100, the probability is above 99%. At that point, it’s a near certainty that your paper is misreporting some result.</p>
<p>(To compute these probabilities yourself, set $k = 0$ so you get the chance of at least one error. Then the CDF above simplifies down to $1 - (1 - \alpha) ^ n$.)</p>
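<p>A short sketch of that calculation, assuming the tests are independent (the function name is made up for this example):</p>

```python
import math

def prob_more_than_k_errors(n, alpha, k=0):
    """Chance that more than k of n independent tests go wrong."""
    at_most_k = sum(
        math.comb(n, i) * alpha ** i * (1 - alpha) ** (n - i)
        for i in range(k + 1)
    )
    return 1 - at_most_k

# With k = 0 this is the chance of at least one error: 1 - (1 - alpha)^n.
for n in (1, 10, 100):
    print(n, round(prob_more_than_k_errors(n, 0.05), 3))
```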
<p>This pitfall is called the <a href="https://en.wikipedia.org/wiki/Multiple_comparisons_problem">multiple comparisons problem</a>. If you really need to run lots of tests, all is not lost: there are standard ways to compensate for the increased chance of error. The simplest are the <a href="http://mathworld.wolfram.com/BonferroniCorrection.html">Bonferroni</a> and <a href="https://en.m.wikipedia.org/wiki/Šidák_correction">Šidák</a> corrections, where you reduce your per-test $\alpha$ to $\frac{\alpha}{n}$ (Bonferroni) or to the slightly larger $1 - (1 - \alpha)^{1/n}$ (Šidák) to preserve an overall $\alpha$ chance of going wrong.</p>
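<p>Both corrections are one-liners. This sketch computes the per-test threshold under each (Bonferroni divides by $n$; Šidák uses $1 - (1 - \alpha)^{1/n}$) and then checks the resulting overall error rate, again assuming independent tests. The function names are illustrative.</p>

```python
def bonferroni(alpha, n):
    """Per-test threshold under the Bonferroni correction."""
    return alpha / n

def sidak(alpha, n):
    """Per-test threshold under the Sidak correction."""
    return 1 - (1 - alpha) ** (1 / n)

def familywise_error(per_test_alpha, n):
    """Chance that at least one of n independent tests goes wrong."""
    return 1 - (1 - per_test_alpha) ** n

n = 10
for name, correct in (("Bonferroni", bonferroni), ("Sidak", sidak)):
    per_test = correct(0.05, n)
    print(name, per_test, familywise_error(per_test, n))
```

<p>Bonferroni is slightly more conservative: it keeps the overall error rate a little under the target, while Šidák hits it exactly when the tests are independent.</p>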
<p>You can get CS papers published with shoddy statistics, but that doesn’t mean you should. Here are three easy ways to bungle the data analysis in your evaluation section: don’t even try to use statistics when you really ought to; misinterpret an inconclusive statistical test as concluding a negative; or run too many tests without considering that some of them might be lying to you. I’ve seen all three of these mistakes in multiple published papers—don’t let this be you!</p>
http://www.cs.cornell.edu/~asampson/blog/probablycorrect.htmlProbably Correct2016-06-15T00:00:00+00:00<p>How do you know whether a program is good enough if it’s allowed to be wrong some of the time?</p>
<p>Say, for example, that you want to use <a href="https://en.wikipedia.org/wiki/Fast_inverse_square_root">Quake III’s famous inverse square root approximation</a>.
The approximation is closer to $x^{-1/2}$ for some inputs $x$ and farther away for others.
You’ll want to know the chances that the approximation is close enough for the $x$s you care about.</p>
<p>When your program can only be right some of the time, it’s important to take a statistical view of correctness.
This is not just about squirrelly floating-point hacks: probably-correct programs are ubiquitous, from <a href="http://www.apple.com/ios/siri/">Siri</a> to <a href="https://www.teslamotors.com/presskit/autopilot">Tesla’s autopilot</a>.
This post is about infusing statistics into the ways we define correctness and the everyday tools we use to enforce it, like unit testing.
We’ll explore two simple but solid approaches to enforcing statistical correctness.
The first is an analogy to traditional testing, and the second moves checking to run time for a stronger guarantee.
Both require only Wikipedia-level statistics to understand and implement.</p>
<p>At the end, I’ll argue that these basic approaches are deceptively difficult to beat.
If we want to make stronger guarantees about probably-correct programs, we’ll need more creative ideas.</p>
<h2 id="correct-vs-probably-correct">Correct vs. Probably Correct</h2>
<p>First, let’s recap traditional definitions of correctness.
With ordinary, hopefully-always-correct programs, the ultimate goal is <strong>verification</strong>:</p>
<p>\[ \forall x \; f(x) \text{ is good} \]</p>
<p>The word <em>good</em> is intentionally vague: it might mean something about the output $f$ writes to a file, or about how fast $f$ runs, or whether $f$ violated some security policy.
In any case, verification says your program behaves well on every input.</p>
<p>Verification is hard, so we also have <strong>testing</strong>, which says a program behaves well on a few example inputs:</p>
<p>\[ \forall\; x \in X \; f(x) \text{ is good} \]</p>
<p>Testing tells us a set of inputs $X$ all lead to good behavior.
It doesn’t imply $\forall x$ anything, but it’s something.</p>
<p>For this post, we’ll assume $f$ is good on some inputs and bad on others, but it doesn’t fail at random.
In other words, it’s <em>deterministic:</em> for a given $x$, running $f(x)$ is either always good or always bad.
The <a href="https://en.wikipedia.org/wiki/Fast_inverse_square_root">fast inverse square root</a> function is one example: the error is below $10^{-4}$ for most inputs, but it can be as high as $0.04$ for reasonably small values of $x$.
(See for yourself with this <a href="https://gist.github.com/sampsyo/c1ed448618dadce682fdc5303ce432ec">Python implementation</a>.)
If you know your threshold for a good-enough inverse square root is an error of 0.01, you’ll want to know your chances of violating that bound.</p>
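<p>For readers who want something concrete to poke at, here is a rough Python sketch of the approximation and a “good enough” check. It is not the linked gist: the function names are mine, and the arithmetic runs in Python’s double precision, so results can differ slightly from the 32-bit C original.</p>

```python
import math
import struct

def fast_inv_sqrt(x, newton_iterations=1):
    """Approximate x ** -0.5 for positive x using the bit-level trick."""
    # Reinterpret the 32-bit float's bits as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # The "magic constant" step produces a rough first guess.
    i = (0x5F3759DF - (i >> 1)) & 0xFFFFFFFF
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # Each Newton iteration refines the guess.
    for _ in range(newton_iterations):
        y = y * (1.5 - 0.5 * x * y * y)
    return y

def is_good(x, threshold=0.01):
    """A run is good when the absolute error is within the threshold."""
    return abs(fast_inv_sqrt(x) - x ** -0.5) <= threshold

for x in (0.001, 0.01, 1.0, 4.0):
    print(x, fast_inv_sqrt(x), x ** -0.5, is_good(x))
```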
<p>Nondeterministically correct programs are also important, of course, but there the goal is to show something more complicated: something like $\forall x \; \text{Pr}\left[ f(x) \text{ is good} \right] \ge T$.
This post focuses on deterministic programs.</p>
<h2 id="statistical-testing">Statistical Testing</h2>
<p>There’s an easy way to get a basic kind of statistical correctness.
It’s roughly equivalent to traditional testing in terms of both difficulty and strength, so I’ll call it <strong>statistical testing</strong>.
(But to be clear, this is not my invention.)</p>
<p>The idea is to pick, instead of a set $X$ of representative inputs, a <a href="https://en.wikipedia.org/wiki/Probability_distribution">probability distribution</a> $D$ of inputs that you think is representative of real-world behavior.
For the fast inverse square root function, for example, we might pick a uniform distribution between 0.0 and 10.0, suggesting that any input in that range is equally likely.</p>
<p>Statistical testing can show, with high confidence, that when you randomly choose an $x$ from the input distribution $D$, it has a high probability of making $f(x)$ good.
In other words, your goal is to show:</p>
<p>\[ \text{Pr}_{x \sim D} \left[ f(x) \text{ is good} \right] \ge T \]</p>
<p>with confidence $\alpha$.
Your <a href="https://en.wikipedia.org/wiki/Confidence_interval">confidence</a> parameter helps you decide how much evidence to collect—instead of proving that statement absolutely, we’ll say that we have observed enough evidence that there’s only an $\alpha$ chance we observed a random fluke.</p>
<p>Let $p = \text{Pr}_{x \sim D} \left[ f(x) \text{ is good} \right]$ be the <em>correctness probability</em> for $f$.
Our goal is to check whether $p \ge T$, our threshold for <em>good enough</em>.
Here’s the complete recipe:</p>
<ol>
<li>Pick your input distribution $D$.</li>
<li>Randomly choose $n$ inputs $x$ according to $D$. (This is called <a href="https://en.wikipedia.org/wiki/Sampling_(statistics)">sampling</a>.)</li>
<li>Run $f$ on each sampled $x$ and check whether each $f(x)$ is good.</li>
<li>Let $g$ be the number of good runs. Now, $\hat{p} = \frac{g}{n}$ is your estimate for $p$.</li>
<li>Perform some light statistics magic.</li>
</ol>
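<p>Steps 1 through 4 of the recipe can be sketched as follows. The program under test here is a toy stand-in that I made up for illustration (approximating $\sin x$ by $x$); the helper names are hypothetical too.</p>

```python
import math
import random

def estimate_correctness(f, is_good, sample_input, n, seed=0):
    """Run f on n sampled inputs; return (good count, estimate of p)."""
    rng = random.Random(seed)
    good = 0
    for _ in range(n):
        x = sample_input(rng)      # step 2: draw x from D
        if is_good(x, f(x)):       # step 3: run f and check the result
            good += 1
    return good, good / n          # step 4: p-hat = g / n

# Toy stand-ins, not from the post: approximate sin(x) by x itself and
# call a run good when the absolute error is below 0.01.
def approx_sin(x):
    return x

def close_enough(x, y):
    return abs(y - math.sin(x)) < 0.01

def uniform_0_1(rng):
    return rng.uniform(0.0, 1.0)   # step 1: the input distribution D

g, p_hat = estimate_correctness(approx_sin, close_enough, uniform_0_1, n=10_000)
print(f"{g} good runs out of 10000, p-hat = {p_hat:.3f}")
```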
<p>There are a few ways to do the statistics. Here’s a really simple way: use a <a href="https://en.m.wikipedia.org/wiki/Binomial_proportion_confidence_interval">confidence interval formula</a> to get upper and lower bounds on $p$.
The <a href="https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Clopper-Pearson_interval">Clopper–Pearson</a> formula, for example, gives you a $p_{\text{low}}$ and $p_{\text{high}}$ so that:</p>
<p>\[ \text{Pr}\left[ p_{\text{low}} \le p \le p_{\text{high}} \right] \ge 1 - \alpha \]</p>
<p>Remember that $\alpha$ is small, so you’re saying that it’s likely you have an interval around $p$.
If $p_{\text{low}} \ge T$, then you can say with confidence $\alpha$ that $f$ is good on the input distribution $D$.
If $p_{\text{high}} \le T$, then you can say it’s wrong.
Otherwise, the test is inconclusive—you need to take more samples.
Collecting more samples (increasing $n$) tightens the interval; demanding higher confidence (decreasing $\alpha$) loosens the interval.</p>
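<p>Here is one way the interval and the three-way decision might look in standard-library Python. This is a sketch, not the linked gist: it finds the Clopper–Pearson bounds by bisection on the exact binomial tail probabilities, whereas a version built on SciPy would more likely call <code class="highlighter-rouge">scipy.stats.beta.ppf</code>. All function names are illustrative.</p>

```python
import math

def binom_cdf(k, n, p):
    """Pr[X <= k] for X ~ Binomial(n, p), summed in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(1.0, total)

def bisect_root(func, iterations=100):
    """Find p in [0, 1] where the decreasing function func crosses zero."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(g, n, alpha):
    """Return (p_low, p_high) given g good runs out of n."""
    half = alpha / 2
    # Lower bound: the p at which seeing g or more good runs has chance alpha/2.
    low = 0.0 if g == 0 else bisect_root(
        lambda p: half - (1.0 - binom_cdf(g - 1, n, p)))
    # Upper bound: the p at which seeing g or fewer good runs has chance alpha/2.
    high = 1.0 if g == n else bisect_root(
        lambda p: binom_cdf(g, n, p) - half)
    return low, high

def verdict(g, n, alpha, threshold):
    """Compare the interval against the good-enough threshold T."""
    low, high = clopper_pearson(g, n, alpha)
    if low >= threshold:
        return "good"
    if high <= threshold:
        return "bad"
    return "inconclusive"

print(clopper_pearson(98, 100, 0.05), verdict(98, 100, 0.05, 0.9))
```

<p>The bisection costs a few thousand floating-point operations per bound for moderate $n$, which is negligible next to actually running $f$ on the samples.</p>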
<p>There are fancier ways, too: you could use <a href="https://en.wikipedia.org/wiki/Sequential_probability_ratio_test">Wald’s sequential sampling</a> to automatically choose $n$ and rule out the possibility of an inconclusive result.
But the simple Clopper–Pearson way is perfectly good, and it’s easy to implement: here it is in <a href="https://gist.github.com/sampsyo/c073c089bde311a6777313a4a7ac933e">four lines of Python</a>.</p>
<p>The statistical testing technique is so simple that it, or something at least as strong, should appear in every paper that proposes a new approximation strategy.
It doesn’t require any fancy computer science: all you need to do is run $f$ as a black box and check its output, just like in traditional testing.
Our <a href="http://dx.doi.org/10.1145/2594291.2594294">probabilistic assertions</a> checker uses some fanciness to make the approach more efficient, but these tricks aren’t necessary to perform a statistically sound test.
So if you read an <a href="/~asampson/research.html#approximate-computing">approximate computing</a> paper that doesn’t report its $\alpha$, be suspicious.</p>
<h3 id="limitations">Limitations</h3>
<p>Statistical testing is limited by its need for an input distribution, $D$.
That requirement makes statistical testing’s guarantee about as strong as traditional testing is for normal programs:
it says that your program behaves itself under specific conditions that you anticipate in development.
It doesn’t say anything about what will happen when your program meets the real world—there are no guarantees for any input distribution other than $D$.</p>
<p>More subtly, statistical testing also requires that you have a $D$ that you can generate random samples from.
This makes it tricky to use, for example, if your $f$ is an image classifier that works on photographs that users upload to a Web service—it’s hard to randomly generate photos from scratch!
You could sample from a pool of test photos, but that will only let you draw conclusions about those test photos—not the distribution of photos that users might upload.</p>
<p>Statistical testing is useful when you can anticipate the input distribution ahead of time.
Is it possible to make statements that don’t depend on a known, sample-able distribution?</p>
<h2 id="going-on-line-statistical-checking">Going On-Line: Statistical Checking</h2>
<p>A stronger guarantee could help us cope with unanticipated distributions—even <em>adversarial</em> distributions.
For example, a user might find
a single $x_\text{bad}$ input that your program doesn’t handle well and then issue a probability distribution $D_\text{bad}$ that hammers on that one $x_\text{bad}$ over and over.
Statistical testing will never help with adversarial input distributions, but some form of on-line enforcement might.</p>
<p>Let’s explore a simple on-line variant of statistical testing, which I’ll call <strong>statistical checking</strong>, and consider how its guarantees stack up against adversarial input distributions.
The idea is that you have an oracle that can decide whether a given execution $f(x)$ is good or bad, but it’s too expensive to run on <em>every</em> execution.
For example, you can always check the <a href="https://en.wikipedia.org/wiki/Fast_inverse_square_root">fast inverse square root</a> output by comparing with an exact $x^{-1/2}$ computation, but that would obviate all the efficiency benefits of using the approximation in the first place.
Statistical checking reduces the overhead by running the oracle after a random sample of executions.</p>
<p>Say you run $f$ on a server for a full day and, at the end of the day, you want to know how many of the requests were good.
Let $p$ be the probability that an execution on that day is good: in expectation, $p$ is also the fraction of good executions.
Again, we hope $p$ will be high.
Here’s the statistical checking recipe:</p>
<ol>
<li>Choose a probability $p_\text{check}$ that you’ll use to decide whether to check each execution.</li>
<li>After running $f(x)$ each time, flip a biased coin that comes up heads with probability $p_\text{check}$. If it’s heads, pay the expense to check whether $f(x)$ is good; otherwise, do nothing.</li>
<li>At the end of the day, tally up the number of times you checked, $c$, and the number of times the check came out good, $g$. Now, $\hat{p} = \frac{g}{c}$ is your estimate for $p$.</li>
<li>Use the same statistical magic as last time to get a confidence interval on $p$.</li>
</ol>
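<p>The first three steps of this recipe amount to a thin wrapper around $f$. Here’s a sketch; the class name, the toy program, and the toy oracle are all invented for illustration.</p>

```python
import random

class StatisticalChecker:
    """Wrap f so that a random fraction p_check of executions is checked."""

    def __init__(self, f, oracle, p_check, seed=None):
        self.f = f
        self.oracle = oracle      # expensive check: oracle(x, y) -> bool
        self.p_check = p_check
        self.rng = random.Random(seed)
        self.checked = 0          # c in the recipe
        self.good = 0             # g in the recipe

    def __call__(self, x):
        y = self.f(x)
        # Flip the biased coin after every execution.
        if self.rng.random() < self.p_check:
            self.checked += 1
            if self.oracle(x, y):
                self.good += 1
        return y

    def estimate(self):
        """Return p-hat = g / c, or None if nothing was checked yet."""
        return self.good / self.checked if self.checked else None

# Toy stand-ins, not from the post: f is "bad" on every tenth request.
def toy_f(x):
    return x + 1 if x % 10 == 0 else x

def toy_oracle(x, y):
    return y == x

checker = StatisticalChecker(toy_f, toy_oracle, p_check=0.1, seed=1)
for request in range(100_000):
    checker(request)
print(checker.checked, checker.good, checker.estimate())
```

<p>The seeded <code class="highlighter-rouge">random.Random</code> keeps the sketch reproducible. If the adversary is real, the coin flips need to be unpredictable, so something like <code class="highlighter-rouge">random.SystemRandom</code> would be a better source of randomness.</p>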
<p>The same binomial confidence interval techniques that we used for statistical testing, like Clopper–Pearson, work here too.
And if you want to do the statistics multiple times, like at the end of <em>every</em> day or even after each execution you randomly check, you can again use <a href="https://en.wikipedia.org/wiki/Sequential_probability_ratio_test">Wald’s sequential sampling</a> to avoid the <a href="https://en.wikipedia.org/wiki/Multiple_comparisons_problem">multiple comparisons problem</a>.</p>
<p>The guarantees are similar: you again get an $\alpha$-confidence interval on $p$ that lets you decide whether you have enough evidence to conclude that the day’s executions were good enough or not.
The $p_\text{check}$ knob lets you pay more overhead for a better shot at a conclusive outcome in either direction.</p>
<p>Like random screening in the customs line, randomly choosing the executions to check is the key to defeating adversarial distributions.
This way, your program’s adversary can <em>provably</em> have no idea which executions will be checked—it has nowhere to hide.
Any non-random strategy, such as <a href="https://en.wikipedia.org/wiki/Exponential_backoff">exponential backoff</a>, admits some adversary that behaves well only on checked executions.
(This <a href="/~asampson/blog/naivemonitoring.html">old post with pictures</a> gets at the same idea.)</p>
<h2 id="even-stronger-statements">Even Stronger Statements</h2>
<p>Statistical testing and statistical checking, as simple as they are, yield surprisingly good guarantees.
Is it possible to do even better?</p>
<p>In particular, neither sampling-based technique can say anything about worst-case errors.
We can know with high confidence that 99% of executions are good enough, for example, but we can’t know <em>how</em> bad that remaining 1% might be.
We could check looser bounds, but sampling will never get us to 100% certainty about anything: there’s always a chance we got unlucky and failed to see a particularly bad $x_\text{bad}$.</p>
<p>A worst-case guarantee is deceptively difficult to certify.
I can only see two ways that might work:</p>
<ul>
<li>Conservatively identify <em>all</em> (not just most) of the bad $x$s for $f$ and detect them at run time.</li>
<li>Derive a cheap-enough oracle that can dynamically check <em>every</em> execution for correctness.</li>
</ul>
<p>Both options are hard!
And they amount to recovering complete correctness—anything less than perfection risks missing a single outlier $x_\text{bad}$.
Getting a guarantee that’s stronger than simple statistical checking will take real creativity.</p>
<h2 id="heuristics-cant-beat-statistical-testing">Heuristics Can’t Beat Statistical Testing</h2>
<p>One approach that <em>can’t</em> beat the simple techniques is an on-line heuristic.
Here’s the usual line of reasoning:</p>
<blockquote>
<p>Statistical testing is weak because it only knows about inputs we anticipated <em>in vitro</em>.
And statistical checking is weak because it only looks at some of the inputs at run time.
To do better, let’s check <em>every</em> execution!
Just before running $f$ on $x$, or just after getting the output $f(x)$, apply some heuristic to predict whether the execution is good or bad.
The heuristic will statistically avoid bad behavior, so we’ll get a stronger guarantee.</p>
</blockquote>
<p>Let’s call this general approach <strong>heuristic checking</strong>.
It’s “easy” because there’s no program analysis necessary: we still get to treat $f$ as a black box.
And the idea to check every run sounds like it might offer a stronger kind of guarantee.</p>
<p>It can’t.
Heuristics are orthogonal to statistical guarantees—you need some other technique, like statistical testing or checking, to make any rigorous statements about them.</p>
<p>The problem is that every heuristic has false positives.
Regardless of whether you choose a decision tree, a support vector machine, a neural network, or a fuzzy lookup table, your favorite heuristic necessarily has blind spots.
For example, you might try to train an SVM on lots of inputs to predict when a given $x$ will cause lots of error in your fast inverse square root approximation, $f$.
If the SVM predicts for a given $x$ that $f(x)$ will be bad, then run the slower fallback $x^{-1/2}$ code instead.</p>
<figure style="max-width: 200px;">
<img src="/~asampson/media/heuristiccheck.svg" alt="heuristic checks on inputs and outputs" />
<figcaption>Adding checks to an approximate program $f$ yields a new approximate program $f'$.</figcaption>
</figure>
<p>Like any trained model, the SVM will make a wrong prediction in some minority of the cases, in exactly the same way that the approximation itself is inaccurate some of the time.
That means that we can think of the entire SVM-augmented system as just another probably-correct program with all the same problems as the original $f$.
Let $f'$ be the function that runs the SVM predictor and then chooses to run $f$ or the accurate $x^{-1/2}$.
This new $f'$ you’ve created also has some $x_\text{bad}$ inputs and also needs some validation of its correctness, just as much as the original $f$.
You’ll still need to apply statistical testing, statistical checking, or something of their ilk to understand the correctness of $f'$.</p>
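<p>To make the point concrete, here’s a toy sketch of building $f'$ from $f$, an exact fallback, and a predictor. Everything in it is hypothetical: the “approximation” is $\sin x \approx x$, and a crude threshold stands in for the trained SVM.</p>

```python
import math

def with_heuristic_check(f, exact, predicts_bad):
    """Build f': fall back to exact(x) whenever the heuristic flags x."""
    def f_prime(x):
        if predicts_bad(x):
            return exact(x)
        return f(x)
    return f_prime

# Toy stand-ins: approximate sin(x) by x, with an imperfect predictor that
# only flags inputs above 0.5 even though errors exceed 0.01 a bit earlier.
def approx_sin(x):
    return x

def flags_large_inputs(x):
    return x > 0.5

f_prime = with_heuristic_check(approx_sin, math.sin, flags_large_inputs)

for x in (0.1, 0.45, 0.9):
    error = abs(f_prime(x) - math.sin(x))
    print(x, error, "good" if error < 0.01 else "bad")
```

<p>The input 0.45 slips past the predictor and still produces a bad result, so <code class="highlighter-rouge">f_prime</code> is just another black box with its own correctness probability to estimate.</p>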
<p>In that sense, heuristic checking can never offer any statistical guarantees by itself—it’s <em>orthogonal</em> to the technique you use to assess statistical correctness.
Even the best heuristic can only adjust the correctness probability; it can’t change the <em>kind</em> of guarantee that’s possible.</p>
<p>That’s not to say that heuristic checking is useless.
It can definitely be useful for empirically improving your program’s correctness probability; hence publications in <a href="https://homes.cs.washington.edu/~luisceze/publications/approxdebug-asplos15.pdf">ASPLOS 2015</a> (where I’m an author), <a href="http://cccp.eecs.umich.edu/papers/dskhudia-isca15.pdf">ISCA 2015</a>, <a href="http://dl.acm.org/citation.cfm?id=2872402">ASPLOS 2016</a>, <a href="http://dl.acm.org/citation.cfm?id=2908087">PLDI 2016</a>, and <a href="http://www.cc.gatech.edu/~ayazdanb/publication/papers/mithra-isca16.pdf">ISCA 2016</a>.
But we need to be clear about exactly what this kind of work can do: it can adjust the correctness probability $p$, but it can’t change the <em>kind</em> of guarantee you state about $p$.</p>
<p>Work along these lines needs to be careful to use the right baseline.
Enhancing an $f$ with heuristic checking is morally equivalent to using a more accurate $f$ in the first place.
You could, for example, change your fast inverse square root function to enable the second Newton iteration.
This would increase accuracy and increase cost—exactly the same effects as adding heuristic checking.
So if you design a new checking heuristic, remember to compare against other strategies for improving accuracy.</p>
<p>In my own <a href="https://homes.cs.washington.edu/~luisceze/publications/approxdebug-asplos15.pdf">ASPLOS 2015 paper</a>, for example, we used a fuzzy memoization table to detect approximate outputs that deviated too much from previously-observed behavior.
Our evaluation showed that the extra checking costs energy, but it also increases accuracy on average.
There were other, more obvious ways to change the energy–accuracy trade-off: we could have adjusted the hardware voltage parameters, for example, and ended up with the same strength of guarantee.
A good evaluation should treat the obvious strategy as a baseline: compare the total energy savings when the average accuracy is equal, or vice-versa.</p>
<p>Statistical correctness is a critical but underappreciated problem. Fortunately, basic statistics are enough to make pretty good statements about statistical correctness. But we’re far from done: there are juicy, unsolved, computer-sciencey problems remaining in meaningfully outperforming these basic tools.</p>
<hr />
<p><em>Thanks to Cyrus Rashtchian, Todd Mytkowicz, and Kathryn McKinley for unbelievably valuable feedback on earlier drafts of this post. Needless to say, they don’t necessarily agree with everything here.</em></p>
<p>Say you have a program that’s right only some of the time. How can you tell whether it’s correct enough? Using some Wikipedia-level statistics, it’s pretty easy to make probabilistic statements about quality. I’ll explain two strategies for measuring statistical correctness. Then I’ll argue that it’s deceptively difficult to produce guarantees that are any stronger than the ones you get from the basic techniques.</p>
http://www.cs.cornell.edu/~asampson/blog/opengl.htmlWeep for Graphics Programming2016-05-02T00:00:00+00:00<p>The mainstream real-time graphics APIs, OpenGL and Direct3D, are probably the most widespread way that programmers interact with heterogeneous hardware.
But their brand of CPU–GPU integration is unconscionable.
CPU-side code needs to coordinate closely with GPU-side shader programs for good performance, but the APIs we have today treat the two execution units as isolated universes.
This mindset leads to stringly typed interfaces, a huge volume of boilerplate, and impoverished GPU-specific programming languages.</p>
<p>This post tours a few gritty realities in a trivial OpenGL application.
You can follow along with <a href="http://adriansampson.net/doc/tinygl/">a literate listing</a> of the <a href="https://github.com/sampsyo/tinygl/blob/master/tinygl.c">full source code</a>.</p>
<h2 id="shaders-are-strings">Shaders are Strings</h2>
<p>To define an object’s appearance in a 3D scene, real-time graphics applications use <em><a href="https://en.wikipedia.org/wiki/Shader">shaders</a>:</em> small programs that run on the GPU as part of the rendering pipeline.
There are several <a href="https://en.wikipedia.org/wiki/Shader#Types">kinds</a> of shaders, but the two most common are the <a href="https://www.opengl.org/wiki/Vertex_Shader">vertex shader</a>, which determines the position of each vertex in an object’s mesh, and the <a href="https://www.opengl.org/wiki/Fragment_Shader">fragment shader</a>, which produces the color of each pixel on the object’s surface.
You write shaders in special C-like programming languages: OpenGL uses <a href="https://www.opengl.org/documentation/glsl/">GLSL</a>.</p>
<p>This is where things go wrong. To set up a shader, the host program sends a <em>string containing shader source code</em> to the graphics card driver.
The driver JITs the source to the GPU’s internal architecture and loads it onto the hardware.</p>
<p>Here’s a simplified pair of GLSL <a href="http://sampsyo.github.io/tinygl/#section-7">vertex and fragment shaders in C string constants</a>:</p>
<div class="language-c highlighter-rouge"><pre class="highlight"><code><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">vertex_shader</span> <span class="o">=</span>
<span class="s">"in vec4 position;</span><span class="se">\n</span><span class="s">"</span>
<span class="s">"out vec4 myPos;</span><span class="se">\n</span><span class="s">"</span>
<span class="s">"void main() {</span><span class="se">\n</span><span class="s">"</span>
<span class="s">" myPos = position;</span><span class="se">\n</span><span class="s">"</span>
<span class="s">" gl_Position = position;</span><span class="se">\n</span><span class="s">"</span>
<span class="s">"}</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
<span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">fragment_shader</span> <span class="o">=</span>
<span class="s">"uniform float phase;</span><span class="se">\n</span><span class="s">"</span>
<span class="s">"in vec4 myPos;</span><span class="se">\n</span><span class="s">"</span>
<span class="s">"void main() {</span><span class="se">\n</span><span class="s">"</span>
<span class="s">" gl_FragColor = ...;</span><span class="se">\n</span><span class="s">"</span>
<span class="s">"}</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
</code></pre>
</div>
<p>(It’s also common to load shader code from text files at startup time.)
Those <a href="https://www.opengl.org/wiki/Type_Qualifier_(GLSL)"><code class="highlighter-rouge">in</code>, <code class="highlighter-rouge">out</code>, and <code class="highlighter-rouge">uniform</code> qualifiers</a> denote communication channels between the CPU and GPU and between the different stages of the GPU’s rendering pipeline.
That <code class="highlighter-rouge">myPos</code> variable serves to shuffle data from the vertex shader into the fragment shader.
The vertex shader’s <code class="highlighter-rouge">main</code> function assigns to the magic <code class="highlighter-rouge">gl_Position</code> variable for its output, and the fragment shader assigns to <code class="highlighter-rouge">gl_FragColor</code>.</p>
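<p>For the load-from-a-file variant, here is a minimal sketch in plain C. The helper name <code class="highlighter-rouge">read_shader_source</code> is made up for illustration and is not part of OpenGL or tinygl; it assumes the shader fits comfortably in memory and returns a NUL-terminated string, which is what <code class="highlighter-rouge">glShaderSource</code> expects when its length argument is null.</p>

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper (not an OpenGL call): read a whole text file into a
// freshly allocated, NUL-terminated string. Returns NULL on any failure.
// The caller owns the result and must free() it.
char *read_shader_source(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;

    // Find the file size by seeking to the end, then rewind.
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

    char *buf = malloc((size_t)size + 1);
    if (!buf) { fclose(f); return NULL; }

    size_t nread = fread(buf, 1, (size_t)size, f);
    fclose(f);
    if (nread != (size_t)size) { free(buf); return NULL; }

    // Terminate the string so it can stand in for a C string constant.
    buf[size] = '\0';
    return buf;
}
```

<p>The returned buffer can stand in for the string constants above. OpenGL copies the source text during <code class="highlighter-rouge">glShaderSource</code>, so it should be safe to <code class="highlighter-rouge">free</code> the buffer right after that call. Note that this changes nothing about the underlying problem: the shader is still an opaque string as far as the C compiler is concerned.</p>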
<p>Here’s roughly how you <a href="http://sampsyo.github.io/tinygl/#section-18">compile and load the shader program</a>:</p>
<div class="language-c highlighter-rouge"><pre class="highlight"><code><span class="c1">// Compile the vertex shader.
</span><span class="n">GLuint</span> <span class="n">vshader</span> <span class="o">=</span> <span class="n">glCreateShader</span><span class="p">(</span><span class="n">GL_VERTEX_SHADER</span><span class="p">);</span>
<span class="n">glShaderSource</span><span class="p">(</span><span class="n">vshader</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="o">&</span><span class="n">vertex_shader</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">glCompileShader</span><span class="p">(</span><span class="n">vshader</span><span class="p">);</span>
<span class="c1">// Compile the fragment shader.
</span><span class="n">GLuint</span> <span class="n">fshader</span> <span class="o">=</span> <span class="n">glCreateShader</span><span class="p">(</span><span class="n">GL_FRAGMENT_SHADER</span><span class="p">);</span>
<span class="n">glShaderSource</span><span class="p">(</span><span class="n">fshader</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="o">&</span><span class="n">fragment_shader</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">glCompileShader</span><span class="p">(</span><span class="n">fshader</span><span class="p">);</span>
<span class="c1">// Create a program that stitches the two shader stages together.
</span><span class="n">GLuint</span> <span class="n">shader_program</span> <span class="o">=</span> <span class="n">glCreateProgram</span><span class="p">();</span>
<span class="n">glAttachShader</span><span class="p">(</span><span class="n">shader_program</span><span class="p">,</span> <span class="n">vshader</span><span class="p">);</span>
<span class="n">glAttachShader</span><span class="p">(</span><span class="n">shader_program</span><span class="p">,</span> <span class="n">fshader</span><span class="p">);</span>
<span class="n">glLinkProgram</span><span class="p">(</span><span class="n">shader_program</span><span class="p">);</span>
</code></pre>
</div>
<p>With that boilerplate, we’re ready to invoke <code class="highlighter-rouge">shader_program</code> to draw objects.</p>
<p>The shaders-in-strings interface is the original sin of graphics programming.
It means that some parts of the complete program’s semantics are unknowable until run time—for no reason except that they target a different hardware unit.
It’s like <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/eval"><code class="highlighter-rouge">eval</code> in JavaScript</a>, but worse: every OpenGL program is <em>required</em> to cram some of its code into strings.</p>
<p>Direct3D and the next generation of graphics APIs—<a href="http://www.amd.com/en-us/innovations/software-technologies/technologies-gaming/mantle">Mantle</a>, <a href="https://developer.apple.com/metal/">Metal</a>, and <a href="https://www.khronos.org/vulkan/">Vulkan</a>—clean up some of the mess by using a bytecode to ship shaders instead of raw source code.
But pre-compiling shader programs to an IR doesn’t solve the fundamental problem:
the <em>interface</em> between the CPU and GPU code is needlessly dynamic, so you can’t reason statically about the whole, heterogeneous program.</p>
<h2 id="stringly-typed-binding-boilerplate">Stringly Typed Binding Boilerplate</h2>
<p>If string-wrapped shader code is OpenGL’s principal investment in pain,
then it collects its pain dividends via the CPU–GPU communication interface.</p>
<p>Check out those variables <code class="highlighter-rouge">position</code> and <code class="highlighter-rouge">phase</code> in the vertex and fragment shaders, respectively.
The <code class="highlighter-rouge">in</code> and <code class="highlighter-rouge">uniform</code> qualifiers mean they’re parameters that come from the CPU.
To use those parameters, the host program’s first step is to <a href="http://sampsyo.github.io/tinygl/#section-28">look up <em>location</em> handles</a> for each variable:</p>
<div class="language-c highlighter-rouge"><pre class="highlight"><code><span class="n">GLint</span> <span class="n">loc_phase</span> <span class="o">=</span> <span class="n">glGetUniformLocation</span><span class="p">(</span><span class="n">shader_program</span><span class="p">,</span> <span class="s">"phase"</span><span class="p">);</span>
<span class="n">GLint</span> <span class="n">loc_position</span> <span class="o">=</span> <span class="n">glGetAttribLocation</span><span class="p">(</span><span class="n">shader_program</span><span class="p">,</span> <span class="s">"position"</span><span class="p">);</span>
</code></pre>
</div>
<p>Yes, you look up the variable by passing its name as a string.
The <code class="highlighter-rouge">phase</code> parameter is just a <code class="highlighter-rouge">float</code> scalar, but <code class="highlighter-rouge">position</code> is a dynamically sized array of position vectors, so it requires even more boilerplate to <a href="http://sampsyo.github.io/tinygl/#section-34">set up a backing buffer</a>.</p>
<p>Next, we use these handles to <a href="http://sampsyo.github.io/tinygl/#section-42">pass new data to the shaders</a> to draw each frame:</p>
<div class="language-c highlighter-rouge"><pre class="highlight"><code><span class="c1">// The render loop.
</span><span class="k">while</span> <span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Bind our compiled shader program.
</span> <span class="n">glUseProgram</span><span class="p">(</span><span class="n">shader_program</span><span class="p">);</span>
<span class="c1">// Set the scalar `phase` variable.
</span> <span class="n">glUniform1f</span><span class="p">(</span><span class="n">loc_phase</span><span class="p">,</span> <span class="n">sin</span><span class="p">(</span><span class="mi">4</span> <span class="o">*</span> <span class="n">t</span><span class="p">));</span>
<span class="c1">// Set the `position` array by copying data into the buffer.
</span> <span class="n">glBindBuffer</span><span class="p">(</span><span class="n">GL_ARRAY_BUFFER</span><span class="p">,</span> <span class="n">buffer</span><span class="p">);</span>
<span class="n">glBufferSubData</span><span class="p">(</span><span class="n">GL_ARRAY_BUFFER</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">points</span><span class="p">),</span> <span class="n">points</span><span class="p">);</span>
<span class="c1">// Use these parameters and our shader program to draw something.
</span> <span class="n">glDrawArrays</span><span class="p">(</span><span class="n">GL_TRIANGLE_FAN</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">NVERTICES</span><span class="p">);</span>
<span class="p">}</span>
</code></pre>
</div>
<p>The <a href="http://sampsyo.github.io/tinygl/#section-42">verbosity</a> is distracting, but those <a href="https://www.khronos.org/opengles/sdk/docs/man/xhtml/glUniform.xml"><code class="highlighter-rouge">glUniform1f</code></a> and <a href="https://www.opengl.org/sdk/docs/man2/xhtml/glBufferSubData.xml"><code class="highlighter-rouge">glBufferSubData</code></a> calls are morally equivalent to
writing <code class="highlighter-rouge">set("variable", value)</code> instead of <code class="highlighter-rouge">let variable = value</code>.
The C and GLSL compilers can check and optimize the CPU and GPU code separately,
but the stringly typed CPU–GPU interface prevents either compiler from doing anything useful to the complete program.</p>
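<p>To make the <code class="highlighter-rouge">set("variable", value)</code> analogy concrete, here is a toy model in plain C with no GPU involved. Everything in it (<code class="highlighter-rouge">toy_program</code>, <code class="highlighter-rouge">toy_get_location</code>, <code class="highlighter-rouge">toy_set_float</code>) is hypothetical and only mimics the shape of <code class="highlighter-rouge">glGetUniformLocation</code> and <code class="highlighter-rouge">glUniform1f</code>. The point of the sketch is that a misspelled name is a perfectly valid C string, so the mistake can only surface at run time.</p>

```c
#include <string.h>

// A toy stand-in for a linked shader program: a table of named float slots.
// All names here are hypothetical; this only mimics the shape of the real API.
#define MAX_SLOTS 8

struct toy_program {
    const char *names[MAX_SLOTS];
    float values[MAX_SLOTS];
    int count;
};

// Analogous to glGetUniformLocation: look a variable up by its string name.
// Returns -1 when the name is unknown, which is only discoverable at run time.
int toy_get_location(const struct toy_program *p, const char *name) {
    for (int i = 0; i < p->count; i++) {
        if (strcmp(p->names[i], name) == 0) return i;
    }
    return -1;
}

// Analogous to glUniform1f: store a value through a location handle.
// An invalid handle is ignored without any error, much as the real call
// silently ignores location -1.
void toy_set_float(struct toy_program *p, int loc, float value) {
    if (loc < 0 || loc >= p->count) return;
    p->values[loc] = value;
}
```

<p>With a program that declares only <code class="highlighter-rouge">phase</code>, a call like <code class="highlighter-rouge">toy_get_location(&prog, "phsae")</code> compiles without complaint and returns -1, and the following <code class="highlighter-rouge">toy_set_float</code> quietly does nothing. A language-level variable binding would have turned that typo into a compile error.</p>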
<h2 id="the-age-of-heterogeneity">The Age of Heterogeneity</h2>
<p>OpenGL and its equivalents make miserable standard bearers for the age of hardware heterogeneity.
Heterogeneity is rapidly becoming ubiquitous, and we need better ways to write software that spans hardware units with different capabilities.
OpenGL’s programming model espouses the simplistic view that heterogeneous software should comprise multiple, loosely coupled, independent programs.</p>
<p>If pervasive heterogeneity is going to succeed, we need to bury this 20th-century notion. We need programming models that let us write <em>one</em> program that spans multiple execution contexts.
This won’t erase heterogeneity’s essential complexity, but it will let us stop treating non-CPU code as a second-class citizen.</p>
<p>The mainstream real-time graphics APIs, OpenGL and Direct3D, make miserable standard bearers for the age of hardware heterogeneity.
Their approach to heterogeneous programming leads to stringly typed interfaces, a huge volume of boilerplate, and impoverished GPU-specific programming languages.</p>
<p><a href="http://www.cs.cornell.edu/~asampson/blog/wax2016.html">Notes from WAX 2016</a>, 2016-04-17T00:00:00+00:00</p>
<p>We held <a href="http://approximate.computer/wax2016/">WAX</a>, the workshop on approximate computing, at <a href="https://www.ece.cmu.edu/calcm/asplos2016/">ASPLOS</a> last week. I love organizing WAX: it’s a great excuse for the approximation community to talk about the broader themes that extend beyond any single person’s research project <em>du jour</em>.</p>
<p>Here are some notes on those themes.
You can also check out <a href="http://approximate.computer/wax2016/program/">the archived program</a> for links to papers and slides.</p>
<h3 id="disgruntled-introductions">Disgruntled Introductions</h3>
<p>To introduce ourselves, we all said something we like about approximate computing and something we don’t like.
Predictably, this invited a healthy dose of griping.
Two gripey themes emerged:</p>
<ul>
<li>There’s some sadness that approximation hasn’t “hit it big” yet, commercially speaking. We’re a half decade or so into the approximate-computing craze, so it feels to many like we should see shipping hardware soon.</li>
<li>Our terminology is confusing. What does <em>quality</em> mean versus <em>accuracy</em> versus <em>quality of service?</em> A cohort even complained about <em>approximate computing</em> itself being more off-putting than alternatives like <em>inexact</em> or <em>good-enough</em> computing.</li>
</ul>
<p><img class="img-responsive" src="http://www.cs.cornell.edu/~asampson/media/wax2016.jpg" alt="looks like fun!" /></p>
<h3 id="cross-stack-keynotes">Cross-Stack Keynotes</h3>
<p>We had two awesome keynote speakers, both of whom brought broad, interdisciplinary views on approximation.</p>
<p><a href="http://research.microsoft.com/en-us/people/matthaip/">Matthai Philipose</a> from MSR has a goal of <em>continuous mobile vision:</em> always-on CV on a wearable device with all-day battery life and reasonable cloud costs.
His data suggests that approximation is critical—not just a luxury—for this setting: current vision techniques can’t fit in the necessary energy and dollar budgets. It’s not even close.
So he’s on a campaign to introduce approximation everywhere, from the camera sensor hardware to the DNN models and algorithms.</p>
<p><a href="http://ee.princeton.edu/people/faculty/naveen-verma">Naveen Verma</a> from Princeton is a hardware researcher but, unlike some architects, believes approximate computing should come from the top down, from algorithms.
He showed off <em>data-driven hardware resilience</em>, where you train a machine learning model to counteract the effects of deterministic hardware approximation.
Under the right conditions, this cross-stack approach can lead to extremely good tolerance—much more than algorithm-agnostic approximation.</p>
<h3 id="best-practices">Best Practices</h3>
<p>The discussion at the end of the day coalesced around standards of rigor in approximate-computing research.
There was a broad consensus that evaluation methodologies have not improved enough since those heady days of the first few approximation papers.
We hatched the idea of putting together a best practices document for approximation research, covering:</p>
<ul>
<li>Standard benchmarks with standard quality metrics and standard thresholds. When people are free to define their own quality metrics, there’s no way to compare two papers and no way to trust that an approximation is actually useful. The <em>de facto</em> 10% quality loss standard is my least favorite legacy of <a href="/~asampson/media/papers/enerj-pldi2011.pdf">the EnerJ paper</a>.</li>
<li>A map of the available approximation techniques. If you want to apply approximation to a bottleneck in your favorite application, where should you start? This kind of guide is common for traditional performance optimization, so we should have one too.</li>
<li>Agreement on what <em>kinds</em> of quality guarantees are worth striving for. Where do statistical guarantees make sense, and where do we need more traditional deterministic bounds?</li>
</ul>
<p>I’m excited about this idea for a community-sanctioned set of standards. But it’s going to be difficult: work like this doesn’t fit with normal incentives for academics.</p>
<h3 id="a-better-workshop-next-year">A Better Workshop Next Year</h3>
<p>I have plenty to learn about organizing workshops.
Here are some things we need to fix:</p>
<ul>
<li>One-minute lightning talks are a staple at WAX, but they need work. They’re supposed to be an effortless and fun way to provoke discussion, but people recently have put too much work into them—and a high standard means a low turnout. Even so, I heard reactions that one minute isn’t enough to communicate a whole idea. We need new ideas to keep the lightning round’s lighthearted spirit while making it more useful.</li>
<li>Several people told me that the free-form, small-group lunch discussions were their favorite part of WAX. This was doubly true for new people and outsiders, who got to ask questions that wouldn’t work in front of a whole-workshop audience. We need to create more of this kind of discussion, ideally by replacing the usual anemic post-talk Q&A. Maybe we can get seating at small round tables for the next WAX, or we could steal <a href="http://composition.al/">Lindsey Kuper</a>’s <a href="http://composition.al/blog/2016/01/25/off-the-beaten-track-2016-program-chairs-report/">card-based Q&A idea</a> from POPL OBT.</li>
<li>We need to remind speakers to skip their motivation slides for this venue.</li>
</ul>
<p>Next year, my co-chairs and I want to get the WAX franchise more organized. Haphazardly cobbling things together one year at a time has been fun, but the workshop is getting bigger and more serious. We should assign real roles, like <em>program chair</em> and <em>publicity chair</em>, which means we’ll need more help. Please <a href="mailto:asampson@cs.cornell.edu">get in touch</a> if you’d like to get involved—and thanks to everyone who already volunteered!</p>
<p>See you at WAX 2017.</p>
<p><a href="http://approximate.computer/wax2016/">WAX</a> is the workshop on approximate computing. This year at <a href="https://www.ece.cmu.edu/calcm/asplos2016/">ASPLOS</a>, I organized its third or fourth iteration, depending on how you count, along with <a href="http://homes.cs.washington.edu/~luisceze/">Luis Ceze</a>, <a href="http://www.cc.gatech.edu/~hadi/">Hadi Esmaeilzadeh</a>, and <a href="http://research.microsoft.com/en-us/people/zorn/">Ben Zorn</a>. Here’s some stuff that happened at the workshop.</p>
<p><a href="http://www.cs.cornell.edu/~asampson/blog/apply.html">6 Puppies That Will Make You Apply to Cornell University for a Ph.D. in Computer Science</a>, 2015-11-25T00:00:00+00:00</p>
<p>If you’re curious about research in computer science, please <a href="https://www.cs.cornell.edu/phd/admissions#application">apply to the Ph.D. program at Cornell CS</a> or <a href="http://www.ece.cornell.edu/ece/academics/graduate/phd/admission.cfm">in ECE</a>.
Cornell has a top-notch, world-class research program.
It’s especially strong in the spectrum of computer systems topics: programming languages, compilers, operating systems, networks, security, and architecture.
I look forward to working with tenacious, creative new students who are curious about these topics when I start as an assistant professor there next fall.</p>
<p>Tell everyone you know to apply.
The deadline’s December 15 for both departments.
I’ll let these six puppies explain why Cornell is the perfect CS program for you.</p>
<h2 id="1-this-puppy-who-graduated-from-a-top-tier-department-with-incredible-researchers-in-every-area-of-computer-science">1. This puppy, who graduated from a top-tier department with incredible researchers in every area of computer science.</h2>
<p><img class="img-responsive" src="http://www.cs.cornell.edu/~asampson/media/puppies/1.jpg" alt="mortarboard puppy!" /></p>
<p>Cornell has one of the world’s most renowned computer-science research programs by any measure.
It’s in the top 10 of any traditional ranking out there, but it’s not just about <a href="http://grad-schools.usnews.rankingsandreviews.com/best-graduate-schools/top-science-schools/computer-science-rankings"><em>US News</em></a>.
At any serious computer science conference, you’ll see faculty and students from Cornell leading their fields.
Check out the constant torrent of publicity and accolades on <a href="http://www.cs.cornell.edu/information/news">the CS department’s news feed</a>.</p>
<p>At a school like Cornell with broad excellence across different areas, you don’t have to know exactly what you want to do before you apply.
Whichever line of research you fall in love with, Cornell has incredible faculty in that area to help you change the world.</p>
<h2 id="2-this-adorable-row-of-puppies-who-symbolize-cornells-unfair-concentration-of-programming-languages-superstars">2. This adorable row of puppies, who symbolize Cornell’s unfair concentration of programming-languages superstars.</h2>
<p><img class="img-responsive" src="http://www.cs.cornell.edu/~asampson/media/puppies/2.jpg" alt="so many puppies!" /></p>
<p>If programming languages, software verification, type theory, software engineering, or compilers are your thing, Cornell is an especially good match.
Just look at <a href="https://www.cs.cornell.edu/research/lang">this star-studded list</a>.
Even <a href="http://www.cs.cornell.edu/~jgm/">our dean</a> is a PL giant.
It’s downright unfair for other CS departments.</p>
<h2 id="3-this-puppy-hanging-out-with-a-cat-in-the-same-way-cornell-cs-researchers-collaborate-with-amazing-computer-architects-in-the-ece-department">3. This puppy hanging out with a cat, in the same way Cornell CS researchers collaborate with amazing computer architects in the ECE department.</h2>
<p><img class="img-responsive" src="http://www.cs.cornell.edu/~asampson/media/puppies/3.gif" alt="a puppy meeting a cat!" /></p>
<p>The Cornell ECE department’s <a href="http://www.csl.cornell.edu">Computer Systems Laboratory</a> houses even more world-class computer-science researchers.
And they’re not slowing down: this year, they hired the excellent <a href="http://web.stanford.edu/~cdel/">Christina Delimitrou</a> from Stanford.
Through collaboration between the two departments, you can do cross-cutting research that spans the entire system stack <a href="http://www.cs.cornell.edu/andru/papers/asplos15/asplos15.pdf">from logic gates to type theory</a>.</p>
<h2 id="4-this-snuggly-puppy-who-reflects-a-departmental-culture-that-prioritizes-your-success-as-a-grad-student">4. This snuggly puppy, who reflects a departmental culture that prioritizes your success as a grad student.</h2>
<p><img class="img-responsive" src="http://www.cs.cornell.edu/~asampson/media/puppies/4.jpg" alt="cuddly puppy!" /></p>
<p>For your success and happiness as a grad student, the cultural norms in a department can be just as important as research excellence.
Cornell’s CS department has a well-deserved reputation for preserving an inclusive, approachable culture and for paying attention to its grad students’ lives and careers.
Doing great research is not worth being miserable for <em>n</em> years—and at Cornell, you don’t have to choose.</p>
<h2 id="5-this-outdoorsy-puppy-who-knows-that-ithaca-is-a-real-life-utopia">5. This outdoorsy puppy, who knows that Ithaca is a real-life utopia.</h2>
<p><img class="img-responsive" src="http://www.cs.cornell.edu/~asampson/media/puppies/5.jpg" alt="puppy in an Ithaca gorge!" /></p>
<p><a href="https://en.wikipedia.org/wiki/Ithaca,_New_York">Ithaca, New York</a> is a fantastic place to live.
The natural beauty is stunning, the progressive culture and politics are renowned, local food is abundant, and the traffic is nonexistent.
Admittedly questionable research outfits regularly bestow titles like <a href="http://venturebeat.com/2013/06/25/smartest-cities-in-america/">“the smartest city in America”</a>
and <a href="http://www.businessinsider.com/ithaca-is-the-best-college-town-in-america-2014-10?op=1">“the best college town in America”</a> on Ithaca.</p>
<p>Oh, and <a href="http://tech.cornell.edu">we’re in New York City</a> too.</p>
<h2 id="6-this-puppy-who-hangs-out-with-cows-just-like-you-can-on-cornells-campus">6. This puppy who hangs out with cows, just like you can on Cornell’s campus.</h2>
<p><img class="img-responsive" src="http://www.cs.cornell.edu/~asampson/media/puppies/6.jpg" alt="puppy hanging out with some cows!" /></p>
<p>There are <a href="https://ansci.cals.cornell.edu/about-us/facilities">cows</a>! Right on campus! You can hang out with them, <a href="https://www.quora.com/Why-is-Professor-Greg-Morrisett-so-fond-of-cows">just like Greg</a>!</p>
<p>No interest in computer science? Already have an advanced degree? These puppies don’t care! Be careful before you click; you’ll be preparing your application materials before you get to #4.</p>