July 15, 2024

Automated Test-Case Reduction

In a previous post, I used a simple interpreter bug to demonstrate the research skill of manually reducing test cases. This time, I show off the excellent Shrinkray reducer to see how it can automate the same process. The tricky part when using automated test-case reducers is writing an interestingness test that actually does what you want. I list a few tricks that help me write good interestingness tests.

May 28, 2024

One Weird Trick for Efficient Pangenomic Variation Graphs (and File Formats for Free)

Last time, I introduced pangenomic variation graphs, the standard text file format that biologists use for them, and a hopelessly naïve reference data model we implemented for them. This time, we use a single principle, flattening, to build an efficient representation that is not only way faster than the naïve library but also competitive with an existing, optimized toolkit. Flattening also yields a memory-mapped file format “for free” that, in a shamelessly cherry-picked scenario, is more than a thousand times faster than the serialization-based alternative.

April 30, 2024

Pangenomic Variation Graphs, and a Reference Data Model

Here’s an overview of Graphical Fragment Assembly (GFA), the standard text file format for representing pangenomic variation graphs. We wrote some Python libraries to represent and analyze GFA files in the most straightforward, readable way possible.

December 21, 2023

Manual Test-Case Reduction

I often find myself recommending to new researchers that they try reducing a buggy test case to understand a problem better. To better explain what I mean by that, I recorded a little video of myself reducing a test for a bug in a Bril interpreter.

December 6, 2023

Critiquing a PhD Application Statement

I offer some feedback on a thoroughly mid statement of purpose for PhD applications from fifteen years ago.

May 1, 2023

Flattening ASTs (and Other Compiler Data Structures)

This is an introduction to data structure flattening, a special case of arena allocation that is a good fit for programming language implementations. We build a simple interpreter twice, the normal way and the flat way, and show that some fairly mechanical code changes can give you a 2.4× speedup.

March 26, 2023

Very Large Scale Disintegration

My WACI talk at ASPLOS 2023 is about wishful thinking: what if we could recycle the chiplets from old, obsolete multi-chip modules, remix them, and package them into new silicon products?

July 22, 2022

Try Snapshot Testing for Compilers and Compiler-Like Things

Snapshot testing is a preposterously simple method for testing tools that transform text to other text, like compilers. This post is an example-based overview of snapshot testing using a tool we developed in our lab called Turnt. I also extol the subversive philosophy that the technique embodies, which prioritizes the velocity of adding new regression tests over traditional goals like precision and specificity.

June 29, 2021

From Hardware Description Languages to Accelerator Design Languages

An emerging class of programming languages aims to make it easier to design application-specific hardware accelerators. Relative to mainstream hardware description languages (HDLs), these new languages sacrifice the ability to express arbitrary circuitry in exchange for a higher level of abstraction specifically for accelerators. This post defines this new category of accelerator design languages (ADLs) and calls for more research on their design.

February 22, 2021

Minimum-Effort Class Recording Setup

Here are a few notes and links about the camera, microphone, stand thingies, software, and an editing strategy I used to make the videos for CS 6120 in the fall.

November 25, 2020

Make Your Grad School Application Sparkle with This One Weird Trick

When PhD application season rolls around, much of what goes into your application is already pre-determined. I have one piece of advice you can still apply as deadlines loom to improve your materials.

April 25, 2020

Uncrudify the New ACM Digital Library

Here’s a bookmarklet that removes the crud from the new ACM Digital Library pages.

June 22, 2019

FPGAs Have the Wrong Abstraction

Verilog is the de facto abstraction for programming today’s FPGAs. RTL programming is fine if you want to use FPGAs for their traditional purpose as circuit emulators, but it’s not the right thing for the new view of FPGAs as general-purpose computational accelerators.

June 2, 2018

Please Help Me Solve an Old Approximate Computing Problem

Here’s a problem from our paper on approximate storage that has been bugging me for about five years now. I think it’s a coding theory problem, but I have no traction whatsoever. Send me your brilliant insights.

May 9, 2018

Program Synthesis is Possible

Inspired by Aws Albarghouthi’s primer, I’ll give a little lecture on program synthesis for the last day of this year’s CS 6110. Here’s a code-driven introduction whose goal is to convince you that you too can synthesize programs.

February 27, 2018

Is JavaScript Statically or Dynamically Scoped?

Cornell’s CS 6110 gives a pretty solid definition of static and dynamic scoping for the λ-calculus, but I also wanted to give an example of static scoping in a real language. I wrestle with JavaScript, which has a little bit of both.

January 23, 2018

The MICRO Diversity Survey

The MICRO steering committee recently published a survey on diversity at the conference. Here are some responses.

January 16, 2018


Spectre is a shock, and the architectural implications seem unbounded. The weirdest part is that it’s not clear what the next generation of CPUs should do in response. Here are a few possibilities, but there are no easy answers.

December 4, 2017

FODLAM, a Poorly Named Tool for Estimating the Power and Performance of Deep Learning Accelerators

For a recent project, my group couldn’t find reusable, open-source tools for understanding the hardware costs of deep neural network accelerators. We’ve published a simple first-order model for the latency and energy of CNN execution.

October 14, 2017

Closed Problems in Approximate Computing

I’m giving a talk at the NOPE workshop about impossible problems in approximate computing. Here are some research directions that are no longer worth pursuing—and a few that are.

November 23, 2016

Statistical Mistakes and How to Avoid Them

You can get CS papers published with shoddy statistics, but that doesn’t mean you should. Here are three easy ways to bungle the data analysis in your evaluation section: don’t even try to use statistics when you really ought to; misinterpret an inconclusive statistical test as concluding a negative; or run too many tests without considering that some of them might be lying to you. I’ve seen all three of these mistakes in multiple published papers—don’t let this be you!

June 15, 2016

Probably Correct

Say you have a program that’s right only some of the time. How can you tell whether it’s correct enough? Using some Wikipedia-level statistics, it’s pretty easy to make probabilistic statements about quality. I’ll explain two strategies for measuring statistical correctness. Then I’ll argue that it’s deceptively difficult to produce guarantees that are any stronger than the ones you get from the basic techniques.

May 2, 2016

Weep for Graphics Programming

The mainstream real-time graphics APIs, OpenGL and Direct3D, make miserable standard bearers for the age of hardware heterogeneity. Their approach to heterogeneous programming leads to stringly typed interfaces, a huge volume of boilerplate, and impoverished GPU-specific programming languages.

April 17, 2016

Notes from WAX 2016

WAX is the workshop on approximate computing. This year at ASPLOS, I organized its third or fourth iteration, depending on how you count, along with Luis Ceze, Hadi Esmaeilzadeh, and Ben Zorn. Here’s some stuff that happened at the workshop.

November 25, 2015

6 Puppies That Will Make You Apply to Cornell University for a Ph.D. in Computer Science

No interest in computer science? Already have an advanced degree? These puppies don’t care! Be careful before you click; you’ll be preparing your application materials before you get to #4.

September 26, 2015

Function Inheritance is Fun and Easy

I’ve been using function inheritance to avoid writing boring boilerplate in a compiler project. Here, I demonstrate the technique with some examples in TypeScript.

September 8, 2015

The Apple ISA

Apple is on track to diverge from ARM and x86 to design its own proprietary instruction set. This is good for the future of hardware–software co-design.

August 3, 2015

LLVM for Grad Students

LLVM is a godsend of a research tool. Here are some detailed notes on what LLVM is, why you would want to use it for research, and how to get started as a compiler hacker.

January 25, 2015

My Top Picks, 2014

IEEE Micro Top Picks is an annual special issue that collects the best of each year’s computer architecture conferences. This year, the chairs experimented with a community input process, which meant that even a lowly grad student could read the submissions and contribute comments. Here are my favorite papers from the year.

January 12, 2015

Swift/Cocoa Type Dissonance

I did some iOS programming recently (for an unknown reason). Using the new Swift language has made it evident the language is young—and, like a rebellious teenager, it conflicts with its much older framework counterpart, Cocoa. Here are two places where the disconnect is most stark, and where Swift should grow more sophisticated type-system features.

December 2, 2014

Bootstrapping Credibility, or: My Secret Evil Superplan for World Domination

I’m on a continuing quixotic quest to improve computer science publishing. Here’s my top-secret plan for launching a new publication venue that necessarily starts from zero credibility. It’s a two-pronged evil plan: first, incentivize good reviews, and second, address an underserved market.

November 25, 2014

An FPGA is an Impoverished Accelerator

Architects tend to conflate FPGAs too closely with ASIC acceleration. This is a mistake: when viewed as an acceleration substrate, FPGAs are an unfortunate accident of history with an exceptionally bad programming model. We should pay more attention to better alternatives.

November 6, 2014

Quala: Type Annotations in LLVM IR

My C and C++ type annotation project, Quala, aims to enable type-aware compiler research. This post demonstrates a type system that can insert dynamic null-pointer dereference checks to stop segfaults before they happen.

October 19, 2014

A Grading Rubric for Bug Reports

The world has no shortage of frustratingly unhelpful bug reports, but it can be hard to explain what makes them bad—and how they can improve. Try scoring your next report on this handy five-point scale.

October 7, 2014

Time for New Journals

Computer science conferences have shortcomings that many of us in the community are motivated to solve. One alternative to reforming conferences incrementally is to start alternative venues. New, lightweight, open-access journals could provide a proving ground for publishing-model ideas. It’s a risky prospect, but it’s a risk worth taking.

September 27, 2014

Hooknook: Like GitHub Pages for Your Server

GitHub Pages is a nifty feature that lets you automatically update a Web site with every git push. If you want the same functionality for sites you host on department servers, though, GitHub’s thing won’t work. Hooknook is my solution.

August 17, 2014

Quala: Custom Type Systems for Clang

Novel type systems can help catch lots of bugs and realize other wonders of programming magic. But they can be a pain to add to an industrial-strength compiler. Quala is a new open-source project to add pluggable type checkers to Clang in the hopes of making custom type systems as easy to opt into as LLVM passes.

June 23, 2014

Probabilistic Assertions

At PLDI this year, I gave a talk about probabilistic assertions, a new language construct for expressing correctness properties in programs that can behave randomly. We built an accurate and efficient verifier for passerts based on a new Bayesian-network representation for probabilistic programs.

June 7, 2014

How to Make a Conference Talk

I’m on my way to PLDI to give a talk about probabilistic assertions, so I have talk-building on my mind. This is a guide to preparing talks based on what I’ve learned so far in grad school.

June 1, 2014

Put It in Hardware

Why does baking something into the hardware make it faster? It may seem obvious, but I think there are four distinct reasons for implementing something in hardware. It’s crucial to remember that they are separate advantages: architectures can “win” in some categories without addressing them all.

May 25, 2014

Embedding Fonts in PDFs

Conference and journal publishers want PDFs with fonts embedded. The Web is full of bad, incomplete, and broken advice about how to embed fonts. Here’s a tiny script that does it right.

May 11, 2014

Working With Undergrads as a Grad Student

Working with undergraduate researchers can be totally awesome. Here’s some advice on making undergrad research work for everyone.

April 27, 2014

Git: How to Do Things

If you use git, you might occasionally need to do things. This post explains how to do things with git.

April 20, 2014

Developing a Research Tool in the Open

As an experiment in open-sourcing research code earlier and often-er than I usually do, I’m developing a new compiler tool in the open. It’s a system for defining and checking user-specified type systems in C and C++ via the Clang compiler.

April 14, 2014

What Is Quality, Anyway?

I made a video of a short talk I gave at a workshop recently. The talk is a call to research action: the approximate-computing community should be thinking more carefully about the problem of specifying quality—even when programmers don’t know what quality means.

March 7, 2014

Notes from WACAS

I helped run WACAS 2014, a workshop on approximate computing, this week. Here are a few notes on broad topics that came up at the workshop.

February 23, 2014


I updated this site’s design so that it sucks a little bit less. Within: some ideas and techniques you can steal.

February 21, 2014

Use Jekyll for Your Academic Site

As a grad student, it’s a great idea to build a personal academic Web site. There are many tools that can help you build a good one. I’ve tried many of them and I think you should use Jekyll.

February 9, 2014

Holy-Grail Approximation

When reading and writing papers on approximate computing, I often find myself comparing proposed systems to an imaginary ideal. Even though that ideal is clearly unrealizable, it is helpful to conceptualize it and measure the ways that real systems fall short in comparison.

February 2, 2014

Setting Expectations for Academic Code Releases

There are lots of reasons to release code as an academic. Be careful to think about which ones apply to your code release—and communicate them with the world.

January 26, 2014

Naive Monitoring for Approximate Computing

Some papers on approximate computing systems propose simple systems for checking computations’ error rates. These systems can be naive and ineffective at solving the problems they were meant to address.

January 19, 2014

Viva Las Vega

Vega is a new plotting system that I’m totally in love with. There are a few challenges involved in abusing it as a system for producing publication-quality figures. I describe some of the problems and some workarounds for them.

January 12, 2014

Deadlines and Caffeine

In which I totally freak myself out by counting all the coffee I drank during my deadline-laden autumn quarter.

December 30, 2013

Approximate Storage

I gave a talk at MICRO earlier this month on the next step in our work on approximate computing. And in an effort to attain immortal YouTube fame, I recorded a conference talk video, which may be the most boring 20 minutes on the Internet.

April 20, 2013

Run an LLVM Pass Automatically with Clang

Lots of research projects need to instrument code while it gets compiled. While LLVM passes are a convenient way to implement instrumentation, the official LLVM documentation doesn’t make it clear how to use them that way easily. Here’s a trick that lets you instrument programs when compiling them with the Clang command-line compiler driver.

February 10, 2013

Safe and Efficient Shared Memory in Dynamic Languages

Some notes on one possible direction for addressing efficiency and correctness issues with parallelism in interpreters for dynamic (scripting) languages. I propose that the problem be addressed at the language level via a dynamic enforcement of locking discipline. This approach has the ancillary benefit of discouraging programmers from ignoring potential concurrency bugs.

January 24, 2013

Easy, Slow Dynamic Analysis With the LLVM Interpreter

Do you ever want to analyze a dynamic execution of a program compiled to LLVM bitcode but want to skip the headaches involved in bitcode instrumentation? Yeah, me too. In cases where the performance doesn’t matter, using an interpreter can be a good option for examining an execution. I built a small, hacky library you can use to quickly prototype your analyses using LLVM’s built-in interpreter.

January 5, 2013

Neural Acceleration

I worked on a project recently that showed that neural networks can act as accelerators for “soft” programs—even when they have nothing to do with learning, classification, or other tasks that make up the traditional domain of neural networks. We found that a simple accelerator based on a hardware neural network implementation can make some programs run much more efficiently.

October 25, 2012

Automatically Discovering Browser Performance Pitfalls

Web browsers on mobile devices are slow and use too much energy. WebChar is a system for automatically generating hypotheses about what exactly makes some pages slower and more energy-intensive than others. WebChar uses real-world measurements together with machine learning to create a model of browser resource consumption and mines it to generate new hypotheses. We found some surprising pitfalls in an analysis of browsers on two mobile systems.

October 25, 2012

PyPy and CPython’s Broken Multithreaded Semantics

The team behind PyPy, an astoundingly successful Python JIT, has been working on improving the performance of multithreaded Python programs. The tactic so far has been to adhere closely to the semantics of CPython’s global interpreter lock, or GIL, while mitigating its performance impact. But GIL semantics are expensive to enforce and encourage Python programmers to write subtly incorrect parallel programs. High-performance Python implementations should abandon not just the GIL but also its model for parallel programming.

September 9, 2012

Simple Cluster Computing in Python

I’ve written a simple Python library for running massively parallel tasks on compute clusters. Cluster-Workers makes it simple to get up and running with the kinds of parallelism that academics usually need when running large-scale batches of experiments. See if it fits your cluster-y use case as well as it does mine.

August 5, 2012

Time to Worry

Some recent press points out how important energy efficiency has been to making computers ubiquitous over the last few decades. But it misses the ominous signs that this trend is in the process of falling off a cliff.

April 24, 2012

What Is Macroscalar?

A couple months ago, a story made the nerd-press rounds about Apple’s trademark application and several patents for something called a “macroscalar” processor architecture. I’ve taken a stab at decoding the publicly available information about macroscalar architectures to give a coherent picture of the idea.

April 17, 2012

Green Clouds

Janet Tu writes in the Seattle Times today about a new Greenpeace report analyzing the use of renewable energy in data centers (or, as they put it, “the cloud”). I provided a couple of comments for the Times story and I’ll expand a little bit here on the importance and feasibility of improving energy efficiency in a cloud-centric world.

March 19, 2012

On the Radio

I appeared briefly on Weekday, a local public radio program, to talk about the Facebook fellowship and energy-efficient computing. And I met Jack Hitt!

February 12, 2012

Truffle, an Architecture for Approximate Computing

I recently worked on a project, called Truffle, that lends some credibility to the architecture assumed by EnerJ, the language for approximate computing that I worked on previously. The paper about Truffle was recently accepted to ASPLOS! Woohoo! I will give a talk about the project there in March.

December 9, 2011

Measuring Smartphone Energy on a Budget

For a recent research project, I measured the power consumption of a smartphone. I am clueless when it comes to electronics and I didn’t want to drop a lot of (my advisor’s) cash, so I needed a simple, relatively cheap setup to get reasonable power measurements. This post describes how you can get a similar apparatus up and running with a custom Python library I wrote for controlling a DC power supply.

July 7, 2011

Bluelet: Using Native Python Coroutines as Green Threads

As a side project, I wrote a simple implementation of green threads for the Python programming language. The library is called Bluelet and it uses Python’s native implementation of coroutines. Bluelet makes it easy to write concurrent socket programs without OS threads, multiple processes, or select()-and-dispatch loops.

June 11, 2011

EnerJ Aftermath: Presentation, Poster, and Press

FCRC was fantastic; I’m still reeling from all the great people I met. I’m posting my slides for the PLDI presentation and a poster for the project. I also learned a little bit about research reporting in the media.

June 8, 2011

A Detailed Quality-of-Service Profiler

For my CSE 503 class project, I implemented a detailed quality-of-service profiler, a tool to help identify code that is inessential for output correctness. The tool is an extension to work at MIT that originally proposed a quality-of-service profiler; this project is a slightly different take on the same basic idea.

March 25, 2011

Conference Spam Gmail Filter

If you’re like me, you’re probably not very interested in the latest deadline extension for the First Annual Multiconference on Informatics and Cybernetics. Here’s a filter for Gmail that catches most of this kind of conference spam.

February 5, 2011

EnerJ: A Language for Unreliable, Low-Power Systems

My paper on EnerJ, a language extension for approximate computing, will be presented at PLDI 2011! This is especially exciting because I hope EnerJ will be the basis for a bunch of new research directions in the near future. This is a summary of EnerJ’s type-based approach to bringing safety to unsafe, unreliable hardware.

January 22, 2011

Some Literature on Application-Level Error Exposure

My recent research has focused on power efficiency gains available by compromising on strict correctness guarantees. That is, many applications (like audio or video processors) can tolerate occasional errors in their execution—and permitting some errors can yield large gains in power or performance. Many researchers have come to a similar realization independently, so here I’ll try to collect together a few different approaches to the issue.

January 21, 2011

Hold On! It's a Blog!

I started an academic blog! Come back often.