Date: Thursday, September 18, 2025
Time: 11:45 a.m. to 12:45 p.m.
Location: G01 Gates Hall
Speaker: Richard Vuduc, Georgia Tech, School of Computational Science and Engineering
Abstract:
The answer is "yes, probably," but now that I (hopefully) have your attention: This talk reflects on the question through two studies conducted by former members of my lab, separated by more than a decade. The more recent study assumes that high-end supercomputer designs will be dominated by large language model training workloads (hardly a stretch of the imagination). It then asks what the fastest supercomputer architecture for such a workload might look like, considering O(100) trillion parameter models, a reasonably large codesign space of potential machine designs, and the myriad ways to implement the software that would run on top of them. But "speed" is only one concern; technology and physical constraints, power among them, matter as well. How would those change the answer? I'll speculate by returning to the older study, which, under amusingly strong simplifying assumptions, considers the kinds of general-purpose supercomputers it might be feasible to build instead.
The two studies are based on work by Mikhail Isaev (now at NVIDIA) and Kent Czechowski (OJO Labs), respectively. But as a disclaimer, this talk reflects only my interpretation of their work and not necessarily their views.
