CS Junior Theorists' Workshop

This is the third iteration of an annual tradition at Cornell. We have invited six of the best junior (i.e., PhD or postdoc) theorists to visit Cornell to each give a talk and to meet with Cornell faculty and students.

Hosted by Eshan Chattopadhyay and Noah Stephens-Davidowitz

Click here to attend via Zoom
Location: Gates Hall G01

Schedule

Thursday, May 7
10:00 a.m. - Breakfast
10:30-11:30 a.m. - Miranda Christ, Ph.D. student, Columbia University
11:30-11:45 a.m. - Break
11:45 a.m. -12:45 p.m - Charlotte Peale, Ph.D. student in Computer Science Theory, Stanford University
1-2 p.m. - Lunch break
2:15-3:15 p.m. - Barak Nehoran, Postdoctoral Scholar in Computer Science, Columbia University
3:30-5 p.m. - Socializing/meetings

Friday, May 8
10:00 a.m. - Breakfast
10:30-11:30 a.m. - Prasanna Ramakrishnan, Ph.D. student, Stanford University
11:30-11:45 a.m. - Break
11:45-12:45 p.m. - Jane Lange, Ph.D. student, MIT CSAIL
1-2 p.m. - Lunch break
2:15-3:15 p.m. - Zihan Zhang, Ph.D. candidate in computer science, The Ohio State University
3:30-5 p.m. - Socializing/meetings

Speakers

Miranda Christ

A color photo of a woman smiling for a portrait.

Bio: Miranda Christ is a final-year computer science P.hD. student at Columbia University, where she is a member of the Theory Group and the Crypto Lab. Co-advised by Tal Malkin and Mihalis Yannakakis. She is interested in practically motivated theoretical problems in cryptography.

Title: Pseudorandom Error-Correcting Codes with Applications to Watermarking Generative AI

Abstract: Motivated by the growing need to identify AI-generated content, we ([CG24]) introduced a powerful new framework for generative AI watermarking. This framework leverages a new cryptographic primitive called a pseudorandom error-correcting code (PRC).

A PRC is an error-correcting code with the property that any polynomial number of codewords are pseudorandom to any efficient adversary.
Since the introduction of PRCs, there has been a flurry of exciting works strengthening their properties and implementing them in practice. I will highlight two newer works with my collaborators ([AAC+25, CGG+26]) in which we define and construct improved PRCs motivated by applications. In the first, we construct a notion of an ideal PRC, where robustness holds even against an adversary that can query encoding and decoding oracles. In the second, we present improved constructions from an assumption used previously to construct doubly-efficient PIR. This construction is also the first with plausible subexponential security.

This is based on works with Sam Gunn, Omar Alrabiah, Prabhanjan Ananth, Yevgeniy Dodis, Noah Golowich, Ankur Moitra, and Daniel Wichs: [CG24] https://eprint.iacr.org/2024/235.pdf, [AAC+25] https://eprint.iacr.org/2024/1840, [CGG+26] https://eprint.iacr.org/2025/2222

Charlotte Peale

A color photo of a woman smiling for a photo outdoors.

Bio: Charlotte Peale is a fifth year Ph.D. student in Computer Science Theory at Stanford University, advised by Omer Reingold. Her research is supported by the 2024 Apple Scholars in AIML Fellowship, and she spent the summers of 2024 and 2025 interning at Apple under the mentorship of Parikshit Gopalan, Aravind Gollakota, Udi Wieder, and Moises Goldszmidt.

Title: Uncertainty Quantification Beyond Calibration

Abstract: Calibration has emerged as the standard formal approach to uncertainty quantification, but the precise guarantees it provides are not always clear. This talk explores two recent works that establish more rigorous connections between calibration and the concrete goals of uncertainty quantification in modern machine learning.

First, we establish a tight connection between calibration and error diagnosis, the ability of a model to accurately assess its own limitations. We characterize the precise conditions under which a model's uncertainty estimates can compete with externally-trained error predictors, revealing when self-assessment is and isn't possible.

Second, we address uncertainty decomposition into epistemic (model-based) and aleatoric (data-based) components. This distinction is vital for understanding prediction errors in complex, subjective tasks, and for determining whether collecting more data will actually improve performance. However, standard calibration is insufficient to provide rigorous decomposed uncertainty estimates. We introduce higher-order calibration, a strengthened notion that provides theoretical guarantees for uncertainty decomposition.

Barak Nehoran

A color photo of a man speaking at a podium.

Bio: Barak Nehoran is a postdoctoral research scientist in theoretical computer science at Columbia University, where he is hosted by Professor Henry Yuen. He completed my Ph.D. at Princeton University where he was fortunate to be co-advised by Professors Ran Raz and Mark Zhandry, and where he also previously worked in the lab of Sebastian Seung.

His research lies in the areas of quantum information, cryptography, and complexity theory, and more generally at the intersections of physics, information, and computation.

Title: Quantum Lazy Sampling for Any Group: Compressed Oracles, Perfect Simulation, and New Pseudorandom Constructions

Abstract: Alice wants to make black-box queries to an oracle O, sampled from a group G (for instance, a random function, permutation, orthogonal matrix, or unitary). However, Bob does not have the exponential resources required to sample the full truth table of O. He instead samples every entry on-the-fly for each of Alice’s queries, recording his responses along the way to ensure consistency. Such a lazy sampling technique allows Bob to inspect his recording (the partial truth table) to learn what Alice knows about the oracle, and what she does not, allowing us to bound Alice’s success at solving an associated problem.

For quantum queries to classical functions, such “recording oracle” techniques were introduced by Zhandry (2019). Extensions to certain other groups have been recently discovered (specifically, for random unitaries [Ma Huang 25], and random permutations [Rosmanis 21, MMW 25, ABCGY 25, Carolan 25]). But these extensions have been ad hoc guesses for each group, difficult to interpret, and/or incur inherent error which limits the number of queries that can be simulated.

We give a generic framework for constructing perfect recording oracles of any group representation, and show how to inspect and interpret these oracles for lower bounds and cryptographic security proofs. In particular, we show that every group representation can be lazy-sampled in two distinct but equivalent ways: either by recording a pair of Bratteli paths (the basis of the irreducible representations of the group), or by recording an appropriately symmetrized Feynman path.

As an application, we give a generic prescription for constructing and proving the security of pseudorandom unitaries (and pseudorandom group elements of any group). For instance, it allows us to directly compare the recording oracles for random unitaries and for random permutations. As a consequence, we prove the security of a new construction of pseudorandom unitaries, the simplest that we are aware of: a random Clifford followed by a random permutation.

Join work with Ben Foxman, Alex Lombardi, Fermi Ma, and John Wright

Prasanna Ramakrishnan

A color photo of a man sitting outdoors, smiling for a photo.

Bio: Prasanna Ramakrishnan is a Ph.D. student at Stanford University in the CS Theory group. He is co-advised by Moses Charikar and Li-Yang Tan.

Title: Breaking the Metric Voting Distortion Barrier

Abstract: We consider the following well-studied problem of metric distortion in social choice. Suppose we have an election with voters and candidates who lie in a shared metric space. We would like to design a voting rule that chooses a candidate whose average distance to the voters is small. However, instead of having direct access to the distances in the metric space, each voter gives us a ranked list of the candidates in order of distance. Can we design a voting rule that, regardless of the election instance and underlying metric space, chooses a candidate whose cost differs from the true optimum by only a small factor (known as the distortion)?

A long line of work culminated in finding deterministic voting rules with metric distortion 3, which is the best possible for deterministic rules and many other classes of voting rules. However, without any restrictions, there is still a significant gap in our understanding. Even though the best lower bound is substantially lower at 2.112, finding a randomized rule that guarantees distortion $3 - \epsilon$ for some constant $\epsilon$ has been a major challenge in computational social choice.

In this work, we give a rule that guarantees distortion less than 2.753 by carefully randomizing between Maximal Lotteries, a natural game-theoretical voting rule dating back to the 60's, and new voting rules that we introduce.

Jane Lange

A color photo of a woman standing by a large body of water.

Bio: Jane Lange is a sixth-year Ph.D. student at MIT CSAIL, where she studies theoretical computer science under Ronitt Rubinfeld. Broadly, she is interested in algorithms related to machine learning and property testing. More specifically, Jane is interested in the application of techniques from the fields of sublinear algorithms and analysis of Boolean functions for the purpose of designing efficient algorithms for learning, property testing, and other similar problems.

Title: Limitations of membership queries in testable learning

Abstract: Membership queries (MQ) often yield speedups for learning tasks, particularly in the distribution-specific setting. We show that in the testable learning model of Rubinfeld and Vasilyan, membership queries cannot decrease the time complexity of testable learning algorithms beyond the complexity of sample-only distribution-specific learning. In the testable learning model, the learner must output a hypothesis whenever the data distribution satisfies a desired property, and if it outputs a hypothesis, the hypothesis must be near-optimal.
We give a general reduction from sample-based refutation of boolean concept classes, as presented in Vadhan ’17 and Kothari, Livni ’18, to testable learning with queries (TL-Q). This yields lower bounds for TL-Q via the reduction from learning to refutation given in . The result is that, relative to a concept class and a distribution family, no m-sample TL-Q algorithm can be super-polynomially more time-efficient than the best m-sample PAC learner.

Finally, we define a class of “statistical” MQ algorithms that encompasses many known distribution-specific MQ learners, such as those based on influence estimation or subcube-conditional statistical queries. We show that TL-Q algorithms in this class imply efficient statistical-query refutation and learning algorithms. Thus, combined with known SQ dimension lower bounds, our results imply that these efficient membership query learners cannot be made testable.

Zihan Zhang

Bio: Zihan is a Ph.D. candidate in computer science at The Ohio State University under the guidance of Prof. Zeyu Guo. His current research interests are in theoretical computer science with a focus on coding theory, combinatorics, pseudorandomness, and applications of coding theory in cryptography.

Title: Recent Advances in Optimal List Decoding of Reed–Solomon and Related Codes

Abstract: Reed–Solomon (RS) codes are one of the most important families of error-correcting codes, with applications spanning communication, storage, cryptography, and theoretical computer science. A central challenge is list decoding: recovering all codewords that agree with a received word even when the number of errors exceeds the unique-decoding limit. Since the classical Guruswami–Sudan algorithm (FOCS ’98), we have known how to efficiently list decode RS codes up to the Johnson bound, but whether RS-type codes can be list decoded beyond this barrier—approaching the information-theoretic capacity—has remained a major open problem for decades.

In this talk, I will highlight recent advances addressing this question from two complementary directions. First, I will discuss breakthroughs showing that random RS codes achieve near-capacity list decoding with essentially optimal list sizes. Second, I will describe progress on explicit RS-type codes called folded Reed–Solomon codes, including recent results that achieve optimal list size while retaining efficient decoding, resolving a nearly two-decade-old question posed by Guruswami and Rudra (STOC ’06).

Overall, these developments show that both random and explicit RS-related codes can attain near-capacity list decoding, and they introduce new algebraic and combinatorial techniques with broader connections to pseudorandomness, cryptography, and related areas.