Overview

Since 2023, Cornell CS has hosted an annual invited gathering of the world's strongest junior theoretical computer scientists.

[Register Here]

Speakers


Guy Blanc (Stanford)	Uma Girish (Princeton)
Max Hopkins (UCSD)	Rahul Ilango (MIT)
Bingbin Liu (CMU)	Peter Manohar (CMU)

Participation

Registration is free of charge. Please register here. All talks will be broadcast over Zoom.

Schedule

Thursday, May 9

Time	Event	Location
10am	Breakfast	Outside Gates G01
10:30am – 11:30am	Talk: Max Hopkins	Gates G01
11:30am	Break	Gates G01
11:45am – 12:45pm	Talk: Peter Manohar	Gates G01
1pm – 2pm	Group Lunch	Gates 122
2:15pm – 3:15pm	Talk: Uma Girish	Gates G01
3:30pm – 4:45pm	Social	Gates Plaza

Friday, May 10

Time	Event	Location
10am	Breakfast	Outside Gates G01
10:30am – 11:30am	Talk: Rahul Ilango	Gates G01
11:30am	Break	Gates G01
11:45am – 12:45pm	Talk: Bingbin Liu	Gates G01
1pm – 2pm	Lunch with students	Collegetown
2:15pm – 3:15pm	Talk: Guy Blanc	Gates G01
3:30pm – 5pm	One on ones w/faculty	Faculty Offices

Location

Talk Details

GUY BLANC

Robust data analysis and the magic of subsampling

Abstract: Classic models of data analysis assume the data is drawn independently from the distribution of interest, but the real world is rarely so kind. Therefore, we desire robust algorithms to deviations from these classic assumptions and have correspondingly developed many models for how an algorithm can be "robust." This leads to a practical challenge - with many distinct models of robustness, it's difficult for the practitioner to determine which is most appropriate for their data. We make headway towards this challenge in two ways. First, we show that one technique, that of subsampling, leads to robustness in two very distinct settings. This suggests that even if the analyst does not exactly know which model is most appropriate for their setting, they should try subsampling. Second, we show that many distinct models of robustness are equivalent. This alleviates the challenge of deciding which model of robustness is most appropriate by greatly reducing the number of truly unique models. Based on joint works with Jane Lange, Ali Malik, Li-Yang Tan, and Greg Valiant.

UMA GIRISH

Fourier growth of communication protocols for XOR functions

Abstract: Fourier growth of a Boolean function refers to the growth of the sum of absolute values of the level-$k$ Fourier coeﬀicients. Intuitively, functions with small Fourier growth cannot aggregate many weak signals in the input to obtain a considerable effect on the output. Upper bounds on the Fourier growth, even for the first few levels, have important applications in pseudorandomness, learning theory, circuit lower bounds and quantum-versus-classical separations. Tight bounds on the Fourier growth have been extensively studied for various computational models, including AC0 circuits, branching programs, decision trees and parity decision trees.

We study the Fourier growth of functions associated with communication protocols. In our setting, Alice gets a bitstring $x$ and Bob gets a bitstring $y$, and together they wish to compute a Boolean function that only depends on $x+y$ (the point-wise XOR of Alice's and Bob's inputs) while minimizing the amount of communication. Such communication problems are referred to as XOR functions and are central in the field of communication complexity. If a protocol $C$ computes an XOR function, then the output $C(x,y)$ of the protocol is a function of $x + y$. This motivates us to study the XOR-fiber $H$ of a communication protocol $C$ defined by $$ H(z) := \mathbb{E}[ C(x,y) \mid x + y = z] . $$ XOR-fibers of communication protocols form a powerful class of functions, which includes decision trees and parity decision trees. Proving tight bounds on the Fourier growth of XOR-fibers has applications to the Gap-Hamming problem and improved quantum versus classical separations in communication complexity.

In this talk, we present improved Fourier growth bounds for the XOR-fibers of randomized communication protocols that communicate at most $d$ bits. For the first level, we show a tight $O(\sqrt{d})$ bound. For the second level, we show an improved $O(d^{3/2})$ bound. We conjecture that the optimal bound is $O(d \cdot \rm{polylog}(n))$ and this is an open question.

Our proof relies on viewing the protocol and its Fourier spectrum as a martingale. One crucial ingredient we use to control the step sizes is a spectral notion of $k$-wise independence. We also provide a way of adaptively partitioning a large set into a few spectrally $k$-wise independent sets.

This is based on a joint work with Makrand Sinha, Avishay Tal and Kewen Wu.

MAX HOPKINS

Chernoff Bounds on HDX and their Applications

Abstract: Recent years have seen the emergence of high dimensional expanders (HDX) as a powerful tool in theoretical computer science, with breakthrough applications in approximate sampling and property testing. A central force behind the success of HDX in application is their concentration of measure. Consider the following basic question: given a $k$-uniform hypergraph $X$ and a subset $A$ of its vertices, how likely is a random hyperedge to "see" $A$ in approximately the right proportion? When $X$ is the complete hypergraph, this question was classically resolved by Chernoff and Hoeffding, who showed $X$ has a subgaussian tail. In 2017, Dinur and Kaufman introduced spectral HDX and proved they satisfy Chebyshev-type (inverse-polynomial) concentration not only for this basic setting, but critically for the more general case when $A$ sits on $i$-faces of the complex $X$. This led to a breakthrough line of work constructing sparse agreement testers among other applications, but left an exponential gap with known bounds for the complete hypergraph.

In this talk, we will revisit the basic notion of high dimensional expanders and prove they satisfy optimal concentration of measure, matching Chernoff-Hoeffding for the complete hypergraph. Leveraging this fact, we will further prove HDX are reverse hypercontractive, a powerful analytic inequality from boolean function analysis used in the construction of (combinatorial) PCPs with optimal soundness. In the second half of the talk, we overview several applications of concentration and reverse hypercontractivity on HDX, including new agreement tests in the list-decoding regime, the first explicit bounded-degree hypergraphs with optimal geometric overlap, and new degree lower bounds for HDX.

Based on joint work with Yotam Dikstein.

RAHUL ILANGO

Tales of Obfuscation: Spaghetti Code Meets Derandomization, Differential Privacy, Logic, and Metacomplexity

Abstract: A recent breakthrough in cryptography is the construction of "program obfuscators," which "scramble" code so that the behavior of an algorithm is unintelligible in some formal sense. In this talk, I will discuss this notion and overview three recent results that use obfuscation to answer open questions in various areas of theoretical computer science, including mathematical logic, derandomization, average-case complexity, and differential privacy. This talk will focus on giving a brief description of these results and connections, emphasizing intuition and ideas. No background in cryptography or complexity is required. This talk is based on joint works with Badih Ghazi, Yizhi Huang, Pritish Kamath, Ravi Kumar, Jiatu Li, Pasin Manurangsi, Hanlin Ren, and Ryan Williams.

BINGBIN LIU

Guiding machine learning design with insights from simple testbeds

Abstract: Machine learning systems are becoming increasingly powerful and complex, which poses challenges for performing diagnostics and making targeted improvements. In this talk, we will see how simple testbeds amenable to theoretical analyses can yield insights to guide practical advancements.

We will look at two language model-inspired setups. The first is masked prediction, inspired by the BERT model, where we investigate the impact of the masking rate. We study this through a parameter identifiability lens, assuming data from a hidden Markov model and revealing connection to the uniqueness of tensor rank decompositions. The second is algorithmic reasoning, which, despite being sequential in nature, has been successfully modeled by parallel architectures such as Transformers. By representing these reasoning tasks with finite-state automata and using tools from Krohn-Rhodes theory and formal languages, we show how Transformers can simulate reasoning using far fewer layers than the number of sequential reasoning steps, and discuss the gap between these theoretical constructions and practical considerations such as interpretability and generalization.

PETER MANOHAR

New Spectral Techniques in Algorithms and Coding Theory: the Kikuchi Matrix Method

Abstract: In this talk, I will present a recently developed method to solve combinatorial problems by (1) reducing them to establishing the unsatisfiability of systems of $k$-sparse linear equations mod 2 with a limited amount of randomness, and then (2) showing unsatisfiability of these equations by analyzing the spectral properties of appropriately chosen induced subgraphs of Cayley graphs on the hypercube (and related variants) called Kikuchi graphs.

We will discuss the following applications in this talk:

Designing an algorithm for refuting/solving semirandom and smoothed instances of constraint satisfaction problems (obtained by applying a small random perturbation to the literal patterns in a worst-case instance) that matches the best running-time vs constraint-density trade-offs for the special and significantly easier case of random CSPs;
Proving a cubic lower bound on the length of 3-query locally decodable codes, improving on the quadratic lower bound of Kerenidis and de Wolf (2004);
Proving an exponential lower bound on the length of linear 3-query locally correctable codes, improving on the cubic lower bound above;
Resolving Feige's conjecture from 2008, an extremal combinatorics conjecture about the girth vs. density trade-off for hypergraphs that generalizes the Moore bound for graphs.