Yanbang Wang

Ph.D. Candidate, Computer Science, Cornell University
IvyPlus Visiting Scholar, Stanford University
Research Intern, Google Research

I am a final-year Ph.D. candidate in Computer Science at Cornell University, extremely fortunate to be advised by Prof. Jon Kleinberg. I am currently visiting Stanford Computer Science as an IvyPlus Scholar for the 2025–26 academic year. I am also affiliated with Google Research as a Student Researcher.

I obtained my M.S. in Computer Science from Stanford University, where I worked with Prof. Jure Leskovec. I received my B.S. in Computer Science and Mathematics, graduating Summa Cum Laude (top 1%), from the Hong Kong University of Science and Technology.

My recent research centers on LLMs for code generation, spanning post-training of world-leading foundation models and building agentic harnesses that top industry-standard coding benchmarks. At Google, I created and first-authored Gemini-SQL2, currently the best LLM for SQL generation in the world. Post-trained from Gemini 3.1 Pro and served in a dedicated agentic harness, Gemini-SQL2 ranks #1 on the highly competitive, industry-standard BIRD benchmark, scoring 80.04 on the held-out test set in BIRD’s independent evaluation and leading all competitors, including submissions from OpenAI, Anthropic, and AWS, by a significant margin.

More broadly, I study how to reason, retrieve, and generate over structure-rich text and context by explicitly exploiting the underlying structure. Representative directions include:

Coding, where programming languages carry rich syntactic structure (e.g., abstract syntax trees) and code repositories interlink files through intricate cross-references [Gemini-SQL2, two U.S. patents].
Databases, where data agents navigate the complex inter-table relationships defined by primary–foreign-key constraints [Gemini-SQL2, Gemini-SQL, VLDB’22].
Recommendation systems, where modeling the rich structure of user–item interaction history is central to nearly every task [ICML’26, NeurIPS’23, ICLR’21].
Knowledge graphs, where LLMs retrieve and reason over the entities and relations in an augmented knowledge context [NeurIPS’24, IC2S2’24].

My work has drawn broad recognition and global attention: Gemini-SQL2’s launch announcement is the single most-liked post from Google Research on X over the past year. I also serve as a General Chair of the Learning on Graphs (LoG) Conference 2025 and 2026.

Curriculum Vitae

ywangdr(at)cs.cornell.edu Google Scholar Twitter LinkedIn

Education

Cornell University
Department of Computer Science
Ph.D. Candidate, advised by Jon Kleinberg
2021 - present
GPA 4.2/4.3

Stanford University
Department of Computer Science
M.S., advised by Jure Leskovec
2019 - 2021
GPA 4.1/4.3

Hong Kong University of Science and Technology
Computer Science & Mathematics
B.S., Summa Cum Laude (Top 1%)
2015 - 2019
GPA 4.0/4.3

Experience

Google Research
Research Intern
2025 - 2026

Meta AI
Research Scientist Intern
2024, 2025

Microsoft Research
Research Scientist Intern
2023

MIT CSAIL
Visiting Student Researcher
2018

Awards & Services

General Chair, Learning on Graphs (LoG) Conference
2025, 2026

Thinking Machine Research Grant
2026

Microsoft Accelerating Foundation Models Research Grant
2024

Stanford Graduate with Distinction in Research
2021

HKUST Academic Achievement Medal (top 1%)
2019

News

2026

Jun 15: Gemini-SQL2, the text-to-SQL coding LLM I led at Google, ranks #1 on the BIRD benchmark! (Click the "Single-Model Leaderboard" tab to see the rankings.)

Jun 10: Gemini-SQL2 was featured by Google Research (also on X) and reposted by our VP. 🎉

May 01: My first-authored paper with Meta AI on negative sampling was accepted to ICML 2026!

Jan 01: I am serving as a General Chair for the fifth Learning on Graphs Conference (LoG 2026).

2025

Sep 01: I have moved to the Bay Area.

Mar 01: We are organizing the third LoG-NYC Workshop on Apr 21–22.

Selected Publications (view all)

Gemini-SQL2: Model, Harness, and System Design

Yanbang Wang, Qitian Wu, Sami Abu-el-Haija, Mohammadreza Pourreza, Michael Galkin, Hadi Hemmati, Hailong Li, Yeounoh Chung, Fatma Ozcan, Bryan Perozzi, Vahab Mirrokni

Preprint (under review) 2026

Gemini-SQL2 is currently the best coding LLM for text-to-SQL in the world. Post-trained from Gemini 3.1 Pro and served in a dedicated agentic harness, it ranks #1 on the BIRD leaderboard, the de facto standard benchmark for text-to-SQL.

[Google's Announcement] [VP's Repost] [BIRD Leaderboard]

Microstructures and Accuracy of Graph Recall by Large Language Models

Yanbang Wang, Hejie Cui, Jon Kleinberg

Neural Information Processing Systems (NeurIPS) 2024

The first systematic study of how LLMs memorize structural information described in text. We find that LLMs often underperform and are biased toward certain error patterns, and that stronger models memorize better when the structures are narrated in a domain-consistent style.

[Paper]

Negative Sampling From the Ground Up

Yanbang Wang, Jon Kleinberg, Yanhong Wu

International Conference on Machine Learning (ICML) 2026

We revisit negative sampling for recommender systems from first principles and propose a redesign that improves recommendation quality.

[Paper]

Inductive Representation Learning in Temporal Networks via Causal Anonymous Walks

Yanbang Wang, Yen-Yu Chang, Yunyu Liu, Jure Leskovec, Pan Li

International Conference on Learning Representations (ICLR) 2021

Causal Anonymous Walks (CAWs) automatically retrieve temporal network motifs to represent network dynamics and use an anonymization strategy that keeps the method inductive, achieving state-of-the-art results on both transductive and inductive temporal link prediction.

[Paper] [Project] [Code]
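A rough sketch of the two ingredients named above, under simplifying assumptions. First, it assumes (as the name "causal" suggests) that walks step only along interactions that happened strictly earlier than the current one, and it picks among them uniformly (the paper's sampler may weight candidates differently). Second, it uses the classic first-appearance anonymization, which replaces each node with the index of its first occurrence, rather than the paper's own anonymization scheme. All names here are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Tuple

# Each interaction is (u, v, t): nodes u and v interacted at time t.
Interaction = Tuple[str, str, float]


def causal_walk(events: List[Interaction], start: str, t0: float,
                length: int, rng: random.Random) -> List[Tuple[str, float]]:
    """Walk backward in time from (start, t0), stepping only along strictly earlier interactions."""
    walk = [(start, t0)]
    node, t = start, t0
    for _ in range(length):
        # Candidate steps: interactions touching the current node that happened before time t.
        cands = [(v if u == node else u, ts) for u, v, ts in events
                 if node in (u, v) and ts < t]
        if not cands:
            break
        # Uniform choice is an assumption; the paper's sampler may weight candidates by time.
        node, t = rng.choice(cands)
        walk.append((node, t))
    return walk


def anonymize(nodes: List[Hashable]) -> List[int]:
    """Classic anonymous-walk relabeling: each node becomes the index of its first appearance."""
    first_seen: Dict[Hashable, int] = {}
    return [first_seen.setdefault(n, len(first_seen)) for n in nodes]


events = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 3.0), ("c", "d", 4.0)]
rng = random.Random(0)
walk = causal_walk(events, "a", 5.0, 3, rng)
print(walk, anonymize([n for n, _ in walk]))
```

Because the anonymized walk records only the pattern of revisits, not which concrete nodes were visited, the same pattern learned on one part of the network can be matched on nodes never seen during training.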

On the Relationship Between Relevance and Conflict in Online Social Link Recommendations

Yanbang Wang, Jon Kleinberg

Neural Information Processing Systems (NeurIPS) 2023

One of the first rigorous analyses of how link recommendations that boost engagement can also escalate conflict and polarization, using the Friedkin–Johnsen model of opinion dynamics.

[Paper]
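In the Friedkin–Johnsen model, each user holds a fixed innate opinion and repeatedly sets their expressed opinion to the average of that innate opinion and their neighbors' expressed opinions. The sketch below iterates that update to equilibrium and computes two quantities commonly used in this literature: polarization (spread of expressed opinions around their mean) and disagreement (squared opinion gaps across links). It assumes unweighted, undirected links and these particular conflict measures, which may differ from the paper's exact definitions. The toy example illustrates only the model's mechanics, not the paper's results.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def fj_equilibrium(n: int, edges: List[Edge], innate: List[float],
                   max_iters: int = 10_000, tol: float = 1e-12) -> List[float]:
    """Iterate the Friedkin-Johnsen update z_i <- (s_i + sum_{j~i} z_j) / (1 + deg(i)).

    With unit weights the fixed point is z* = (I + L)^{-1} s, where L is the graph
    Laplacian; I + L is strictly diagonally dominant, so this Jacobi-style iteration converges.
    """
    nbrs: Dict[int, List[int]] = {i: [] for i in range(n)}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    z = list(innate)
    for _ in range(max_iters):
        new_z = [(innate[i] + sum(z[j] for j in nbrs[i])) / (1 + len(nbrs[i]))
                 for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_z, z)) < tol:
            return new_z
        z = new_z
    return z


def polarization(z: List[float]) -> float:
    """Sum of squared deviations of expressed opinions from their mean."""
    mean = sum(z) / len(z)
    return sum((x - mean) ** 2 for x in z)


def disagreement(edges: List[Edge], z: List[float]) -> float:
    """Sum over links of the squared gap between the two endpoints' expressed opinions."""
    return sum(((z[u] - z[v]) ** 2 for u, v in edges), 0.0)


innate = [1.0, -1.0]            # two users with opposite innate opinions
for edges in ([], [(0, 1)]):    # before and after recommending a link between them
    z = fj_equilibrium(2, edges, innate)
    print(edges, round(polarization(z), 3), round(disagreement(edges, z), 3))
```

Linking the two users pulls their expressed opinions to +1/3 and -1/3, so polarization falls from 2 to 2/9 while disagreement rises from 0 to 4/9: a single link can reduce one notion of conflict while increasing another.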

From Graphs to Hypergraphs: Hypergraph Projection and its Reconstruction

Yanbang Wang, Jon Kleinberg

International Conference on Learning Representations (ICLR) 2024

We study the consequences of representing higher-order systems as graphs rather than hypergraphs, characterizing the information lost in hypergraph projection and proposing a learning-based method to reconstruct the original higher-order relations.

[Paper]
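Hypergraph projection is commonly taken to be clique expansion: every pair of nodes that shares a hyperedge becomes an edge. Assuming that definition, the short sketch below shows why projection loses information. A single three-way hyperedge and three separate pairwise hyperedges produce exactly the same triangle, so the original higher-order structure cannot be read back from the graph alone. The function name `project` is illustrative, and this is not the paper's reconstruction method.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set


def project(hyperedges: Iterable[Iterable[str]]) -> Set[FrozenSet[str]]:
    """Clique expansion: connect every pair of nodes that co-occur in some hyperedge."""
    edges: Set[FrozenSet[str]] = set()
    for hyperedge in hyperedges:
        for u, v in combinations(sorted(set(hyperedge)), 2):
            edges.add(frozenset((u, v)))
    return edges


# Two different hypergraphs on the nodes a, b, c.
one_triple: List[Set[str]] = [{"a", "b", "c"}]
three_pairs: List[Set[str]] = [{"a", "b"}, {"b", "c"}, {"a", "c"}]

# Both collapse to the same triangle after projection.
print(project(one_triple) == project(three_pairs))
```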

TEDIC: Neural Modeling of Behavioral Patterns in Dynamic Social Interaction Networks

Yanbang Wang, Pan Li, Chongyang Bai, Jure Leskovec

The Web Conference (WebConf) 2021

TEDIC learns representations on dynamic social interaction networks by diffusing node attributes over a network and its complement and applying temporal convolutions, outperforming prior methods across four social-character prediction tasks.

[Paper] [Project] [Talk]

Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning

Pan Li, Yanbang Wang, Hongwei Wang, Jure Leskovec

Neural Information Processing Systems (NeurIPS) 2020

Distance Encoding (DE) is a general class of structure-related features that provably gives GNNs more expressive power than the 1-Weisfeiler–Lehman test, distinguishing node sets in almost all regular graphs where traditional GNNs fail.

[Paper] [Project] [Code]
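As a concrete illustration of the idea, the sketch below uses shortest-path distance to a single target node, which is one simple instance of a distance encoding; the general class also admits other choices that are not shown here. A 6-cycle and two disjoint triangles are both 2-regular, so 1-WL color refinement assigns every node the same color in both graphs. Appending each node's distance to target node 0 separates them. Graph constructors and function names are hypothetical.

```python
from collections import Counter, deque
from typing import Dict, List

Graph = Dict[int, List[int]]


def cycle(n: int) -> Graph:
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def two_triangles() -> Graph:
    g = cycle(3)
    g.update({i + 3: [j + 3 for j in nbrs] for i, nbrs in cycle(3).items()})
    return g


def wl_histogram(g: Graph, rounds: int = 3) -> Counter:
    """1-WL color refinement from uniform colors; colors are nested tuples, so they compare across graphs."""
    colors = {v: () for v in g}
    for _ in range(rounds):
        colors = {v: (colors[v], tuple(sorted(colors[u] for u in g[v]))) for v in g}
    return Counter(colors.values())


def distances_from(g: Graph, src: int) -> List[int]:
    """Sorted BFS shortest-path distances from src; -1 marks unreachable nodes."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for u in g[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return sorted(dist.get(v, -1) for v in g)


hexagon, triangles = cycle(6), two_triangles()
print(wl_histogram(hexagon) == wl_histogram(triangles))  # plain 1-WL cannot tell them apart
print(distances_from(hexagon, 0), distances_from(triangles, 0))  # distances to node 0 can
```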
