2026

Gemini-SQL2: Model, Harness, and System Design

Yanbang Wang, Qitian Wu, Sami Abu-el-Haija, Mohammadreza Pourreza, Michael Galkin, Hadi Hemmati, Hailong Li, Yeounoh Chung, Fatma Ozcan, Bryan Perozzi, Vahab Mirrokni

Preprint (under review) 2026

Gemini-SQL2 is currently the top-performing LLM for text-to-SQL: it ranks #1 on the BIRD leaderboard, the de facto standard benchmark for the task. It is Gemini 3.1 Pro, post-trained and served in a dedicated agentic harness.

Negative Sampling From the Ground Up

Yanbang Wang, Jon Kleinberg, Yanhong Wu

International Conference on Machine Learning (ICML) 2026

We revisit negative sampling for recommender systems from first principles and propose a redesign that improves recommendation quality.

Graph-Language Models as Text-to-SQL Verifier

Yanbang Wang, Sami Abu-el-Haija, Mohammadreza Pourreza, Michael Galkin, Hadi Hemmati, Yeounoh Chung, Fatma Ozcan, Bryan Perozzi

U.S. Patent 2026

2025

Microstructures and Accuracy of Graph Recall by Large Language Models

Yanbang Wang, Hejie Cui, Jon Kleinberg

Neural Information Processing Systems (NeurIPS) 2025

The first systematic study of how LLMs memorize structural information in text. We find that LLMs often underperform and are biased toward certain error patterns, and that stronger models memorize better when the structures are narrated in a domain-consistent style.

Network Authentication Evaluation

Yanbang Wang, Karl Hallgren, Jonathan Larson

U.S. Patent (US20250337760A1) 2025

2024

Network Recall by Large Language Models

Yanbang Wang, Hejie Cui, Jon Kleinberg

International Conference on Computational Social Science (IC2S2), Oral 2024

On the Relationship Between Relevance and Conflict in Online Social Link Recommendations

Yanbang Wang, Jon Kleinberg

International Conference on Computational Social Science (IC2S2) 2024

From Graphs to Hypergraphs: Hypergraph Projection and its Reconstruction

Yanbang Wang, Jon Kleinberg

International Conference on Learning Representations (ICLR) 2024

We study the consequences of representing higher-order systems as graphs rather than hypergraphs, characterizing the information lost in hypergraph projection and proposing a learning-based method to reconstruct the original higher-order relations.

2023

On the Relationship Between Relevance and Conflict in Online Social Link Recommendations

Yanbang Wang, Jon Kleinberg

Neural Information Processing Systems (NeurIPS) 2023

One of the first rigorous analyses of how link recommendations that boost engagement can also escalate conflict and polarization, using the Friedkin–Johnsen model of opinion dynamics.

A Graph-based Framework for Reducing False Positives in Authentication Alerts in Security Systems

Yanbang Wang, Karl Hallgren, Jonathan Larson

The Web Conference (WebConf) 2023

We address the high false-positive rate of authentication alerts with a framework based on self-supervised link prediction over dynamic authentication networks, validated on four months of data from 125 real organizations. Work done during an internship at Microsoft Research.

2022

Algorithm and System Co-design for Efficient Subgraph-based Graph Representation Learning

Haoteng Yin, Muhan Zhang, Yanbang Wang, Jianguo Wang, Pan Li

Proceedings of the VLDB Endowment (VLDB) 2022

2021

Inductive Representation Learning in Temporal Networks via Causal Anonymous Walks

Yanbang Wang, Yen-Yu Chang, Yunyu Liu, Jure Leskovec, Pan Li

International Conference on Learning Representations (ICLR) 2021

Causal Anonymous Walks (CAWs) automatically retrieve temporal network motifs to represent network dynamics and use an anonymization strategy that keeps the method inductive, achieving state-of-the-art results on both transductive and inductive temporal link prediction.

TEDIC: Neural Modeling of Behavioral Patterns in Dynamic Social Interaction Networks

Yanbang Wang, Pan Li, Chongyang Bai, Jure Leskovec

The Web Conference (WebConf) 2021

TEDIC learns representations on dynamic social interaction networks by diffusing node attributes over a network and its complement and applying temporal convolutions, outperforming prior methods across four social-character prediction tasks.

Revisiting Graph Neural Networks and Distance Encoding in a Practical View

Haoteng Yin, Yanbang Wang, Pan Li

AAAI Deep Learning on Graphs Workshop (AAAI-DLG) 2021

2020

Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning

Pan Li, Yanbang Wang, Hongwei Wang, Jure Leskovec

Neural Information Processing Systems (NeurIPS) 2020

Distance Encoding (DE) is a general class of structure-related features that provably gives GNNs more expressive power than the 1-Weisfeiler–Lehman test, distinguishing node sets in almost all regular graphs where traditional GNNs fail.

A Network-based Method for Estimating Potential for Career Advancement from Incomplete Data

Yanbang Wang, Bijia Chen, Cameron Campbell

Social Science History Association (SSHA) 2020

Generic Representation Learning for Dynamic Social Interaction

Yanbang Wang, Pan Li, Chongyang Bai, VS Subrahmanian, Jure Leskovec

KDD Workshop on Mining and Learning with Graphs (KDD-MLG) 2020

EmotionCues: Emotion-Oriented Visual Summarization of Classroom Videos

Haipeng Zeng, Xinhuan Shu, Yanbang Wang, Yong Wang, Liguo Zhang, Ting-Chuen Pong, Huamin Qu

IEEE Transactions on Visualization and Computer Graphics (TVCG) 2020

2019

Transfer Learning using Representation Learning in Massive Open Online Courses

Mucong Ding, Yanbang Wang, Erik Hemberg, Una-May O'Reilly

International Conference on Learning Analytics & Knowledge (LAK) 2019

Using Detailed Access Trajectories for Learning Behavior Analysis

Yanbang Wang, Nancy Law, Erik Hemberg, Una-May O'Reilly

International Conference on Learning Analytics & Knowledge (LAK) 2019

We introduce Detailed Access Trajectories (DATs), a mid-resolution representation of MOOC learner activity between raw clickstreams and coarse aggregates, and show through empirical studies that DATs capture rich information about learning behavior.
