Yanbang Wang, Qitian Wu, Sami Abu-el-Haija, Mohammadreza Pourreza, Michael Galkin, Hadi Hemmati, Hailong Li, Yeounoh Chung, Fatma Ozcan, Bryan Perozzi, Vahab Mirrokni
Preprint (under review) 2026
Gemini-SQL2 is a coding LLM for text-to-SQL: Gemini 3.1 Pro, post-trained and served in a dedicated agentic harness. It currently ranks #1 on the BIRD leaderboard, the de facto standard benchmark for text-to-SQL tasks.
Yanbang Wang, Jon Kleinberg, Yanhong Wu
International Conference on Machine Learning (ICML) 2026
We revisit negative sampling for recommender systems from first principles and propose a redesign that improves recommendation quality.
Yanbang Wang, Sami Abu-el-Haija, Mohammadreza Pourreza, Michael Galkin, Hadi Hemmati, Yeounoh Chung, Fatma Ozcan, Bryan Perozzi
U.S. Patent 2026
Yanbang Wang, Hejie Cui, Jon Kleinberg
Neural Information Processing Systems (NeurIPS) 2025
The first systematic study of how LLMs memorize structural information in text. We find that LLMs often underperform and are biased toward certain error patterns, and that stronger models memorize better when the structures are narrated in a domain-consistent style.
Yanbang Wang, Karl Hallgren, Jonathan Larson
U.S. Patent (US20250337760A1) 2025
Yanbang Wang, Hejie Cui, Jon Kleinberg
International Conference on Computational Social Science (IC2S2), Oral 2024
Yanbang Wang, Jon Kleinberg
International Conference on Computational Social Science (IC2S2) 2024
Yanbang Wang, Jon Kleinberg
International Conference on Learning Representations (ICLR) 2024
We study the consequences of representing higher-order systems as graphs rather than hypergraphs, characterizing the information lost in hypergraph projection and proposing a learning-based method to reconstruct the original higher-order relations.
Yanbang Wang, Jon Kleinberg
Neural Information Processing Systems (NeurIPS) 2023
One of the first rigorous analyses of how link recommendations that boost engagement can also escalate conflict and polarization, using the Friedkin–Johnsen model of opinion dynamics.
Yanbang Wang, Karl Hallgren, Jonathan Larson
The Web Conference (WebConf) 2023
We address the high false-positive rate of authentication alerts with a framework based on self-supervised link prediction over dynamic authentication networks, validated on four months of data from 125 real organizations. Work done during an internship at Microsoft Research.
Yanbang Wang, Yen-Yu Chang, Yunyu Liu, Jure Leskovec, Pan Li
International Conference on Learning Representations (ICLR) 2021
Causal Anonymous Walks (CAWs) automatically retrieve temporal network motifs to represent network dynamics and use an anonymization strategy that keeps the method inductive, achieving state-of-the-art results on transductive and inductive temporal link prediction.
Yanbang Wang, Pan Li, Chongyang Bai, Jure Leskovec
The Web Conference (WebConf) 2021
TEDIC learns representations on dynamic social interaction networks by diffusing node attributes over a network and its complement and applying temporal convolutions, outperforming prior methods across four social-character prediction tasks.
Pan Li, Yanbang Wang, Hongwei Wang, Jure Leskovec
Neural Information Processing Systems (NeurIPS) 2020
Distance Encoding (DE) is a general class of structure-related features that provably gives GNNs more expressive power than the 1-Weisfeiler–Lehman test, distinguishing node sets in almost all regular graphs where traditional GNNs fail.
Yanbang Wang, Bijia Chen, Cameron Campbell
Social Science History Association (SSHA) 2020
Yanbang Wang, Pan Li, Chongyang Bai, VS Subrahmanian, Jure Leskovec
KDD Workshop on Mining and Learning with Graphs (KDD-MLG) 2020
Haipeng Zeng, Xinhuan Shu, Yanbang Wang, Yong Wang, Liguo Zhang, Ting-Chuen Pong, Huamin Qu
IEEE Transactions on Visualization and Computer Graphics (TVCG) 2020
Mucong Ding, Yanbang Wang, Erik Hemberg, Una-May O'Reilly
International Conference on Learning Analytics & Knowledge (LAK) 2019
Yanbang Wang, Nancy Law, Erik Hemberg, Una-May O'Reilly
International Conference on Learning Analytics & Knowledge (LAK) 2019
We introduce Detailed Access Trajectories (DATs), a mid-resolution representation of MOOC learner activity between raw clickstreams and coarse aggregates, and show through empirical studies that DATs capture rich information about learning behavior.