Research Interests

Now, a few words on looking for things.
When you go looking for something specific, your chances of finding it are very bad. Because of all the things in the world, you're only looking for one of them.
When you go looking for anything at all, your chances of finding it are very good. Because of all the things in the world, you're sure to find some of them.
—The Zero Effect, written and directed by Jake Kasdan (who didn't have to worry about p-hacking)

"From now on, you do investigative work."
I responded with my usual savoir faire: "But I don't know anything about investigative reporting."
Alan looked at me for what I remember as a very long time. "Just remember," he said. "Turn every page. Never assume anything. Turn every goddam page."
—Robert Caro, "Turn Every Page", The New Yorker, January 28, 2019

These days, I am most interested in connections between natural language processing and social interaction. More and more of life now takes place online, and a growing share of the digital traces left by human activity is recorded in natural language; there are thus tremendous opportunities for natural-language processing to contribute to the analysis and facilitation of socially embedded processes. A course I co-developed, “Natural Language Processing and Social Interaction”, covers some of the topics in this arena that intrigue me.

The goal of natural language processing and information retrieval (I do not subscribe to the view that these are different fields) is, in a broad sense, to enable computers to use human language as a (one-way or two-way) communication medium accurately, robustly, and gracefully [a short, breezy history]. With respect to “pure” language technologies, my research interests are in the empirical and theoretical problems that arise in the pursuit of this goal.

A sampling of areas I've worked on:

interactions with the social sciences: uncovering what language features make an argument convincing, predicting votes from Congressional speeches, non-textual influences on Amazon helpfulness votes, how language echoing reveals power relationships, whether phrasing affects memorability, hedging and framing in discussions of genetically-modified organisms, etc.
multimodality and LLMs: The New Yorker Caption Contest as a humor "understanding" benchmark, evaluating whether a multimodal model really learns cross-modal interactions
causality in content moderation
sentiment analysis and opinion mining
semantics: paraphrasing work as written up by the New York Times, distributional similarity, lexical entailment, etc.
information retrieval
grammar formalisms: complexity of parsing (JACM), learning of CFLs

See my papers page for a full list and more information.

Doing this work really just involved following the lead of my students and postdocs, who have been spectacular. My debt to them is unbounded.

Lillian Lee's home page
Cornell Natural Language Processing Group