Esin Durmus

Contact: esindurmus AT cs DOT stanford DOT edu

[Google Scholar] [Semantic Scholar] [CV]

Hi! I am Esin Durmus, a Research Scientist at Anthropic. Previously, I was a Postdoctoral Scholar in the Stanford NLP Group, working with Tatsunori Hashimoto and Dan Jurafsky. I received my PhD from Cornell University, where I was advised by Claire Cardie.

I work on evaluating the safety and societal impact of large language models. In particular, I am interested in understanding how these models may affect society and how we can build models that are both safe and helpful.

Publications

  1. Towards Measuring the Representation of Subjective Global Opinions in Language Models
    Esin Durmus, Karina Nguyen, Thomas I Liao, Nicholas Schiefer, Amanda Askell, Anton Bakhtin, Carol Chen, Zac Hatfield-Dodds, Danny Hernandez, Nicholas Joseph, Liane Lovitt, Sam McCandlish, Orowa Sikder, Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, Deep Ganguli
    Preprint, 2023.
    [paper]

  2. Opportunities and Risks of LLMs for Scalable Deliberation with Polis
    Christopher T Small, Ivan Vendrov, Esin Durmus, Hadjar Homaei, Elizabeth Barry, Julien Cornebise, Ted Suzman, Deep Ganguli, Colin Megill
    Preprint, 2023.
    [paper]

  3. Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models
    Myra Cheng, Esin Durmus, Dan Jurafsky
    In Proceedings of ACL, 2023.
    Social Impact Award
    [paper]

  4. Tracing and Removing Data Errors in Natural Language Generation Datasets
    Faisal Ladhak, Esin Durmus, Tatsunori Hashimoto
    In Proceedings of ACL, 2023.
    [paper]

  5. Whose Opinions Do Language Models Reflect?
    Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, Tatsunori Hashimoto
    In Proceedings of ICML, 2023.
    [paper]

  6. Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale
    Federico Bianchi, Pratyusha Kalluri, Esin Durmus, Faisal Ladhak, Myra Cheng, Debora Nozza, Tatsunori Hashimoto, Dan Jurafsky, James Zou, Aylin Caliskan
    In Proceedings of FAccT, 2023.
    [paper]

  7. Benchmarking Large Language Models for News Summarization
    Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, Tatsunori B Hashimoto
    Preprint, 2023.
    [paper]

  8. When Do Pre-Training Biases Propagate to Downstream Tasks? A Case Study in Text Summarization
    Faisal Ladhak, Esin Durmus, Mirac Suzgun, Tianyi Zhang, Dan Jurafsky, Kathleen McKeown, Tatsunori B Hashimoto
    In Proceedings of EACL, 2023.
    [paper]

  9. Evaluating Human-Language Model Interaction
    Mina Lee, Megha Srivastava, Amelia Hardy, John Thickstun, Esin Durmus, Ashwin Paranjape, Ines Gerard-Ursin, Xiang Lisa Li, Faisal Ladhak, Frieda Rong, Rose E Wang, Minae Kwon, Joon Sung Park, Hancheng Cao, Tony Lee, Rishi Bommasani, Michael Bernstein, Percy Liang
    Preprint, 2022.
    [paper]

  10. Holistic Evaluation of Language Models
    Preprint, 2022.
    [paper]

  11. Improving Faithfulness by Augmenting Negative Summaries from Fake Documents
    Tianshu Wang, Faisal Ladhak, Esin Durmus, He He
    In Proceedings of EMNLP, 2022.

  12. Spurious Correlations in Reference-Free Evaluation of Text Generation
    Esin Durmus, Faisal Ladhak, Tatsunori Hashimoto
    In Proceedings of ACL, 2022.
    [paper]

  13. GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
    2022.
    [paper]

  14. Faithful or Extractive? On Mitigating the Faithfulness-Abstractiveness Trade-off in Abstractive Summarization
    Faisal Ladhak, Esin Durmus, He He, Claire Cardie, Kathleen McKeown
    In Proceedings of ACL, 2022.
    [paper]

  15. Language Modeling via Stochastic Processes
    Rose E Wang, Esin Durmus, Noah Goodman, Tatsunori Hashimoto
    In Proceedings of ICLR, 2022.
    [paper]

  16. On the Opportunities and Risks of Foundation Models
    [paper] [bib]

  17. Towards Understanding Persuasion in Computational Argumentation
    PhD Dissertation
    [paper] [bib]

  18. Leveraging Topic Relatedness for Argument Persuasion
    Xinran Zhao, Esin Durmus, Hongming Zhang, Claire Cardie
    In Findings of ACL, 2021.
    [paper] [bib]

  19. The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
    [Team] [paper] [bib] [website]

  20. WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization
    Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown.
    In Findings of EMNLP, 2020.
    [paper] [data] [bib]

  21. Exploring the Role of Argument Structure in Online Debate Persuasion
    Jialu Li, Esin Durmus and Claire Cardie.
    In Proceedings of EMNLP, 2020.
    [paper] [bib]

  22. FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization
    Esin Durmus, He He and Mona Diab.
    In Proceedings of ACL, 2020.
    [paper] [code] [bib]

  23. The Role of Pragmatic and Discourse Context in Determining Argument Impact
    Esin Durmus, Faisal Ladhak and Claire Cardie.
    In Proceedings of EMNLP, 2019.
    [paper] [bib]

  24. Determining Relative Argument Specificity and Stance for Complex Argumentative Structures
    Esin Durmus, Faisal Ladhak and Claire Cardie.
    In Proceedings of ACL, 2019.
    [paper] [bib]

  25. A Corpus for Modeling User and Language Effects in Argumentation on Online Debating
    Esin Durmus and Claire Cardie.
    In Proceedings of ACL, 2019.
    [paper] [bib] [dataset]

  26. Persuasion of the Undecided: Language vs. the Listener
    Liane Longpre, Esin Durmus and Claire Cardie.
    In Proceedings of the 6th Workshop on Argument Mining, 2019.
    [paper] [bib] [dataset]

  27. Modeling the Factors of User Success in Online Debate
    Esin Durmus and Claire Cardie.
    In Proceedings of the World Wide Web Conference (WWW), 2019.
    [paper] [bib] [dataset]
    Cornell Chronicle Story

  28. Exploring the Role of Prior Beliefs for Argument Persuasion
    Esin Durmus and Claire Cardie.
    In Proceedings of NAACL, 2018.
    [paper] [bib] [dataset]

  29. Understanding the Effect of Gender and Stance on Opinion Expression in Debates on "Abortion"
    Esin Durmus and Claire Cardie.
    In Proceedings of the PEOPLES 2018 workshop (co-located with NAACL) on computational modeling of people's opinions, personality, and emotions in social media.
    [paper] [bib]

  30. Cornell Belief and Sentiment System at TAC 2016
    Vlad Niculae, Kai Sun, Xilun Chen, Yao Cheng, Xinya Du, Esin Durmus, Arzoo Katiyar and Claire Cardie.
    Text Analysis Conference (TAC), 2016.
    [paper] [bib]

Published Datasets

Teaching

Industry Experience