Title:  Planting Undetectable Backdoors in ML Models

Abstract:  Given the computational cost and technical expertise required to train machine learning models, users may delegate the task of learning to a service provider. Delegation of learning provides clear benefits, but also raises serious concerns about trust. In this talk, we present a possible abuse of power by untrusted service providers. We show how a malicious learner can plant an "undetectable backdoor" into a classifier. On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a cryptographic mechanism to change the classification of any input with only a slight perturbation. Importantly, without the appropriate "backdoor key" the mechanism is completely hidden, i.e., it cannot be detected by any computationally bounded observer. We present precise definitions of undetectability and demonstrate, under standard cryptographic assumptions, that planting undetectable backdoors in machine learning models is possible. Our constructions are quite generic and, thus, present a significant risk for the delegation of learning tasks.
Joint work with Shafi Goldwasser, Vinod Vaikuntanathan, and Or Zamir.
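
To give a flavor of the mechanism, below is a minimal toy sketch in Python; it is an illustration, not the construction from the paper. A keyed MAC from the standard library stands in for a digital signature scheme, and the names honest_classify, BACKDOOR_KEY, TAG_LEN, and activate_backdoor are hypothetical. The backdoored classifier defers to the honest model unless the input carries a valid tag, which only the holder of the key can produce.

    # Toy illustration: a "backdoored" classifier that behaves like an honest
    # classifier unless the trailing bytes of the input carry a valid tag.
    # An HMAC stands in for the digital signature scheme of the real construction.
    import hmac
    import hashlib

    BACKDOOR_KEY = b"held-only-by-the-malicious-trainer"  # hypothetical secret key
    TAG_LEN = 8  # trailing bytes interpreted as the tag ("slight perturbation")

    def honest_classify(x: bytes) -> int:
        # Stand-in for the honestly trained model's prediction.
        return len(x) % 2

    def backdoored_classify(x: bytes) -> int:
        body, tag = x[:-TAG_LEN], x[-TAG_LEN:]
        expected = hmac.new(BACKDOOR_KEY, body, hashlib.sha256).digest()[:TAG_LEN]
        if hmac.compare_digest(tag, expected):
            return 1  # attacker-chosen label, reachable only with the key
        return honest_classify(x)  # otherwise indistinguishable from honest behavior

    def activate_backdoor(x: bytes) -> bytes:
        # The key holder perturbs x (here, appends a tag) to force the label.
        tag = hmac.new(BACKDOOR_KEY, x, hashlib.sha256).digest()[:TAG_LEN]
        return x + tag

    if __name__ == "__main__":
        x = b"some input"
        print(backdoored_classify(x))                     # honest prediction
        print(backdoored_classify(activate_backdoor(x)))  # forced label: 1

Without the key, triggering or even noticing the backdoor amounts to forging a valid tag, which is infeasible for a computationally bounded observer; the paper's actual constructions hide the trigger inside the model's parameters rather than in an explicit check like the one above.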

Bio: My research investigates foundational questions about responsible machine learning. Much of this work aims to identify problematic behaviors that emerge in machine-learned models and to develop algorithmic tools that provably mitigate such behaviors. More broadly, I am interested in how the theory of computation can provide insight into emerging societal and scientific challenges. Prior to joining Cornell, I was a Miller Postdoctoral Fellow at UC Berkeley, hosted by Shafi Goldwasser. I completed my Ph.D. in the Stanford Theory Group under the guidance of Omer Reingold.