I am on the job market this year; reach out to me at eugene@cs.cornell.edu.
Upcoming and past seminar talks: Michigan CSE (Apr'23), Columbia CS (Apr'23), Boston University CDS (Apr'23), UW Allen School CSE (Mar'23), McGill CS (Mar'23), CISPA (Feb'23), UMass Manning CICS (Feb'23), UCLA Samueli CS (Jan'23).
I am a CS PhD candidate at Cornell Tech and an Apple AI/ML PhD Scholar advised by Vitaly Shmatikov and Deborah Estrin. I study security and privacy in emerging AI-based systems under real-life conditions and attacks.
My research goal is to build ethical, safe, and private machine learning systems while keeping them practical and useful. Recently, we demonstrated security drawbacks of Federated Learning (AISTATS'20) and fairness implications of Differentially Private Deep Learning (NeurIPS'19). We also proposed a framework for backdoor attacks and defenses (USENIX'21) and a new attack on generative language models (S&P'22) that modifies LLMs so that they spin their outputs for Propaganda-as-a-Service.
A big focus of my work is data privacy: I study methods that enable new applications while protecting users. We proposed Ancile, a framework for language-level control over data usage. At Google, I worked on a new algorithm for building private heatmaps (PETS'22). At Apple, I developed a novel way to obtain good tokenizers for Private Federated Learning (FL4NLP@ACL'22). Before starting my PhD, I received a Specialist degree in engineering from Bauman Moscow State Technical University (Baumanka) and worked at Cisco on OpenStack networking as a QA Engineer.
I grew up in Tashkent, Uzbekistan. In my free time I play water polo and spend time with family.
Tokenization is an important part of training a good language model; however, in private federated learning, where user data are not available, generic tokenization methods reduce performance. We show how to obtain a good tokenizer without spending additional privacy budget. Work done at Apple. Best paper runner-up award. [PDF].

We introduce a constrain-and-scale attack, a form of data poisoning that can stealthily inject a backdoor into one of the participating models during a single round of Federated Learning training. The attack evades proposed defenses and propagates the backdoor to the global model, which the server then distributes to the other participants. [PDF], [Code].
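For readers curious how a single participant's update can dominate Federated Averaging, here is a minimal, self-contained sketch of the model-replacement idea that the constrain-and-scale attack builds on. It is only an illustration: the "models" are toy parameter vectors, the participant count and server learning rate are made-up values, and the actual training of the backdoored model on poisoned data (and the constrain step that keeps it stealthy) is not shown.

```python
import numpy as np

# Toy federated averaging round: "models" are flat parameter vectors.
DIM, N_PARTICIPANTS, SERVER_LR = 10, 100, 1.0

def fedavg(global_model, updates, server_lr=SERVER_LR):
    """One round of Federated Averaging: G <- G + lr * mean(L_i - G)."""
    return global_model + server_lr * np.mean(updates, axis=0)

rng = np.random.default_rng(0)
global_model = rng.normal(size=DIM)

# Benign participants submit small local updates (L_i - G).
benign_updates = [0.01 * rng.normal(size=DIM) for _ in range(N_PARTICIPANTS - 1)]

# Stand-in for a model the attacker trained on backdoored data.
backdoored_model = global_model + rng.normal(size=DIM)

# Model replacement: scale the malicious update by roughly n / server_lr so
# that, after averaging with the benign updates, the new global model lands
# almost exactly on the attacker's backdoored model.
gamma = N_PARTICIPANTS / SERVER_LR
malicious_update = gamma * (backdoored_model - global_model)

new_global = fedavg(global_model, benign_updates + [malicious_update])
print("distance to backdoored model:", np.linalg.norm(new_global - backdoored_model))
print("distance to old global model:", np.linalg.norm(new_global - global_model))
```

The key line is the scaling factor gamma: because the server averages n updates, a single update scaled by roughly n / server_lr survives the averaging essentially intact.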
This project demonstrates a new trade-off between privacy and fairness: we observe that training a machine learning model with Differential Privacy disproportionately reduces accuracy on underrepresented groups. [NeurIPS, 2019], [Code].
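The mechanism behind the disparity is easiest to see in the DP-SGD update itself. Below is a minimal numpy sketch for a hypothetical logistic-regression setup, not the code from the paper: per-example gradients are clipped and noised, which bounds every example's influence, but examples from underrepresented groups tend to have larger, more atypical gradients, so clipping and noise cost them more accuracy. The function names and hyperparameters are illustrative.

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One DP-SGD step for logistic regression on a minibatch (X, y).

    Each example's gradient is clipped to `clip_norm` before averaging, and
    Gaussian noise calibrated to the clip norm is added, so no single example
    can move the model by more than a bounded amount.
    """
    rng = rng or np.random.default_rng()
    preds = 1.0 / (1.0 + np.exp(-X @ w))          # sigmoid predictions
    per_example_grads = (preds - y)[:, None] * X  # log-loss gradient, one row per example
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=w.shape)
    return w - lr * (clipped.sum(axis=0) + noise) / len(X)

def accuracy_by_group(w, X, y, group):
    """Accuracy broken down by subgroup of the test set."""
    preds = (X @ w > 0).astype(int)
    return {g: float(np.mean(preds[group == g] == y[group == g])) for g in np.unique(group)}
```

Comparing accuracy_by_group for a model trained with dp_sgd_step against one trained with plain SGD is one way to measure the per-group accuracy gap the project studies.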
A fast and compact cloud-native implementation of containers. [PDF].