Cross-Lingual Understanding (XLU) through Pretraining: Scaling NLP Across Languages

Abstract: Billions of people around the world use Facebook in over one hundred languages. This linguistic diversity is wonderful, but it presents challenges for Natural Language Processing (NLP) systems. It is simply impossible to annotate data and train a new system for each language. Instead, we rely on Cross-Lingual Understanding (XLU) to train NLP systems in one language and apply them to languages that are not part of the original training data.

We have made significant progress in XLU in the last two years. In particular, pretraining methods have been an effective way of improving XLU performance. I will give a brief overview of recent pretraining methods such as BERT, XLNet, and RoBERTa. I will then cover common XLU benchmarks, including the Cross-Lingual Natural Language Inference (XNLI) benchmark that we introduced in 2018. I will continue by discussing common methods for XLU and the recent progress we have made through pretraining. I will finish by describing exciting ongoing work in my group.
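To make the zero-shot transfer idea concrete, here is a minimal illustrative sketch (not taken from the talk) using the HuggingFace transformers pipeline: a multilingual encoder fine-tuned on English NLI data is applied directly to another language at inference time. The specific checkpoint name is an assumption and stands in for any multilingual model fine-tuned on XNLI-style data.

```python
# Illustrative sketch of zero-shot cross-lingual transfer, assuming a
# publicly available XLM-R checkpoint fine-tuned on English NLI data.
from transformers import pipeline

# Hypothetical example checkpoint: a multilingual encoder whose only
# task-specific supervision came from English NLI examples.
classifier = pipeline(
    "zero-shot-classification",
    model="joeddav/xlm-roberta-large-xnli",
)

# The English training signal transfers to, e.g., Spanish input text.
result = classifier(
    "El nuevo teléfono tiene una batería que dura dos días.",
    candidate_labels=["tecnología", "deportes", "política"],
)
print(result["labels"][0], result["scores"][0])
```

The point of the sketch is only that no Spanish task labels are needed: the multilingual pretraining supplies the cross-lingual representation, and the English fine-tuning supplies the task.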

Bio: Ves is a Research Scientist Manager at Facebook AI focusing on Natural Language Processing (NLP). Before Facebook AI, Ves was a Research Scientist on the Search team at Facebook, focusing on NLP applications for Search. Before Facebook, Ves spent three wonderful years as a postdoc at the Center for Language and Speech Processing at Johns Hopkins University, where he worked with Jason Eisner on machine learning for structured prediction and was supported by a Computing Innovation Fellowship from the CRA. Ves earned his PhD from Cornell University, where he worked with his advisor, Claire Cardie, on opinion analysis. His thesis is titled "Opinion Summarization: Automatically Creating Useful Representations Of The Opinions Expressed In Text." During his PhD, Ves was supported by an NSF Graduate Research Fellowship.