My research focuses primarily on statistical methods for natural language processing, especially in unsupervised learning settings.
A major goal is to develop general methods for learning how to predict the probability of linguistic events, a task with applications to a host of language processing problems, such as speech recognition and machine translation. Recently, I have been investigating clustering and nearest-neighbor techniques for dealing with situations in which little data is available.
In joint work with Rie Kubota Ando, I am also developing empirical methods for segmenting Japanese text (which lacks space delimiters between words) without relying on dictionaries or pre-segmented training data. Our results show that one can learn to make segmentation decisions from large amounts of raw (unsegmented) data, which is relatively easy to acquire.
Chair: Cornell University Computer Science colloquium series.
Member: Field of Cognitive Studies.
Editorial Board Member: Computational Linguistics (2000–2002).
Program Committee Member: 17th National Conference on Artificial Intelligence (AAAI); 17th International Conference on Machine Learning (ICML); 1st Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).
Review Panel Member: 38th Annual Meeting of the Association for Computational Linguistics (ACL); Neural Information Processing Systems (NIPS 2000); Student Abstract and Poster Program, 17th National Conference on Artificial Intelligence (AAAI).
Journal Referee: Computational Linguistics; ACM Transactions on Information Systems.
Two unsupervised methods in natural language processing. Harvard University, August 1999.
Similarity-based methods in natural language processing. IBM Watson Research Center, October 1999.
Similarity-based methods in natural language processing. Microsoft Research, November 1999.
“Mostly-Unsupervised Statistical Segmentation of Japanese: Applications to Kanji.” First Conference of the NAACL (2000), 241–248 (with R.K. Ando).
“Foundations of Statistical Natural Language Processing.” Invited book review. Computational Linguistics 26(2) (2000), 277–279.