Abstract: We study distributional similarity measures for the purpose of improving probability estimation for unseen cooccurrences. Our contributions are three-fold: an empirical comparison of a broad range of measures; a classification of similarity functions based on the information that they incorporate; and the introduction of a novel function that is superior at evaluating potential proxy distributions.
Data: http://www.cs.cornell.edu/home/llee/data/sim.html
BibTeX entry:
@InProceedings{Lee:99a,
author = {Lillian Lee},
title = {Measures of Distributional Similarity},
booktitle = {37th Annual Meeting of the Association for Computational Linguistics},
pages = {25--32},
year = 1999,
}