Estimating word co-occurrence probabilities is a problem underlying many applications in statistical natural language processing. Distance-weighted (or similarity-weighted) averaging has been shown to be a promising approach to the analysis of novel co-occurrences. Many measures of distributional similarity have been proposed for use in the distance-weighted averaging framework; here, we empirically study their stability properties, finding that similarity-based estimation appears to make more efficient use of more reliable portions of the training data. We also investigate properties of the skew divergence, a weighted version of the Kullback-Leibler (KL) divergence; our results indicate that the skew divergence yields better results than the KL divergence even when the KL divergence is applied to more sophisticated probability estimates.
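As a rough illustration of the quantity the abstract refers to, the sketch below computes the skew divergence s_alpha(q, r) = D(r || alpha*q + (1 - alpha)*r), i.e. the KL divergence of r from a mixture that replaces a small (1 - alpha) fraction of q with r itself. This is a minimal sketch assuming the formulation from Lee's earlier work on measures of distributional similarity; the function names, the toy distributions, and the choice of alpha are purely illustrative and do not reproduce the paper's experimental setup.

import numpy as np

def kl_divergence(p, q):
    # D(p || q) = sum_x p(x) * log(p(x) / q(x)); terms with p(x) = 0 contribute 0.
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def skew_divergence(q, r, alpha=0.99):
    # s_alpha(q, r) = D(r || alpha * q + (1 - alpha) * r).
    # Mixing a little of r into q keeps the second argument nonzero wherever
    # r is, so the value stays finite even when q assigns zero probability
    # to events observed in r (where plain D(r || q) would be infinite).
    q, r = np.asarray(q, dtype=float), np.asarray(r, dtype=float)
    return kl_divergence(r, alpha * q + (1.0 - alpha) * r)

# Toy distributions (illustrative only): q misses the third event entirely,
# so D(r || q) is infinite, but the skew divergence remains finite.
q = [0.7, 0.3, 0.0]
r = [0.5, 0.3, 0.2]
print(skew_divergence(q, r, alpha=0.99))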
@inproceedings{Lee:01a,
  author    = {Lillian Lee},
  title     = {On the effectiveness of the skew divergence for statistical language analysis},
  booktitle = {Proceedings of Artificial Intelligence and Statistics (AISTATS)},
  year      = {2001},
  pages     = {65--72}
}
This material is based upon work supported in part by the National Science Foundation under Grant No. IRI9712068. Any opinions, findings, and conclusions or recommendations expressed are those of the author and do not necessarily reflect the views or official policies, either expressed or implied, of any sponsoring institutions, the U.S. government, or any other entity.