Combining Sample Selection and Error-Driven Pruning for Machine Learning of Coreference Rules

Vincent Ng and Claire Cardie.
In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2002.


Abstract

Most machine learning solutions to noun phrase coreference resolution recast the problem as a classification task. We examine three potential problems with this reformulation, namely, skewed class distributions, the inclusion of hard training instances, and the loss of the transitivity inherent in the original coreference relation. We show how these problems can be handled via intelligent sample selection and error-driven pruning of classification rulesets. The resulting system achieves F-measures of 69.5 and 63.4 on the MUC-6 and MUC-7 coreference resolution data sets, respectively, surpassing the performance of the best MUC-6 and MUC-7 coreference systems. In particular, the system outperforms the best-performing learning-based coreference system to date.
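To make the classification reformulation concrete, the sketch below illustrates one common sample-selection scheme for building training instances from coreferent/non-coreferent NP pairs: each anaphor is paired only with its closest coreferent antecedent (positive instance) and with the NPs that intervene between them (negative instances), rather than with every preceding NP. This reduces the skew toward negative instances that the abstract mentions. The data structures, names, and the specific selection rule are illustrative assumptions, not the paper's exact algorithm.

    # A minimal, hypothetical sketch of instance creation with sample selection
    # for pairwise coreference classification. Illustrative only.

    from dataclasses import dataclass
    from typing import List, Optional, Tuple


    @dataclass
    class Mention:
        text: str
        index: int                 # position of the NP in the document
        chain_id: Optional[int]    # gold coreference chain id (None if singleton)


    def select_training_pairs(mentions: List[Mention]) -> List[Tuple[Mention, Mention, bool]]:
        """Build (antecedent, anaphor, label) instances with sample selection.

        Instead of pairing every NP with every preceding NP (which yields far
        more negative than positive instances), pair each anaphor only with its
        closest coreferent antecedent (positive) and with the non-coreferent
        NPs that appear between that antecedent and the anaphor (negatives).
        """
        pairs: List[Tuple[Mention, Mention, bool]] = []
        for j, anaphor in enumerate(mentions):
            if anaphor.chain_id is None:
                continue
            # Find the closest preceding mention in the same coreference chain.
            closest = None
            for i in range(j - 1, -1, -1):
                if mentions[i].chain_id == anaphor.chain_id:
                    closest = i
                    break
            if closest is None:
                continue
            pairs.append((mentions[closest], anaphor, True))
            # Negatives: only the mentions between the closest antecedent and the anaphor.
            for i in range(closest + 1, j):
                pairs.append((mentions[i], anaphor, False))
        return pairs

The resulting (antecedent, anaphor, label) pairs would then be converted to feature vectors and fed to a rule learner; the error-driven pruning of the learned rulesets described in the paper operates on that learned model and is not shown here.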