Iterative Residual Rescaling: An analysis and generalization of LSI
Rie Kubota Ando and Lillian Lee
Proceedings of SIGIR, pp. 154--162, 2001

We consider the problem of creating document representations in which inter-document similarity measurements correspond to semantic similarity. We first present a novel subspace-based framework for formalizing this task. Using this framework, we derive a new analysis of Latent Semantic Indexing (LSI), showing a precise relationship between its performance and the uniformity of the underlying distribution of documents over topics. This analysis helps explain the improvements gained by Ando's (2000) Iterative Residual Rescaling (IRR) algorithm: IRR can compensate for distributional non-uniformity. A further benefit of our framework is that it provides a well-motivated, effective method for automatically determining the rescaling factor IRR depends on, leading to further improvements. A series of experiments over various settings and with several evaluation metrics validates our claims.
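As a rough illustration of the idea summarized above, the following is a minimal sketch of a residual-rescaling loop, not the authors' implementation. The abstract does not spell out the procedure, so the specifics here are assumptions: each residual document vector is multiplied by its own length raised to a power `q` (the rescaling factor), the dominant direction of the rescaled residuals is taken as the next basis vector, and that direction is then projected out of every residual. The helper names `dot`, `top_direction`, and `irr_basis`, the use of power iteration, and the toy term-document data are all hypothetical.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_direction(vectors, iters=500, seed=0):
    """Approximate the dominant left singular vector of the matrix whose
    columns are `vectors`, using power iteration on sum_j v_j v_j^T."""
    rng = random.Random(seed)
    b = [rng.gauss(0.0, 1.0) for _ in vectors[0]]
    norm = math.sqrt(dot(b, b))
    b = [x / norm for x in b]
    for _ in range(iters):
        nxt = [0.0] * len(b)
        for v in vectors:
            w = dot(v, b)
            for i, x in enumerate(v):
                nxt[i] += w * x
        norm = math.sqrt(dot(nxt, nxt))
        if norm == 0.0:  # all vectors are zero; nothing left to extract
            break
        b = [x / norm for x in nxt]
    return b


def irr_basis(docs, k, q):
    """Assumed form of the loop: rescale residuals by length**q, take the
    dominant direction, then remove that direction from every residual."""
    residuals = [list(d) for d in docs]
    basis = []
    for _ in range(k):
        scaled = [[math.sqrt(dot(r, r)) ** q * x for x in r] for r in residuals]
        b = top_direction(scaled)
        basis.append(b)
        residuals = [[x - dot(r, b) * bi for x, bi in zip(r, b)]
                     for r in residuals]
    return basis


# Toy data (made up): five documents on one topic (terms 0-1), one document
# on another (terms 2-3), one mixed document. Rows are documents.
docs = [
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [3.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 1.0, 0.0, 1.0],
]

unscaled = irr_basis(docs, k=2, q=0.0)  # q = 0: no rescaling
rescaled = irr_basis(docs, k=2, q=2.0)  # q > 0: long residuals weigh more
```

With `q = 0` the rescaling step does nothing, so the loop simply extracts successive dominant singular directions by deflation, which is the subspace a truncated SVD would select. With `q > 0`, documents that the current basis represents poorly have long residuals and therefore more influence on the next direction, which is one way to picture how rescaling might compensate for a non-uniform distribution of documents over topics.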

@inproceedings{Ando+Lee:01a,
  author = {Rie Kubota Ando and Lillian Lee},
  title = {{Iterative Residual Rescaling}: An analysis and generalization of {LSI}},
  year = {2001},
  pages = {154--162},
  booktitle = {Proceedings of SIGIR}
}

Figure: LSI problem: geometric intuition

This paper is based upon work supported in part by the National Science Foundation under ITR/IM grant IIS-0081334. Any opinions, findings, and conclusions or recommendations expressed above are those of the authors and do not necessarily reflect the views of the National Science Foundation.