Corpus structure, language models, and ad hoc information retrieval
Oren Kurland and Lillian Lee.
Proceedings of SIGIR, pp. 194--201, 2004.

Abstract: Most previous work on the recently developed language-modeling approach to information retrieval focuses on document-specific characteristics, and therefore does not take into account the structure of the surrounding corpus. We propose a novel algorithmic framework in which information provided by document-based language models is enhanced by the incorporation of information drawn from clusters of similar documents. Using this framework, we develop a suite of new algorithms. Even the simplest typically outperforms the standard language-modeling approach in precision and recall, and our new interpolation algorithm posts statistically significant improvements for both metrics over all three corpora tested.

BibTeX entry:

@InProceedings{Kurland+Lee:04a,
  author =       {Oren Kurland and Lillian Lee},
  title =        {Corpus structure, language models, and ad hoc information retrieval},
  booktitle =    "Proceedings of SIGIR",
  pages =        {194--201},
  year =         2004
}