IDF revisited: A simple new derivation within the Robertson-Spärck Jones probabilistic model
Lillian Lee
Proceedings of SIGIR, pp. 751--752, 2007. Poster paper

There have been a number of prior attempts to theoretically justify the effectiveness of the inverse document frequency (IDF). Those that take as their starting point Robertson and Spärck Jones's probabilistic model are based on strong or complex assumptions. We show that a more intuitively plausible assumption suffices. Moreover, the new assumption, while conceptually very simple, provides a solution to an estimation problem that had been deemed intractable by Robertson and Walker (1997).

