Department of Computer Science
Striver is a testbed for research on machine learning algorithms that improve retrieval quality using implicit feedback. In particular, Striver observes what links a user clicks on in the ranking and records this for each query. This gives weak and noisy feedback information about which links the user preferred in the returned ranking. The goal of this research is to utilize this information and learn an improved ranking function. The benefit of using machine learning from unobtrusive feedback is that search engines can adapt to the preferences of user groups, particular users, and the dynamic properties of a particular collection without expert parameter tuning.
Currently, an instance of Striver is installed that lets you search the Cornell Web. It uses a special form of Support Vector Machine to learn the ranking function. This instance is built on top of the CU Search Engine and re-ranks its results.
Feel free to try it out. Type in keywords.
Notice: Your actions are anonymized and recorded for research purposes.
The NSF CAREER Award No. 0237381 “Improving Information Access by Learning from User Interactions” takes a machine learning approach to improving the effectiveness of information access tools, in particular the retrieval quality of search engines. The ability to learn enables a search engine to automatically adapt its retrieval strategy to individual users, to specific user groups, and to particular WWW sites. A search engine should learn, for example, that a query for “Michael Jordan” issued by a user at cs.cornell.edu is much more likely to refer to the professor at UC Berkeley than the same query issued by an average user. Similarly, a search engine should be able to adapt to collection properties, for example, that in a particular intranet the H1 headlines, not the TITLE field, contain the most important information.
Since explicit user feedback is rarely available, implicit feedback derived from observable user behavior is used as the input to the learning algorithms. Such implicit feedback requires new machine learning methods, since it comes in forms that differ from the standard machine learning settings. For example, in search engines it is more reasonable to exploit clickthrough data as feedback in the form of pair-wise preferences (e.g. “for query q, document d_a should be ranked higher than document d_b”) than as absolute relevance feedback. The project analyzes the reliability of implicit clickthrough data, designs and analyzes learning methods, and evaluates their applicability.
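To make the idea of pair-wise preferences concrete, here is a minimal sketch of how one query's clicks might be turned into preference pairs. The rule used here (a clicked link is preferred over every unclicked link shown above it) is an assumption chosen for illustration, and the function name and document IDs are hypothetical; this is not necessarily the exact extraction rule used in Striver.

```python
from typing import List, Set, Tuple


def preferences_from_clicks(ranking: List[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) document pairs for one query.

    Assumed heuristic: the user scanned the ranking top-down, so any
    unclicked document shown above a clicked one was seen and skipped.
    """
    pairs = []
    for pos, doc in enumerate(ranking):
        if doc not in clicked:
            continue
        for skipped in ranking[:pos]:
            if skipped not in clicked:
                pairs.append((doc, skipped))
    return pairs


# Hypothetical ranking shown for one query, and the links the user clicked.
ranking = ["d1", "d2", "d3", "d4", "d5"]
clicked = {"d1", "d3", "d5"}
pairs = preferences_from_clicks(ranking, clicked)
print(pairs)  # [('d3', 'd2'), ('d5', 'd2'), ('d5', 'd4')]
```

Note that the click on d1 produces no pair: nothing was ranked above it, so it gives no evidence about relative preference. This is one reason such feedback is relative rather than absolute.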
The publication [Joachims/02a] describes how one can derive unbiased user preferences from clickthrough data. The key idea is to design the returned ranking as a blind test for comparing two competing retrieval strategies. The paper introduces a method for statistically interpreting clickthrough data from this setup in a well-founded way. Under mild assumptions, it is shown that such an analysis will come to the same conclusions as a traditional evaluation with manual relevance judgments.
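One way such a blind test could be set up is sketched below. The sketch assumes a balanced interleaving of the two rankings, a simple per-query click comparison, and a two-sided sign test over queries. All function names and the example data are hypothetical, and the exact procedure in [Joachims/02a] may differ in its details.

```python
from math import comb
from typing import List, Set


def balanced_interleave(a: List[str], b: List[str], a_first: bool) -> List[str]:
    """Merge two rankings so that any prefix draws about equally from both.

    In a live system a_first would be chosen at random per query, so that
    neither strategy is systematically favored at the top position.
    """
    combined: List[str] = []
    ka = kb = 0
    while ka < len(a) and kb < len(b):
        if ka < kb or (ka == kb and a_first):
            if a[ka] not in combined:
                combined.append(a[ka])
            ka += 1
        else:
            if b[kb] not in combined:
                combined.append(b[kb])
            kb += 1
    return combined


def query_winner(a: List[str], b: List[str], combined: List[str], clicked: Set[str]) -> str:
    """Return 'A', 'B', or 'tie' depending on whose top-k attracted more clicks."""
    clicked_positions = [i for i, doc in enumerate(combined) if doc in clicked]
    if not clicked_positions:
        return "tie"
    lowest = combined[max(clicked_positions)]  # lowest-ranked clicked link

    def depth(ranking: List[str]) -> int:
        return ranking.index(lowest) + 1 if lowest in ranking else len(ranking) + 1

    k = min(depth(a), depth(b))  # how deep the user must have looked
    hits_a = sum(doc in clicked for doc in a[:k])
    hits_b = sum(doc in clicked for doc in b[:k])
    if hits_a == hits_b:
        return "tie"
    return "A" if hits_a > hits_b else "B"


def sign_test_p_value(wins_a: int, wins_b: int) -> float:
    """Two-sided sign test: are the win counts consistent with a fair coin?"""
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(wins_a, wins_b) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


a = ["a", "b", "c", "d"]  # ranking from strategy A (hypothetical)
b = ["b", "e", "a", "f"]  # ranking from strategy B (hypothetical)
combined = balanced_interleave(a, b, a_first=True)
print(combined)                                # ['a', 'b', 'e', 'c', 'd']
print(query_winner(a, b, combined, {"e"}))     # B
print(sign_test_p_value(8, 2))                 # 0.109375
```

The user sees only the combined ranking and cannot tell which strategy contributed which link, which is what makes the comparison blind.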
A paper describing the learning algorithm currently implemented in Striver is [Joachims/02c]. It argues that pair-wise preference judgments can be extracted from clickthrough data. The judgments are used as input to a Support Vector Machine method that learns an improved ranking function for retrieval.
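The following sketch illustrates the general idea of learning a linear ranking function from pair-wise preferences. It is a simplified stand-in and not the Striver implementation: instead of an SVM solver it runs plain subgradient descent on a pairwise hinge loss, and the feature vectors, parameter values, and function names are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def score(w: Vector, x: Vector) -> float:
    """Linear ranking score; higher means the document is ranked earlier."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_ranker(
    pairs: List[Tuple[Vector, Vector]],
    dim: int,
    epochs: int = 200,
    lr: float = 0.1,
    reg: float = 0.01,
) -> List[float]:
    """Learn w such that score(w, better) exceeds score(w, worse) by a margin.

    Each pair is (features of preferred doc, features of less preferred doc).
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [p - q for p, q in zip(better, worse)]
            margin = score(w, diff)
            # L2 regularization step keeps the weights small.
            w = [wi * (1.0 - lr * reg) for wi in w]
            # Hinge loss: update only when the pair is violated or too close.
            if margin < 1.0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


# Hypothetical 2-dimensional features, e.g. (query/title match, URL depth).
pairs = [
    ([1.0, 0.2], [0.0, 0.9]),
    ([0.8, 0.5], [0.3, 0.4]),
    ([0.9, 0.1], [0.2, 0.1]),
]
w = train_pairwise_ranker(pairs, dim=2)

# Re-rank a set of candidate documents with the learned weights.
candidates = {"d1": [0.1, 0.8], "d2": [0.9, 0.3], "d3": [0.5, 0.5]}
reranked = sorted(candidates, key=lambda d: score(w, candidates[d]), reverse=True)
```

Training on differences of feature vectors is what lets a classifier-style method consume relative judgments: each preference pair becomes one constraint on the weight vector, and no absolute relevance label is needed.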
L. Granka, T. Joachims, and G. Gay, Eye-Tracking Analysis of User Behavior in WWW-Search, Poster Abstract, Proceedings of the Conference on Research and Development in Information Retrieval (SIGIR), 2004.
I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun, Support Vector Machine Learning for Interdependent and Structured Output Spaces, Proceedings of the International Conference on Machine Learning (ICML), 2004.
M. Schultz and T. Joachims, Learning a Distance Metric from Relative Comparisons, Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS), 2003.
T. Joachims, Evaluating Retrieval Performance Using Clickthrough Data, Proceedings of the SIGIR Workshop on Mathematical/Formal Methods in Information Retrieval, 2002.
This material is based upon work supported by the National Science Foundation under CAREER Award No. 0237381. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation (NSF).