Our goal is to develop machine learning algorithms that improve retrieval quality using implicit feedback. Examples of implicit feedback are the links a user clicks on in the ranked results, the time a user spends reading a page, or how a user reformulates a query. Such observable behavior gives weak and noisy feedback information about which links the user preferred in the returned ranking. The goal of this research is to utilize this information and learn an improved ranking function. The benefit of using machine learning from unobtrusive feedback is that search engines can adapt to the preferences of user groups, particular users, and the dynamic properties of a particular collection without expert parameter tuning.
Implementations of the search engines used in our experiments are available on the Osmot Search Engine page.
Currently, three instances of learning search engines are installed and operational. All of them use a special form of Support Vector Machine to learn the ranking function. Feel free to try them out.
Notice: Your actions (e.g. queries, clicks) are recorded for research purposes.
The NSF CAREER Award No. 0237381 "Improving Information Access by Learning from User Interactions" takes a machine learning approach to improving the effectiveness of information access tools, in particular the retrieval quality of search engines. The ability to learn enables a search engine to automatically adapt its retrieval strategy to individual users, to specific user groups, and to particular WWW sites. A search engine should learn, for example, that a query for "Michael Jordan" issued by a user at cs.cornell.edu is much more likely to refer to the professor at UC Berkeley than the same query issued by an average user. Similarly, a search engine should be able to adapt to collection properties, for example, that in a particular intranet it is not the TITLE field but the H1 headlines that contain the most important information.
Since explicit user feedback is rarely available, implicit feedback derived from observable user behavior is used as the input to the learning algorithms. Such implicit feedback requires new machine learning methods, since it comes in forms that differ from standard machine learning settings. For example, in search engines it is more reasonable to exploit clickthrough data as feedback in the form of pair-wise preferences (e.g. "for query q, document da should be ranked higher than document db") than as absolute relevance judgments. The project analyzes the reliability of implicit clickthrough data, designs and analyzes learning methods, and evaluates their applicability.
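The pair-wise interpretation of a click can be sketched as follows: a clicked result is taken as preferred over every unclicked result that was presented above it in the ranking. This is a minimal illustrative version of that heuristic; the function and variable names are ours, not taken from the project's code.

```python
def preferences_from_clicks(ranking, clicked):
    """Given a ranked list of document ids and the set of clicked ids,
    emit pairs (a, b) meaning 'a should be ranked higher than b':
    each clicked document is preferred over every unclicked document
    that was presented above it."""
    prefs = []
    for i, doc in enumerate(ranking):
        if doc in clicked:
            for skipped in ranking[:i]:
                if skipped not in clicked:
                    prefs.append((doc, skipped))
    return prefs

# Example: the user clicked the second and fourth results.
ranking = ["d1", "d2", "d3", "d4", "d5"]
clicked = {"d2", "d4"}
print(preferences_from_clicks(ranking, clicked))
# -> [('d2', 'd1'), ('d4', 'd1'), ('d4', 'd3')]
```

Note that this yields only relative judgments: nothing is inferred about documents below the last click, which is what makes the feedback weak but comparatively unbiased with respect to presentation order.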
The publication [Joachims/02a] describes how one can derive unbiased user preferences from clickthrough data. The key idea is to design the returned ranking as a blind test for comparing two competing retrieval strategies. The paper introduces a method for statistically interpreting clickthrough data from this setup in a well-founded way. Under mild assumptions, it is shown that such an analysis will come to the same conclusions as a traditional evaluation with manual relevance judgments.
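The blind-test idea can be illustrated with a simplified interleaving sketch: merge the results of two rankers so the user cannot tell which ranker produced which result, then count clicks per ranker. This is an illustrative variant under our own simplifying assumptions, not the paper's exact procedure.

```python
def interleave(ranking_a, ranking_b):
    """Merge two rankings by alternating picks and skipping duplicates.
    Returns the combined list and a map recording which ranker
    contributed each document first."""
    combined, origin = [], {}
    ia = ib = 0
    take_a = True
    while ia < len(ranking_a) or ib < len(ranking_b):
        if take_a and ia < len(ranking_a):
            doc, src = ranking_a[ia], "A"
            ia += 1
        elif ib < len(ranking_b):
            doc, src = ranking_b[ib], "B"
            ib += 1
        else:
            doc, src = ranking_a[ia], "A"
            ia += 1
        if doc not in origin:          # skip documents both rankers returned
            origin[doc] = src
            combined.append(doc)
        take_a = not take_a
    return combined, origin

# Show the interleaved list to the user, then credit clicks to the ranker
# that contributed each clicked document.
combined, origin = interleave(["d1", "d2", "d3"], ["d2", "d4", "d1"])
clicks = {"d2", "d4"}
score = {"A": 0, "B": 0}
for doc in clicks:
    score[origin[doc]] += 1
print(combined, score)
```

Because both rankers get (roughly) equal exposure in the combined list, a consistent click advantage for one side indicates the users prefer that retrieval strategy, without anyone ever judging documents explicitly.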
A paper describing the learning algorithm currently implemented in Striver is [Joachims/02c]. It argues that pair-wise preference judgments can be extracted from clickthrough data. The judgments are used as input to a Support Vector Machine method that learns an improved ranking function for retrieval.
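The core transformation behind this approach is that each preference "da ranked above db for query q" becomes a constraint on the difference of feature vectors, w . (phi(q, da) - phi(q, db)) >= 1, so learning a ranking function reduces to a classification problem over difference vectors. The toy sketch below uses plain subgradient descent on the hinge loss in place of a real SVM solver; all names and parameters are illustrative.

```python
def train_ranking_weights(pref_pairs, epochs=200, lr=0.1, reg=0.01):
    """pref_pairs: list of (phi_a, phi_b) feature vectors where document a
    is preferred over document b. Returns a weight vector w such that
    scoring documents by w . phi tends to respect the preferences."""
    dim = len(pref_pairs[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for phi_a, phi_b in pref_pairs:
            diff = [a - b for a, b in zip(phi_a, phi_b)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            for i in range(dim):
                # Hinge-loss subgradient plus L2 regularization.
                grad = reg * w[i] - (diff[i] if margin < 1 else 0.0)
                w[i] -= lr * grad
    return w

# Two toy preferences over 2-dimensional feature vectors.
pairs = [([2.0, 0.0], [1.0, 1.0]),
         ([3.0, 0.5], [1.0, 2.0])]
w = train_ranking_weights(pairs)
score = lambda phi: sum(wi * xi for wi, xi in zip(w, phi))
assert score([2.0, 0.0]) > score([1.0, 1.0])
```

At query time, every candidate document is scored by w . phi(q, d) and the results are sorted by score, so the learned weights directly define the new ranking function.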
F. Radlinski, M. Kurup, and T. Joachims, How Does Clickthrough Data Reflect Retrieval Quality?, Proceedings of the Conference on Information and Knowledge Management (CIKM), 2008.
Y. Yue and T. Joachims, Predicting Diverse Subsets Using Structural SVMs, Proceedings of the International Conference on Machine Learning (ICML), 2008.
[PDF] [BibTeX] [Software]
F. Radlinski, R. Kleinberg, and T. Joachims, Learning Diverse Rankings with Multi-Armed Bandits, Proceedings of the International Conference on Machine Learning (ICML), 2008.
T. Joachims, Search Engines that Learn from Implicit Feedback, IEEE Computer, Vol. 40, No. 8, 2007.
[IEEE Digital Library] [BibTeX] [Software]
F. Radlinski and T. Joachims, Active Exploration for Learning Rankings from Clickthrough Data, Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), 2007.
T. Joachims, L. Granka, B. Pan, H. Hembrooke, F. Radlinski, and G. Gay, Evaluating the Accuracy of Implicit Feedback from Clicks and Query Reformulations in Web Search, ACM Transactions on Information Systems (TOIS), Vol. 25, No. 2 (April), 2007.
S. Pohl, F. Radlinski, and T. Joachims, Recommending Related Papers Based on Digital Library Access Records, Proceedings of the Joint Conference on Digital Libraries (JCDL), 2007.
C. Domshlak and T. Joachims, Efficient and Non-Parametric Reasoning over User Preferences, User Modeling and User-Adapted Interaction (UMUAI), Vol. 17, No. 1-2, pp. 41-69, Springer, 2007.
[Springer Link] [BibTeX]
F. Radlinski and T. Joachims, Minimally Invasive Randomization for Collecting Unbiased Preferences from Clickthrough Logs, Proceedings of the National Conference of the American Association for Artificial Intelligence (AAAI), 2005.
T. Joachims, L. Granka, B. Pan, H. Hembrooke, and G. Gay, Accurately Interpreting Clickthrough Data as Implicit Feedback, Proceedings of the Conference on Research and Development in Information Retrieval (SIGIR), 2005.
F. Radlinski and T. Joachims, Query Chains: Learning to Rank from Implicit Feedback, Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2005. (KDD Best Student Paper Award)
[Postscript] [PDF]
F. Radlinski and T. Joachims, Evaluating the Robustness of Learning from Implicit Feedback, ICML Workshop on Learning In Web Search, 2005.
C. Domshlak and T. Joachims, Unstructuring User Preferences: Efficient Non-Parametric Utility Revelation, Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2005.
L. Granka, T. Joachims, and G. Gay, Eye-Tracking Analysis of User Behavior in WWW-Search, Poster Abstract, Proceedings of the Conference on Research and Development in Information Retrieval (SIGIR), 2004.
T. Hofmann, T. Joachims, and Y. Altun, Support Vector Machine Learning for Interdependent and Structured Output Spaces, Proceedings of the International Conference on Machine Learning (ICML), 2004.
M. Schultz and T. Joachims, Learning a Distance Metric from Relative Comparisons, Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS), 2003.
T. Joachims, Evaluating Retrieval Performance Using Clickthrough Data, Proceedings of the SIGIR Workshop on Mathematical/Formal Methods in Information Retrieval, 2002.
[Postscript] [PDF]
This material is based upon work supported by the National Science Foundation under CAREER Award No. 0237381. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation (NSF).