DOI: 10.1145/1458082.1458095

Are click-through data adequate for learning web search rankings?

Published: 26 October 2008

Abstract

    Learning-to-rank algorithms, which can automatically adapt ranking functions in web search, require a large volume of training data. The traditional way of generating training examples is to employ human experts to judge the relevance of documents; unfortunately, this process is difficult, time-consuming, and costly. In this paper, we study the problem of exploiting click-through data, which can be collected at much lower cost, for learning web search rankings. We extract pairwise relevance preferences from a large-scale aggregated click-through dataset, compare these preferences with explicit human judgments, and use them as training examples to learn ranking functions. We find that click-through data are useful and effective in learning ranking functions, and that a straightforward use of aggregated click-through data can outperform human judgments. We demonstrate that the strategies are only slightly affected by fraudulent clicks. We also reveal that the pairs which are very reliable, e.g., the pairs consisting of documents with large click-frequency differences, are not sufficient for learning.
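    The core idea described in the abstract, deriving pairwise relevance preferences from aggregated click counts, can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' exact procedure: the function name, the log format, and the `min_diff` threshold are all assumptions.

```python
from collections import defaultdict
from itertools import combinations

def extract_preferences(click_log, min_diff=0):
    """Aggregate a click log into per-query click counts, then emit
    pairwise preferences: within a query, a document clicked more often
    is preferred over one clicked less often.

    click_log: iterable of (query, clicked_doc) events (assumed format).
    min_diff:  minimum click-frequency difference for a pair to count;
               raising it keeps only the most "reliable" pairs.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for query, doc in click_log:
        counts[query][doc] += 1

    prefs = []  # (query, preferred_doc, less_preferred_doc) triples
    for query, doc_counts in counts.items():
        for a, b in combinations(doc_counts, 2):
            diff = doc_counts[a] - doc_counts[b]
            if abs(diff) > min_diff:
                winner, loser = (a, b) if diff > 0 else (b, a)
                prefs.append((query, winner, loser))
    return prefs
```

    The triples produced this way can serve directly as training pairs for a pairwise learning-to-rank method. Note that, per the abstract's finding, filtering aggressively with a large `min_diff` yields reliable but insufficient training data.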




    Published In

    CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
    October 2008
    1562 pages
    ISBN:9781595939913
    DOI:10.1145/1458082

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. click-through data
    2. implicit feedback
    3. learning to rank
    4. relevance judgments
    5. web search rankings

    Qualifiers

    • Research-article

    Conference

    CIKM '08: Conference on Information and Knowledge Management
    October 26 - 30, 2008
    Napa Valley, California, USA

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Cited By

    • (2023) A Preference Judgment Tool for Authoritative Assessment. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 3100-3104. DOI: 10.1145/3539618.3591801. Online publication date: 19-Jul-2023.
    • (2022) Web Spam Detection based on Single Page Semantic Features. 2022 4th International Academic Exchange Conference on Science and Technology Innovation (IAECST), pp. 1083-1087. DOI: 10.1109/IAECST57965.2022.10061916. Online publication date: 9-Dec-2022.
    • (2022) Ideal kernel tuning. Neurocomputing, 489:C, pp. 1-8. DOI: 10.1016/j.neucom.2022.03.034. Online publication date: 22-Jun-2022.
    • (2021) Are Topics Interesting or Not? An LDA-based Topic-graph Probabilistic Model for Web Search Personalization. ACM Transactions on Information Systems, 40:3, pp. 1-24. DOI: 10.1145/3476106. Online publication date: 30-Dec-2021.
    • (2021) Explicit diversification of search results across multiple dimensions for educational search. Journal of the Association for Information Science and Technology, 72:3, pp. 315-330. DOI: 10.1002/asi.24403. Online publication date: 15-Feb-2021.
    • (2020) Good Evaluation Measures based on Document Preferences. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 359-368. DOI: 10.1145/3397271.3401115. Online publication date: 25-Jul-2020.
    • (2020) A Survey of Fake News. ACM Computing Surveys, 53:5, pp. 1-40. DOI: 10.1145/3395046. Online publication date: 28-Sep-2020.
    • (2020) Language-Agnostic Representation Learning for Product Search on E-Commerce Platforms. Proceedings of the 13th International Conference on Web Search and Data Mining, pp. 7-15. DOI: 10.1145/3336191.3371852. Online publication date: 20-Jan-2020.
    • (2019) Classification for EEG report generation and epilepsy detection. Neurocomputing, 335:C, pp. 81-95. DOI: 10.1016/j.neucom.2019.01.053. Online publication date: 28-Mar-2019.
    • (2019) Data stream classification using active learned neural networks. Neurocomputing, 353:C, pp. 74-82. DOI: 10.1016/j.neucom.2018.05.130. Online publication date: 11-Aug-2019.
