• Machine Learning and Data Mining

Query Performance Prediction for Information Retrieval Based on Covering Topic Score

Hao Lang1, Bin Wang1, Gareth Jones2, Jin-Tao Li1, Fan Ding1, and Yi-Xuan Liu1   

  1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  2. School of Computing, Dublin City University, Ireland
  • Received: 2007-06-18; Revised: 2008-03-20; Online: 2008-07-10; Published: 2008-07-10

We present a statistical method called Covering Topic Score (CTS) for predicting query performance in information retrieval. The estimate is based on how well the topic of a user's query is covered by the documents that a given retrieval system returns. The approach is conceptually simple and intuitive, and it extends easily to features beyond bag-of-words, such as phrases and term proximity. Experiments show that CTS correlates significantly with query performance on a variety of TREC test collections, and that its predictive power increases further when phrase and term-proximity features are included. We compare CTS with previous state-of-the-art query performance predictors, including clarity score and robustness score; in our experiments CTS consistently performs better than, or at least as well as, these methods. In addition to being highly effective, CTS has very low computational complexity, which can make it practical for real applications.
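The abstract does not give the CTS formula, so the following is only a minimal sketch of the general idea it describes: score a query by how much of its topic the top-ranked documents cover, then check how well that score tracks retrieval quality across queries. The function names, the per-term weights, the cutoff `k`, and the choice of Kendall's tau as the correlation statistic are all assumptions made for illustration, not the authors' definitions.

```python
from typing import Dict, List, Sequence


def coverage_score(
    query_terms: Sequence[str],
    ranked_docs: Sequence[Sequence[str]],
    term_weights: Dict[str, float],
    k: int = 10,
) -> float:
    """Hypothetical coverage predictor (not the paper's CTS formula).

    For each of the top-k documents, compute the weighted share of query
    terms that occur in it, then average those shares over the documents.
    Terms missing from term_weights default to weight 1.0.
    """
    top = ranked_docs[:k]
    total = sum(term_weights.get(t, 1.0) for t in query_terms)
    if not top or total == 0:
        return 0.0
    score = 0.0
    for doc in top:
        present = set(doc)
        covered = sum(term_weights.get(t, 1.0) for t in query_terms if t in present)
        score += covered / total
    return score / len(top)


def kendall_tau(xs: List[float], ys: List[float]) -> float:
    """Kendall's tau-a between two equal-length score lists (ties count as neither)."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    pairs = n * (n - 1) / 2
    return (concordant - discordant) / pairs if pairs else 0.0
```

In use, one would compute `coverage_score` once per query from that query's ranked list, collect a per-query effectiveness measure such as average precision from relevance judgments, and pass both lists to `kendall_tau`. A value near 1 would indicate that the predictor orders queries much as their actual performance does.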




ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China