2015, Vol. 30, Issue (4): 888-901. doi: 10.1007/s11390-015-1567-5

Special Issue: Data Management and Data Mining

• Special Section on Data Management and Data Mining •

Search Result Diversification Based on Query Facets

Sha Hu(胡莎), Zhi-Cheng Dou*(窦志成), Member, CCF, ACM, IEEE, Xiao-Jie Wang(王晓捷), Ji-Rong Wen(文继荣), Senior Member, CCF, ACM, IEEE   

  1. School of Information, Renmin University of China, Beijing 100872, China; Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education, Beijing 100872, China
  • Received: 2015-02-01 Revised: 2015-04-24 Online: 2015-07-05 Published: 2015-07-05
  • Contact: Zhi-Cheng Dou is an associate professor in the School of Information and Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, Beijing. E-mail: dou@ruc.edu.cn
  • About author: Sha Hu is a Ph.D. student of computer science at Renmin University of China, Beijing. She received her Bachelor's degree in computer science from Renmin University of China in 2008. She worked at Microsoft Research Asia as a research intern in the Web Search and Mining Group from 2008 to 2013. Her research focuses on information retrieval and Web data extraction.
  • Supported by:

    This work was partially supported by the National Basic Research 973 Program of China under Grant No. 2014CB340403, the Fundamental Research Funds for the Central Universities of China, and the Research Funds of Renmin University of China under Grant No. 15XNLF03.

Different users who issue the same query to a search engine may be looking for different information. To satisfy more users with a limited number of search results, search result diversification re-ranks the results to cover as many user intents as possible. Most existing intent-aware diversification algorithms represent user intents as subtopics, each of which is usually a word, a phrase, or a short description. In this paper, we leverage query facets to understand user intents in diversification, where each facet contains a group of words or phrases that explain an underlying intent of a query. We generate subtopics based on query facets and propose faceted diversification approaches. Experimental results on the public TREC 2009 dataset show that our faceted approaches outperform state-of-the-art diversification models.
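
The general idea of intent-aware re-ranking described in the abstract can be illustrated with a minimal sketch. This is not the faceted model proposed in the paper; it is a generic greedy re-ranker in the style of xQuAD that trades off relevance against coverage of intents not yet satisfied. The function name `diversify`, the parameter `lam`, the intent labels, and all scores are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def diversify(
    relevance: Dict[str, float],
    coverage: Dict[str, Dict[str, float]],
    intent_weight: Dict[str, float],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedy intent-aware re-ranking (illustrative xQuAD-style sketch).

    relevance[d]     : assumed relevance of document d to the query, in [0, 1]
    coverage[d][i]   : assumed degree to which document d satisfies intent i, in [0, 1]
    intent_weight[i] : assumed importance of intent i
    lam              : trade-off between relevance (0.0) and diversity (1.0)
    """
    selected: List[str] = []
    # not_covered[i] estimates how unsatisfied intent i still is.
    not_covered = {i: 1.0 for i in intent_weight}
    candidates = set(relevance)

    def score(d: str) -> float:
        # Reward documents that cover intents the selected list still misses.
        div = sum(
            w * coverage.get(d, {}).get(i, 0.0) * not_covered[i]
            for i, w in intent_weight.items()
        )
        return (1.0 - lam) * relevance[d] + lam * div

    while candidates and len(selected) < k:
        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        # Discount every intent that the chosen document covers.
        for i in not_covered:
            not_covered[i] *= 1.0 - coverage.get(best, {}).get(i, 0.0)
    return selected


# Toy data: two documents about one intent, one document about another.
relevance = {"d1": 0.9, "d2": 0.8, "d3": 0.5}
coverage = {"d1": {"fruit": 1.0}, "d2": {"fruit": 1.0}, "d3": {"company": 1.0}}
intent_weight = {"fruit": 0.5, "company": 0.5}

print(diversify(relevance, coverage, intent_weight, k=2))
```

With `lam=0.5`, the less relevant `d3` is promoted above `d2` because `d1` has already covered the "fruit" intent; with `lam=0.0` the ranking falls back to pure relevance order. In a faceted setting, each intent would presumably be derived from a query facet rather than from a single hand-written label.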

ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved