2010, Vol. 25, Issue (4): 728-738. doi: 10.1007/s11390-010-1056-9

Special Issue: Artificial Intelligence and Pattern Recognition

• Special Section on Advances in Machine Learning and Applications •

Learning Query Ambiguity Models by Using Search Logs

Ruihua Song1,2(宋睿华), Member, ACM, Zhicheng Dou2(窦志成), Hsiao-Wuen Hon2(洪小文), Fellow, IEEE and Yong Yu1(俞 勇)   

  1. Department of Computer Science, Shanghai Jiao Tong University, Shanghai 200240, China
  2. Microsoft Research Asia, Beijing 100190, China
  • Received: 2009-05-15 Revised: 2010-02-23 Online: 2010-07-09 Published: 2010-07-09
  • About author:
    Ruihua Song is a researcher at Microsoft Research Asia. She received B.E. and M.E. degrees from the Department of Computer Science and Technology, Tsinghua University. Her main research interests are Web information retrieval and Web information extraction.
    Zhicheng Dou is an associate researcher in Microsoft Research Asia. He received the B.S. and Ph.D. degrees in computer science and technology from Nankai University in 2003 and 2008, respectively. His main research interests include Web information retrieval and data mining.
    Hsiao-Wuen Hon is managing director of Microsoft Research Asia. As an IEEE fellow, Dr. Hon is an internationally recognized expert in speech technology. His recent research focuses on Web information retrieval and natural language processing.
    Yong Yu is a professor in the Department of Computer Science at Shanghai Jiao Tong University. He received his Master’s degree from East China Normal University. His research focuses on Web search and mining, the Semantic Web, and peer-to-peer search.

Identifying ambiguous queries is crucial to research on personalized Web search and search result diversity. Intuitively, query logs contain valuable information on how many intentions users have when issuing a query. However, previous work showed that user clicks alone are misleading when judging whether a query is ambiguous. In this paper, we address the problem of learning a query ambiguity model by using search logs. First, we propose enriching a query by mining the documents clicked by users and the relevant follow-up queries in a session. Second, we use a text classifier to map the documents and the queries into predefined categories. Third, we propose extracting features from the processed data. Finally, we apply a state-of-the-art algorithm, Support Vector Machine (SVM), to learn a query ambiguity classifier. Experimental results verify that using click-based features alone or session-based features alone performs worse than the previous work based on top retrieved documents. When we combine the two sets of features, our proposed approach achieves the best effectiveness, specifically 86% in terms of accuracy. It significantly improves on the click-based method by 5.6% and on the session-based method by 4.6%.
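The feature extraction step can be illustrated with a minimal sketch. The abstract does not list the actual features, so the choices here are assumptions made only for illustration: the entropy of the category distribution and the number of distinct categories, computed separately for clicked documents and for follow-up queries. The function names and the category labels are hypothetical, and the text classifier is assumed to have already assigned one category to each item.

```python
import math
from collections import Counter


def category_entropy(categories):
    """Shannon entropy (in bits) of the empirical category distribution."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum p * log2(1/p) over the observed categories.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def ambiguity_features(clicked_doc_categories, session_query_categories):
    """Build a feature vector for one query (an assumed feature set, not the paper's).

    Layout: [click entropy, session entropy,
             number of click categories, number of session categories].
    """
    return [
        category_entropy(clicked_doc_categories),
        category_entropy(session_query_categories),
        float(len(set(clicked_doc_categories))),
        float(len(set(session_query_categories))),
    ]


# Hypothetical query whose clicks and follow-up queries stay in one category.
focused = ambiguity_features(["Computers"] * 4, ["Computers", "Computers"])

# Hypothetical query whose clicks and follow-up queries spread over categories.
mixed = ambiguity_features(
    ["Computers", "Animals", "Computers", "Animals"],
    ["Animals", "Cars"],
)
```

Under these assumptions, a query whose clicks and follow-up queries concentrate in one category yields low entropy values, while a query spread evenly over several categories yields higher ones. Vectors of this kind, one per labeled query, are what an SVM classifier would then be trained on.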



ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences