Ruihua Song, Zhicheng Dou, Hsiao-Wuen Hon, Yong Yu. Learning Query Ambiguity Models by Using Search Logs[J]. Journal of Computer Science and Technology, 2010, 25(4): 728-738. DOI: 10.1007/s11390-010-1056-9

Learning Query Ambiguity Models by Using Search Logs

More Information
  • Author Bio:

    Ruihua Song is a researcher at Microsoft Research Asia. She received B.E. and M.E. degrees from the Department of Computer Science and Technology, Tsinghua University. Her main research interests are Web information retrieval and Web information extraction.

    Zhicheng Dou is an associate researcher in Microsoft Research Asia. He received the B.S. and Ph.D. degrees in computer science and technology from Nankai University in 2003 and 2008, respectively. His main research interests include Web information retrieval and data mining.

    Hsiao-Wuen Hon is managing director of Microsoft Research Asia. As an IEEE fellow, Dr. Hon is an internationally recognized expert in speech technology. His recent research focuses on Web information retrieval and natural language processing.

    Yong Yu is a professor in the Computer Science Department of Shanghai Jiao Tong University. He received his Master’s degree from East China Normal University. His research focuses on Web search and mining, semantic Web and peer-to-peer search.

  • Received Date: May 14, 2009
  • Revised Date: February 22, 2010
  • Published Date: July 08, 2010
  • Abstract: Identifying ambiguous queries is crucial to research on personalized Web search and search result diversity. Intuitively, query logs contain valuable information on how many intentions users have when issuing a query. However, previous work showed that user clicks alone are misleading in judging whether a query is ambiguous. In this paper, we address the problem of learning a query ambiguity model by using search logs. First, we propose enriching a query by mining the documents clicked by users and the relevant follow-up queries in a session. Second, we use a text classifier to map the documents and the queries into predefined categories. Third, we propose extracting features from the processed data. Finally, we apply a state-of-the-art algorithm, Support Vector Machine (SVM), to learn a query ambiguity classifier. Experimental results verify that using click-based features or session-based features alone performs worse than the previous work based on top retrieved documents. When we combine the two sets of features, our proposed approach achieves the best effectiveness, specifically 86% in terms of accuracy. It significantly improves on the click-based method by 5.6% and the session-based method by 4.6%.
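
The feature-extraction step in the abstract can be illustrated with a minimal sketch. The abstract does not say which features the authors extract, so everything here is an assumption made for illustration: the function names `category_entropy` and `ambiguity_features`, the choice of entropy and distinct-category counts as features, and the toy category labels. The sketch assumes the clicked documents and follow-up queries have already been mapped to categories by a text classifier.

```python
import math
from collections import Counter


def category_entropy(categories):
    """Shannon entropy (in bits) of a list of category labels.

    Returns 0.0 for an empty list or when all labels are identical.
    """
    if not categories:
        return 0.0
    counts = Counter(categories)
    total = len(categories)
    # p * log2(1/p) summed over categories, with p = n / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def ambiguity_features(click_categories, session_categories):
    """Build a hypothetical feature vector for one query.

    click_categories:   categories of documents clicked for the query
    session_categories: categories of follow-up queries in the same sessions
    These four features are illustrative, not the paper's feature set.
    """
    return [
        category_entropy(click_categories),      # spread of click categories
        float(len(set(click_categories))),       # distinct click categories
        category_entropy(session_categories),    # spread of session categories
        float(len(set(session_categories))),     # distinct session categories
    ]


# Toy example: one query whose clicks split across two categories,
# and one whose clicks all fall into a single category.
jaguar = ambiguity_features(
    ["Autos", "Animals", "Autos", "Animals"], ["Autos", "Animals"]
)
weather = ambiguity_features(["Weather"] * 4, ["Weather"])
```

Feature vectors of this kind, paired with ambiguous/unambiguous labels, could then be passed to any SVM implementation to train a classifier. Combining the click-derived and session-derived features in one vector mirrors the combination the abstract reports as most effective.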