›› 2010, Vol. 25 ›› Issue (3): 444-457.

• Special Section on Trends Changing Data Management • Previous Articles     Next Articles

Context-Sensitive Document Ranking

Li-Jun Chang (常利军), Jeffrey Xu Yu (于旭), and Lu Qin (秦璐)   

  1. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, China
  • Received:2009-11-03 Revised:2009-12-12 Online:2010-05-05 Published:2010-05-05
  • About author:
    Li-Jun Chang received the B.S. degree from Renmin University of China in 2007. He is currently a Ph.D. candidate in the Department of Systems Engineering & Engineering Management, The Chinese University of Hong Kong, Hong Kong, China. His research interests include uncertain data management and keyword search in databases.
    Jeffrey Xu Yu received his B.E., M.E. and Ph.D. degrees in computer science, from the University of Tsukuba, Japan, in 1985, 1987 and 1990, respectively. Dr. Yu held teaching positions in the Institute of Information Sciences and Electronics, University of Tsukuba, Japan, and the Department of Computer Science, The Australian National University. Currently, he is a professor in the Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong. Dr. Yu served as an associate editor of IEEE Transactions on Knowledge and Data Engineering, and is serving as a VLDB Journal editorial board member, and an ACM SIGMOD Information Director. His current main research interest includes graph database, graph mining, keyword search in relational databases, XML database, and Web-technology. He has published over 190 papers including the papers published in reputed journals and major international conferences.
    Lu Qin received the B.S. degree from Renmin University of China in 2006. He is currently a Ph.D. candidate in the Department of Systems Engineering & Engineering Management, The Chinese University of Hong Kong, Hong Kong, China. His research interests include keyword search in relational databases, keyword search in large graphs and multi-query optimization in RDBMSs.
  • Supported by:

    This work was supported by the Research Grants Council of the Hong Kong SAR, China, under Grant Nos. 419008 and 419109.

Ranking is a main research issue in IR-styled keyword search over a set of documents. In this paper, we study a new keyword search problem, called context-sensitive document ranking, which is to rank documents with an additional context that provides additional information about the application domain where the documents are to be searched and ranked. The work is motivated by the fact that additional information associated with the documents can possibly assist users to find more relevant documents when they are unable to find the needed documents from the documents alone. In this paper, a context is a multi-attribute graph, which can represent any information maintained in a relational database, where multi-attribute nodes represent tuples, and edges represent primary key and foreign key references among nodes. The context-sensitive ranking is related to several research issues, how to score documents, how to evaluate the additional information obtained in the context that may contribute to the document ranking, how to rank the documents by combining the scores/costs from the documents and the context. More importantly, the relationships between documents and the information stored in a relational database may be uncertain, because they are from different data sources and the relationships are determined systematically using similarity match which causes uncertainty. In this paper, we concentrate ourselves on these research issues, and provide our solution on how to rank the documents in a context where there exist uncertainty between the documents and the context. We confirm the effectiveness of our approaches by conducting extensive experimental studies using real datasets. We present our findings in this paper.


[1] Baeza-Yates R, Ribeiro-Neto B. Modern Information Retrieval. 1st Edition, Addison Wesley, May 1999.

[2] Kacholia V, Pandit S, Chakrabarti S, Sudarshan S, Desai R, Karambelkar H. Bidirectional expansion for keyword search on graph databases. In Proc. VLDB2005, Trondheim, Norway, Aug. 30-Sept. 2, 2005, pp.505-516.

[3] Ding B, Yu J X,Wang S, Qin L, Zhang X, Lin X. Finding topk min-cost connected trees in databases. In Proc. ICDE 2007, Istanbul, Turkey, April 15-20, 2007, pp.836-845.

[4] Ananthakrishna R, Chaudhuri S, Ganti V. Eliminating fuzzy duplicates in data warehouses. In Proc. VLDB2002, Hong Kong, China, Aug. 20-23, 2002, pp.586-597.

[5] Li W S, Candan K S, Vu Q, Agrawal D. Retrieving and organizing web pages by \information unit". In Proc. WWW2001, Hong Kong, China, May 1-5, 2001, pp.230-244.

[6] Luo Y, Lin X,Wang W, Zhou X. Spark: Top-k keyword query in relational databases. In Proc. SIGMOD2007, Beijing, China, June 11-14, 2007, pp.115-126.

[7] He H, Wang H, Yang J, Yu P S. Blinks: Ranked keyword searches on graphs. In Proc. SIGMOD2007, Beijing, China, June 11-14, 2007, pp.305-316.

[8] Soliman M A, Ilyas I F, Chang K C C. Top-k query processing in uncertain databases. In Proc. ICDE 2007, Istanbul, Turkey, April 15-20, 2007, pp.896-905.

[9] Yi K, Li F, Kollios G, Srivastava D. Efficient processing of top-k queries in uncertain databases with x-Relations. IEEE Trans. Knowl. Data Eng., 2008, 20(12): 1669-1682.

[10] Hua M, Pei J, Zhang W, Lin X. Ranking queries on uncertain data: A probabilistic threshold approach. In Proc. SIGMOD2008, Vancouver, Canada, June 9-12, 2008, pp.673-686.

[11] Agrawal P, Benjelloun O, Sarma A D, Hayworth C, Nabar S U, Sugihara T, Widom J. Trio: A system for data, uncertainty, and lineage. In Proc. VLDB2006, Seoul, Korea, Sept. 12-15, 2006, pp.1151-1154.

[12] Jin C, Yi K, Chen L, Yu J X, Lin X. Sliding-window top-k queries on uncertain streams. In Proc. VLDB2008, Auckland, New Zealand, Aug. 23-28, 2008, pp.301-312.

[13] Cormode G, Li F, Yi K. Semantics of ranking queries for probabilistic data and expected ranks. In Proc. ICDE 2009, Shanghai, China, March 29-April 2, 2009, pp.305-306.

[14] Li J, Deshpande A. Consensus answers for queries over probabilistic databases. In Proc. PODS2009, Providence, USA, June 29-July 1, 2009, pp.259-268.

[15] Qin L, Yu J X, Chang L, Tao Y. Querying communities in relational databases. In Proc. ICDE 2009, Shanghai, China, Mar. 29-Apr. 2, 2009, pp.724-735.

[16] Re C, Dalvi N N, Suciu D. Efficient top-k query evaluation on probabilistic data. In Proc. ICDE 2007, Istanbul, Turkey, April 15-20, 2007, pp.886-895.

[17] JÄarvelin K, KekÄalÄainen J. IR evaluation methods for retrieving highly relevant documents. In Proc. SIGIR 2000, Athens, Greece, July 24-28, 2000, pp.41-48.

[18] Burges C J C, Shaked T, Renshaw E, Lazier A, Deeds M, Hamilton N, Hullender G N. Learning to rank using gradient descent. In Proc. ICML 2005, Bonn, Germany, Aug. 7-11, 2005, pp.89-96.

[19] Geng X, Liu T Y, Qin T, Arnold A, Li H, Shum H Y. Query dependent ranking using k-nearest neighbor. In Proc. SIGIR 2008, Singapore, July 20-24, 2008, pp.115-122.

[20] Dou Z, Song R, Yuan X,Wen J R. Are click-through data adequate for learning web search rankings? In Proc. CIKM2008, Napa Valley, USA, Oct. 26-30, 2008, pp.73-82.

[21] Chaudhuri S, Ganjam K, Ganti V, Motwani R. Robust and efficient fuzzy match for online data cleaning. In Proc. SIGMOD2003, San Diego, USA, June 9-12, 2003, pp.313-324.

[22] Hernandez M A, Stolfo S J. The merge/purge problem for large databases. In Proc. SIGMOD1995, San Jose, USA, May 22-25, 1995, pp.127-138.

[23] Cohen W W, Ravikumar P, Fienberg S E. A comparison of string distance metrics for name-matching tasks. In Proc. IIWeb 2003, Acapulco, Mexico, Aug. 9-10, 2003, pp.73-78.

[24] Ukkonen E. On approximate string matching. In Proc. FCT, Borgholm, Sweden, Aug. 21-27, 1983, pp.487-495.

[25] Gravano L, Ipeirotis P G, Jagadish H V, Koudas N, Muthukrishnan S, Srivastava D. Approximate string joins in a database (almost) for free. In Proc. VLDB2001, Rome, Italy, Sept. 1114, 2001, pp.491-500.

[26] Bayardo R J, Ma Y, Srikant R. Scaling up all pairs similarity search. In Proc. WWW2007, Banff, Canada, May 8-12, 2007, pp.131-140.

[27] Xiao C, Wang W, Lin X, Yu J X. Efficient similarity joins for near duplicate detection. In Proc. WWW2008, Beijing, China, Apr. 21-25, 2008, pp.131-140.

[28] Bhalotia G, Hulgeri A, Nakhe C, Chakrabarti S, Sudarshan S. Keyword searching and browsing in databases using banks. In Proc. ICDE 2002, San Jose, USA, Feb. 26-Mar. 1, 2002, pp.431-440.

[29] Hristidis V, Papakonstantinou Y. Discover: Keyword search in relational databases. In Proc. VLDB2002, Hong Kong, China, Aug. 20-23, 2002, pp.670-681.

[30] Hristidis V, Gravano L, Papakonstantinou Y. Efficient IRstyle keyword search over relational databases. In Proc. VLDB2003, Berlin, Germany, Sept. 9-12, 2003, pp.850-861.

[31] Agrawal S, Chaudhuri S, Das G. Dbxplorer: A system for keyword-based search over relational databases. In Proc. ICDE 2002, San Jose, USA, Feb. 26-Mar. 1, 2002, p.5.

[32] Liu F, Yu C T, Meng W, Chowdhury A. Effective keyword search in relational databases. In Proc. SIGMOD2006, Chicago, USA, June 27-29, 2006, pp.563-574.

[33] Golenberg K, Kimelfeld B, Sagiv Y. Keyword proximity search in complex data graphs. In Proc. SIGMOD2008, Vancouver, Canada, 2008, pp.927-940.

[34] Li G, Ooi B C, Feng J, Wang J, Zhou L. Ease: An effective 3in-1 keyword search method for unstructured, semi-structured and structured data. In Proc. SIGMOD2008, Vancouver, Canada, June 9-12, 2008, pp.903-914.

[35] Sayyadian M, LeKhac H, Doan A, Gravano L. Efficient keyword search across heterogeneous relational databases. In Proc. ICDE 2007, Istanbul, Turkey, April 15-20, 2007, pp.346-355.

[36] Sarma A D, Benjelloun O, Halevy A Y, Widom J. Working models for uncertain data. In Proc. ICDE 2006, Atlanta, USA, April 3-8, 2006, p.7.

[37] Benjelloun O, Sarma A D, Halevy A Y, Widom J. Uldbs: Databases with uncertainty and lineage. In Proc. VLDB2006, Seoul, Korea, Sept. 12-15, 2006, pp.953-964.

[38] Dalvi N N, Suciu D. Management of probabilistic data: Foundations and challenges. In Proc. PODS2007, Beijing, China, June 11-13, 2007, pp.1-12.

[39] Dalvi N N, Suciu D. Efficient query evaluation on probabilistic databases. VLDB J., 2007, 16(4): 523-544.

[40] Ljosa V, Singh A K. Top-k spatial joins of probabilistic objects. In Proc. ICDE 2008, Cancun, Mexico, Apr. 7-12, 2008, pp.566-575.

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!

ISSN 1000-9000(Print)

         1860-4749(Online)
CN 11-2296/TP

Home
Editorial Board
Author Guidelines
Subscription
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved