›› 2016, Vol. 31 ›› Issue (4): 702-719.doi: 10.1007/s11390-016-1658-y

Special Issue: Artificial Intelligence and Pattern Recognition; Data Management and Data Mining

• Special Section on Data Management and Data Mining 2016 • Previous Articles     Next Articles

Keyword Query over Error-Tolerant Knowledge Bases

Yu-Rong Cheng1, Student Member, CCF, ACM, IEEE, Ye Yuan1,*, Member, CCF, ACM, IEEE, Jia-Yu Li2, Lei Chen3, Member, CCF, ACM, IEEE, and Guo-Ren Wang1, Member, CCF, ACM, IEEE   

  1. 1 College of Computer Science and Engineering, Northeastern University, Shenyang 110819, China;
    2 College of Information Science and Engineering, Northeastern University, Shenyang 110819, China;
    3 Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
  • Received:2016-02-29 Revised:2016-05-25 Online:2016-07-05 Published:2016-07-05
  • Contact: Ye Yuan E-mail:yuanye@mail.neu.edu.cn
  • About author:Yu-Rong Cheng received her B.S. degree in computer science from the College of Computer Science and Engineering, Northeastern University, Shenyang, in 2012. Currently, she is a Ph.D. candidate of Northeastern University, Shenyang. Her main research interests include queries over graph data and large graph analysis, etc.
  • Supported by:

    Yu-Rong Cheng and Guo-Ren Wang are supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61332006, 61332014, 61328202 and U1401256. Ye Yuan is supported by the NSFC under Grant No. 61572119 and the Fundamental Research Funds for the Central Universities of China under Grant Nos. N150402005 and N130504006. Lei Chen is supported by the NSFC under Grant No. 61328202.

With more and more knowledge provided by WWW, querying and mining the knowledge bases have attracted much research attention. Among all the queries over knowledge bases, which are usually modelled as graphs, a keyword query is the most widely used one. Although the problem of keyword query over graphs has been deeply studied for years, knowledge bases, as special error-tolerant graphs, lead to the results of the traditional defined keyword queries out of users' satisfaction. Thus, in this paper, we define a new keyword query, called confident r-clique, specific for knowledge bases based on the r-clique definition for keyword query on general graphs, which has been proved to be the best one. However, as we prove in the paper, finding the confident r-cliques is #P-hard. We propose a filtering-and-verification framework to improve the search efficiency. In the filtering phase, we develop the tightest upper bound of the confident r-clique, and design an index together with its search algorithm, which suits the large scale of knowledge bases well. In the verification phase, we develop an efficient sampling method to verify the final answers from the candidates remaining in the filtering phase. Extensive experiments demonstrate that the results derived from our new definition satisfy the users' requirement better compared with the traditional r-clique definition, and our algorithms are efficient.

[1] Kasneci G, Suchanek F M, Ifrim G, Ramanath M, Weikum G. NAGA:Searching and ranking knowledge. In Proc. the 24th International Conference on Data Engineering (ICDE), Apr. 2008, pp.953-962.

[2] Wang H, Aggarwal C C. A survey of algorithms for keyword search on graph data. In Managing and Mining Graph Data, Aggarwal C C, Wang H (eds.), Springer, 2010, pp.249-273.

[3] Yu J X, Qin L, Chang L. Keyword Search in Databases. Morgan and Claypool Publishers, 2009.

[4] Yang M, Ding B, Chaudhuri S, Chakrabarti K. Finding patterns in a knowledge base using keywords to compose table answers. Proceedings of the VLDB Endowment, 2014, 7(14):1809-1820.

[5] Kargar M, An A. Keyword search in graphs:Finding r-cliques. Proceedings of the VLDB Endowment, 2011, 4(10):681-692.

[6] Li G, Ooi B C, Feng J, Wang J, Zhou L. EASE:An effective 3-in-1 keyword search method for unstructured, semistructured and structured data. In Proc. ACM SIGMOD International Conference on Management of Data, Jun. 2008, pp.903-914.

[7] Lian X, Chen L, Huang Z. Keyword search over probabilistic RDF graphs. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(5):1246-1260.

[8] Zou L, Huang R, Wang H, Yu J X, He W, Zhao D. Natural language question answering over RDF:A graph data driven approach. In Proc. ACM SIGMOD International Conference on Management of Data, Jun. 2014, pp.313-324.

[9] Zheng W, Zou L, Lian X, Yu J X, Song S, Zhao D. How to build templates for RDF question/answering-An uncertain graph similarity join approach. In Proc. ACM SIGMOD International Conference on Management of Data, May 2015, pp.1809-1824.

[10] Zhang L, Tran T, Rettinger A. Probabilistic query rewriting for efficient and effective keyword search on graph data. Proceedings of the VLDB Endowment, 2013, 6(14):1642-1653.

[11] Hristidis V, Gravano L, Papakonstantinou Y. Efficient IR-style keyword search over relational databases. In Proc. the 29th VLDB, Sept. 2003, pp.850-861.

[12] Liu F, Yu C, Meng W, Chowdhury A. Effective keyword search in relational databases. In Proc. the 2006 ACM SIGMOD International Conference on Management of Data, Jun. 2006, pp.563-574.

[13] Luo Y, Lin X, Wang W, Zhou X. Spark:Top-k keyword query in relational databases. In Proc. the 27th ACMSIGMOD International Conference on Management of Data, Jun. 2007, pp.115-126.

[14] Cohen S, Mamou J, Kanza Y, Sagiv Y. XSearch:A semantic search engine for XML. In Proc. the 29th VLDB, Sept. 2003, pp.45-56.

[15] Hristidis V, Koudas N, Papakonstantinou Y, Srivastava D. Keyword proximity search in XML trees. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(4):525-539.

[16] Liu Z, Chen Y. Identifying meaningful return information for XML keyword search. In Proc. ACM SIGMOD International Conference on Management of Data, Jun. 2007, pp.329-340.

[17] Xu Y, Papakonstantinou Y. Efficient keyword search for smallest LCAs in XML databases. In Proc. ACM SIGMOD International Conference on Management of Data, Jun. 2005, pp.527-538.

[18] Tong Y, Zhang X, Cao C C, Chen L. Efficient probabilistic supergraph search over large uncertain graphs. In Proc. the 23rd ACM International Conference on Conference on Information and Knowledge Management (CIKM), Nov. 2014, pp.809-818.

[19] Tong Y, She J, Meng R. Bottleneck-aware arrangement over event-based social networks:The max-min approach. World Wide Web, 2015.

[20] Bhalotia G, Hulgeri A, Nakhe C, Chakrabarti S, Sudarshan S. Keyword searching and browsing in databases using banks. In Proc. the 18th International Conference on Data Engineering (ICDE), Feb.26-Mar.1, 2002, pp.431-440.

[21] Kacholia V, Pandit S, Chakrabarti S, Sudarshan S, Desai R, Karambelkar H. Bidirectional expansion for keyword search on graph databases. In Proc. the 31st VLDB, Aug. 2005, pp.505-516.

[22] Yuan Y, Wang G, Chen L, Wang H. Efficient keyword search on uncertain graph data. IEEE Transactions on Knowledge and Data Engineering, 2013, 25(12):2767-2779.

[23] Wu Y, Yang S, Srivatsa M, Iyengar A, Yan X. Summarizing answer graphs induced by keyword queries. Proceedings of the VLDB Endowment, 2013, 6(14):1774-1785.

[24] Yuan Y, Chen L, Wang G. Efficiently answering probability thresholdbased shortest path queries over uncertain graphs. In Proc. the 15th DASFAA, Apr. 2010, pp.155-170.

[25] Balas E, Xue J. Weighted and unweighted maximum clique algorithms with upper bounds from fractional coloring. Algorithmica, 1996, 15(5):397-412.

[26] Suchanek F M, Ifrim G, Weikum G. Combining linguistic and statistical analysis to extract relations from web documents. In Proc. the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2006, pp.712-717.

[27] Chang L, Yu J X, Qin L, Lin X, Liu C, Liang W. Efficiently computing k-edge connected components via graph decomposition. In Proc. ACM SIGMOD International Conference on Management of Data, Jun. 2013, pp.205-216.

[28] Vazirani V V. Approximation Algorithms. Springer Berlin Heidelberg, 2003.

[29] Mitzenmacher M, Upfal E. Probability and Computing:Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.

[30] Yang S, Wu Y, Sun H, Yan X. Schemaless and structureless graph querying. Proceedings of the VLDB Endowment, 2014, 7(7):565-576.

[31] Jin R, Liu L, Ding B, Wang H. Distance-constraint reachability computation in uncertain graphs. Proceedings of the VLDB Endowment, 2011, 4(9):551-562.
No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] Gao Qingshi; Zhang Xiang; Yang Shufan; Chen Shuqing;. Vector Computer 757[J]. , 1986, 1(3): 1 -14 .
[2] Han Jianchao; Shi Zhongzhi;. Formalizing Default Reasoning[J]. , 1990, 5(4): 374 -378 .
[3] Zhang Bo; Zhang Ling;. On Memory Capacity of the Probabilistic Logic Neuron Network[J]. , 1993, 8(3): 62 -66 .
[4] Luo Junzhou; Gu Guanqun;. CIMS Network Protocol and Its Net Models[J]. , 1997, 12(5): 476 -481 .
[5] WAN Huagen; JIN Xiaogang; BAO Hujun;. Direct 3D Painting with a Metaball-Based Paint brush[J]. , 2000, 15(1): 100 -104 .
[6] ZHANG Wensong; JIN Shiyao; WU Quanyuan;. LinuxDirector: A Connection Director for Scalable Internet Services[J]. , 2000, 15(6): 560 -571 .
[7] Zhong-Xuan Liu, Shi-Guo Lian, and Zhen Ren. Quaternion Diffusion for Color Image Filtering[J]. , 2006, 21(1): 126 -136 .
[8] Zhi-Hua Zhou. Multi-Instance Learning from Supervised View[J]. , 2006, 21(5): 800 -809 .
[9] Han-Bing Yan, Shi-Min Hu, and Ralph R Martin. 3D Morphing Using Strain Field Interpolation[J]. , 2007, 22(1): 147 -155 .
[10] Juan J. Cuadrado Gallego, Daniel Rodri guez, Miguel Angel Sicilia, Miguel Garre Rubio and Angel Garci a Crespo. Software Project Effort Estimation Based on Multiple Parametric Models Generated Through Data Clustering[J]. , 2007, 22(3): 371 -378 .

ISSN 1000-9000(Print)

         1860-4749(Online)
CN 11-2296/TP

Home
Editorial Board
Author Guidelines
Subscription
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved