We use cookies to improve your experience with our site.

容错知识库中的关键字查询

Keyword Query over Error-Tolerant Knowledge Bases

  • 摘要: 随着互联网提供越来越多的知识,查询和挖掘知识库吸引着研究着地兴趣。知识库通常由图来建模。在知识库的所有查询处理中,关键字查询是最为广泛应用的。虽然图数据上的关键字查询在近年来已经被广泛研究,但知识库作为一种容错的图数据,使得传统的关键字查询结果很难让用户满意。因此,本文中,我们特别为知识库定义了一种新的关键字查询,称作"confident r-clique"查询。该查询是基于目前最好的传统图数据关键字查询r-clique而定义的。本文中,我们证明,找到这些confident r-clique是#P-难的,因此,我们提出了过滤-确认的框架来提高搜索效率。在过滤步骤中,我们推导出confident r-clique的上届,冰设计了一种索引及相应的过滤算法。该索引能够很好地适应大数据环境。在确认步骤中,我们采用采样的方式从过滤步骤中选出的候选集中计算出最终的结果。我们进行了大量的实验验证了我们提出的confident r-clique定义比传统的关键字查询定义在容错知识库环境下更能满足用户需求,且我们所提出的算法具有很高的效率。

     

    Abstract: With more and more knowledge provided by WWW, querying and mining the knowledge bases have attracted much research attention. Among all the queries over knowledge bases, which are usually modelled as graphs, a keyword query is the most widely used one. Although the problem of keyword query over graphs has been deeply studied for years, knowledge bases, as special error-tolerant graphs, lead to the results of the traditional defined keyword queries out of users' satisfaction. Thus, in this paper, we define a new keyword query, called confident r-clique, specific for knowledge bases based on the r-clique definition for keyword query on general graphs, which has been proved to be the best one. However, as we prove in the paper, finding the confident r-cliques is #P-hard. We propose a filtering-and-verification framework to improve the search efficiency. In the filtering phase, we develop the tightest upper bound of the confident r-clique, and design an index together with its search algorithm, which suits the large scale of knowledge bases well. In the verification phase, we develop an efficient sampling method to verify the final answers from the candidates remaining in the filtering phase. Extensive experiments demonstrate that the results derived from our new definition satisfy the users' requirement better compared with the traditional r-clique definition, and our algorithms are efficient.

     

/

返回文章
返回