Summarizing Large-Scale Database Schema Using Community Detection

Xue Wang; Xuan Zhou; Shan Wang

doi:10.1007/s11390-012-1240-1

Xue Wang, Xuan Zhou, Shan Wang. Summarizing Large-Scale Database Schema Using Community Detection. Journal of Computer Science and Technology, 2012, 27(3): 515-526. DOI: 10.1007/s11390-012-1240-1

Abstract

Summarizing the schema of a large-scale database is challenging. In a typical large database schema, a large proportion of the tables are closely connected through a few high-degree tables. It is thus difficult to separate these tables into clusters that represent different topics. Moreover, because a schema can be very large, the schema summary needs to be structured into multiple levels to further improve usability. In this paper, we introduce a new schema summarization approach that uses community detection techniques from social network analysis. Our approach consists of three steps. First, we use a community detection algorithm to divide a database schema into subject groups, each representing a specific subject. Second, we cluster the subject groups into abstract domains to form a multi-level navigation structure. Third, we discover representative tables in each cluster to label the schema summary. We evaluate our approach on Freebase, a real-world large-scale database. The results show that our approach can identify subject groups precisely, and that the generated abstract schema layers are very helpful for users exploring the database.
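The three steps can be illustrated with a minimal Python sketch. It is not the algorithm from the paper, whose details the abstract does not give. The sketch assumes the schema is an undirected graph whose nodes are tables and whose edges are foreign-key links. It uses a simple greedy modularity merge as a stand-in for the community detection step, re-runs the same procedure on the group-level graph to form domains, and picks the highest-degree table in each group as its label. All function names (`modularity`, `detect_subject_groups`, `summarize`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def modularity(communities, edges):
    """Newman modularity of a partition of an undirected multigraph."""
    m = len(edges)
    label = {node: i for i, c in enumerate(communities) for node in c}
    degree = defaultdict(int)   # total degree per community
    inside = defaultdict(int)   # number of edges internal to each community
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            inside[label[u]] += 1
    return sum(inside[i] / m - (degree[i] / (2 * m)) ** 2
               for i in range(len(communities)))


def detect_subject_groups(nodes, edges):
    """Stand-in community detection (assumption): repeatedly merge the
    pair of groups that increases modularity the most, until no merge helps."""
    groups = [frozenset([n]) for n in sorted(nodes)]
    if not edges:
        return groups
    while len(groups) > 1:
        current = modularity(groups, edges)
        best_gain, best_pair = 0.0, None
        for a, b in combinations(groups, 2):
            merged = [g for g in groups if g not in (a, b)] + [a | b]
            gain = modularity(merged, edges) - current
            if gain > best_gain + 1e-12:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break
        a, b = best_pair
        groups = [g for g in groups if g not in (a, b)] + [a | b]
    return groups


def summarize(tables, edges):
    # Step 1: divide the schema into subject groups.
    groups = detect_subject_groups(tables, edges)
    group_of = {t: i for i, g in enumerate(groups) for t in g}

    # Step 2: cluster subject groups into domains by running the same
    # procedure on the graph of cross-group links (an assumption).
    cross = [(group_of[u], group_of[v]) for u, v in edges
             if group_of[u] != group_of[v]]
    domains = detect_subject_groups(range(len(groups)), cross)

    # Step 3: label each group with its highest-degree table
    # (ties broken alphabetically); a simple proxy for "representative".
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    labels = [max(sorted(g), key=lambda t: degree[t]) for g in groups]
    return groups, domains, labels


if __name__ == "__main__":
    # Hypothetical toy schema: three film tables, three music tables,
    # and one foreign-key link between the two areas.
    tables = ["film", "actor", "director", "album", "artist", "track"]
    edges = [("film", "actor"), ("film", "director"), ("actor", "director"),
             ("album", "artist"), ("album", "track"), ("artist", "track"),
             ("actor", "artist")]
    for result in summarize(tables, edges):
        print(result)
```

On this toy schema the sketch separates the film tables from the music tables and places both groups in a single domain. The pairwise merge search is quadratic in the number of groups per iteration, so a schema of Freebase's size would need a scalable community detection method rather than this illustration.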
