Xue Wang, Xuan Zhou, Shan Wang. Summarizing Large-Scale Database Schema Using Community Detection[J]. Journal of Computer Science and Technology, 2012, 27(3): 515-526. DOI: 10.1007/s11390-012-1240-1

Summarizing Large-Scale Database Schema Using Community Detection

Funds: This work is partly supported by the "HGJ" National Science and Technology Major Project of China under Grant No. 2010ZX01042-001-002, the National Natural Science Foundation of China under Grant No. 61070054, the National High Technology Research and Development 863 Program of China under Grant No. 2009AA01Z149, the Research Funds of Renmin University of China under Grant No. 10XNI018, and the Postgraduate Science & Research Funds of Renmin University of China under Grant No. 12XNH177.
More Information
  • Author Bio:

    Xue Wang received the M.S. degree in computer science from Lanzhou University in 2005. She is a Ph.D. candidate at Renmin University of China. Her research interests include databases and information retrieval.

  • Received Date: September 03, 2011
  • Revised Date: December 30, 2011
  • Published Date: May 04, 2012
  • Abstract: Schema summarization on large-scale databases is challenging. In a typical large database schema, a large proportion of the tables are closely connected through a few high-degree tables, which makes it difficult to separate the tables into clusters that represent different topics. Moreover, because a schema can be very large, the schema summary needs to be structured into multiple levels to further improve usability. In this paper, we introduce a new schema summarization approach that uses community detection techniques from social network analysis. Our approach consists of three steps. First, we use a community detection algorithm to divide a database schema into subject groups, each representing a specific subject. Second, we cluster the subject groups into abstract domains to form a multi-level navigation structure. Third, we discover representative tables in each cluster to label the schema summary. We evaluate our approach on Freebase, a real-world large-scale database. The results show that our approach can identify subject groups precisely and that the generated abstract schema layers help users explore the database.
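The first and third steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the community detection algorithm or the measure of representativeness, so the sketch assumes a naive greedy modularity merge for step one and "most links inside the group" as the labelling rule for step three. The toy schema `EDGES`, the table names, and the functions `detect_communities` and `representative` are all hypothetical. Step two (clustering subject groups into abstract domains) is omitted.

```python
# Hypothetical toy schema: each pair is a foreign-key link between two tables.
EDGES = [
    ("film", "actor"), ("film", "director"), ("film", "performance"),
    ("actor", "performance"),
    ("album", "artist"), ("album", "track"), ("artist", "track"),
    ("album", "label"),
    ("actor", "artist"),  # a single link bridging the two subjects
]


def detect_communities(edges):
    """Assumed step 1: greedy modularity merging on an undirected graph."""
    m = len(edges)
    nodes = sorted({table for edge in edges for table in edge})
    comm = {n: i for i, n in enumerate(nodes)}  # table -> community id
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    while True:
        # Total degree of each community.
        total = {}
        for n in nodes:
            total[comm[n]] = total.get(comm[n], 0) + degree[n]
        # Number of edges running between each pair of communities.
        between = {}
        for u, v in edges:
            a, b = sorted((comm[u], comm[v]))
            if a != b:
                between[(a, b)] = between.get((a, b), 0) + 1
        # Modularity gain of merging a and b: e_ab/m - d_a*d_b/(2*m^2).
        best_gain, best_pair = 0.0, None
        for (a, b), e_ab in sorted(between.items()):
            gain = e_ab / m - total[a] * total[b] / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:  # no merge improves modularity any further
            break
        a, b = best_pair
        for n in nodes:
            if comm[n] == b:
                comm[n] = a

    groups = {}
    for n in nodes:
        groups.setdefault(comm[n], set()).add(n)
    return list(groups.values())


def representative(group, edges):
    """Assumed step 3: label a group by its table with the most internal links."""
    internal = {t: 0 for t in group}
    for u, v in edges:
        if u in group and v in group:
            internal[u] += 1
            internal[v] += 1
    # Ties are broken alphabetically so the result is deterministic.
    return max(sorted(group), key=lambda t: internal[t])


for group in detect_communities(EDGES):
    print(representative(group, EDGES), sorted(group))
```

On this toy schema the sketch separates the film tables from the music tables despite the bridging link, and labels the two groups `film` and `album`. A real schema with a few very high-degree tables, as the abstract describes, would be a much harder case than this example.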
