Carlo Batini, Paola Bonizzoni, Marco Comerio, Riccardo Dondi, Yuri Pirola, Francesco Salandra. A Clustering Algorithm for Planning the Integration Process of a Large Number of Conceptual Schemas[J]. Journal of Computer Science and Technology, 2015, 30(1): 214-224. DOI: 10.1007/s11390-015-1514-5

A Clustering Algorithm for Planning the Integration Process of a Large Number of Conceptual Schemas

Funds: The work was partially supported by the Italian Project PON01 00861 SMART (Services and Meta-services for smART eGovernment) and by the Project (CUP E41l13000220009) SPAC3 (Smart services of the new Public Administration for the Citizen-Centricity in the Cloud) co-financed by the Lombardy region.
  • Author Bio:

    Carlo Batini is a full professor of computer engineering at the Department of Informatics, Systems and Communication (DISCo) of the University of Milano-Bicocca. He received his M.S. degree in engineering from the University of Roma. From 1983 to 1986, he was an associate professor, and from 1986 to 2001, he was a full professor at the University of Roma La Sapienza. Since 2001, he has been a full professor at the University of Milano-Bicocca. From 1993 to August 2003, he was on leave from the university, serving as a member of the executive board of the Italian Authority for Information Technology in Public Administration, where he led significant projects in the Italian Central Public Administration related to e-Government initiatives. His research interests include cooperative information systems, information systems, database modeling and design, data and information quality, web services design, eGovernment planning methodologies, and services and data repositories.

  • Received Date: December 18, 2013
  • Revised Date: July 28, 2014
  • Published Date: January 04, 2015
  • Abstract: When tens or even hundreds of schemas are involved in an integration process, criteria are needed for choosing clusters of schemas to be integrated, so that the integration problem can be handled through an efficient iterative process. Schemas should be assigned to clusters according to cohesion and coupling criteria based on similarities and dissimilarities among schemas. In this paper, we propose an algorithm for a novel variant of the correlation clustering approach that assists a designer in integrating a large number of conceptual schemas. The variant introduces upper and lower bounds on the number of schemas in each cluster, in order to avoid integration contexts that are too complex or too simple, respectively. Since the problem is NP-hard, we give a heuristic for solving it. Experiments show an appreciable increase in the effectiveness of the schema integration process when clusters are computed by the proposed algorithm rather than defined manually by an expert.
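
To make the size-bounded clustering idea concrete, the following is a minimal illustrative sketch in Python. It is not the heuristic proposed in the paper, whose details are not given on this page. It assumes a hypothetical symmetric matrix `sim` in which a positive entry means two schemas are similar and a negative entry means they are dissimilar; the function name `bounded_clusters` and the parameters `lower` and `upper` are likewise assumptions made for illustration. The sketch greedily merges clusters while a merge adds positive similarity and respects the upper bound, and then folds clusters that are below the lower bound into another cluster when the upper bound allows it.

```python
from itertools import combinations


def bounded_clusters(sim, lower, upper):
    """Greedy sketch (an assumption, not the paper's heuristic).

    sim[i][j] > 0 means schemas i and j are similar, < 0 dissimilar.
    Returns a list of clusters, each a sorted list of schema indices.
    """
    clusters = [[i] for i in range(len(sim))]

    def gain(a, b):
        # Total similarity between two clusters.
        return sum(sim[i][j] for i in a for j in b)

    def best_pair(candidates):
        # Highest-gain pair whose merged size respects the upper bound.
        feasible = [(gain(a, b), x, y)
                    for (x, a), (y, b) in candidates
                    if len(a) + len(b) <= upper]
        return max(feasible) if feasible else None

    def merge(x, y):
        # Requires x < y so that deleting y does not shift x.
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]

    # Phase 1: merge while some feasible merge adds positive similarity.
    while True:
        best = best_pair(combinations(enumerate(clusters), 2))
        if best is None or best[0] <= 0:
            break
        merge(best[1], best[2])

    # Phase 2: fold clusters below the lower bound into the least-bad
    # partner, as long as the upper bound still holds.
    while True:
        small = [(x, c) for x, c in enumerate(clusters) if len(c) < lower]
        pairs = [((x, a), (y, b)) for x, a in small
                 for y, b in enumerate(clusters) if y != x]
        best = best_pair(pairs)
        if best is None:
            break
        x, y = sorted(best[1:])
        merge(x, y)

    return sorted(clusters)


# Hypothetical input: schemas 0-2 resemble each other, as do schemas 3-5.
group = [0, 0, 0, 1, 1, 1]
sim = [[1 if group[i] == group[j] else -1 for j in range(6)]
       for i in range(6)]
print(bounded_clusters(sim, lower=2, upper=3))  # [[0, 1, 2], [3, 4, 5]]
```

With `upper=2` and four mutually similar schemas, the same function returns two clusters of two schemas each, which shows how the upper bound splits what would otherwise be a single large integration context. A cluster that stays below `lower` because no merge fits under `upper` is left as it is, so a real implementation would need a policy for that case.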