2015, Vol. 30, Issue (1): 214-224. doi: 10.1007/s11390-015-1514-5
Topic: Data Management and Data Mining
• Special Section on Selected Paper from NPC 2011 •
Carlo Batini1, Paola Bonizzoni1, Marco Comerio1, Riccardo Dondi2, Yuri Pirola1, Francesco Salandra1
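When an integration process involves hundreds or even thousands of schemas, criteria are needed for selecting the clusters of schemas to be integrated, so that the integration problem can be handled through an efficient iterative process. The schemas in a cluster should be chosen according to cohesion and coupling criteria based on the similarities and dissimilarities among schemas. This paper proposes an algorithm for a variant of the correlation clustering approach, to assist designers in integrating a large number of conceptual schemas. The variant introduces an upper bound and a lower bound on the number of schemas in each cluster, to avoid integration contexts that are too complex or too simple, respectively. We give a heuristic for solving the problem, which is an NP-hard combinatorial problem. An experiment shows that the effectiveness of the schema integration process increases significantly when the clusters are computed by the proposed algorithm rather than defined manually by an expert.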
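To make the problem statement concrete, the following is a minimal Python sketch of correlation clustering with cluster-size bounds. It is not the heuristic proposed in the paper, which the abstract does not describe. It is a simple greedy illustration under stated assumptions: pairwise schema similarity is given as a symmetric signed matrix `weights` (positive for similar pairs, negative for dissimilar ones), and the names `disagreement_cost`, `greedy_bounded_clusters`, `lower`, and `upper` are hypothetical.

```python
from itertools import combinations


def disagreement_cost(weights, clusters):
    """Total |weight| of the pairs that the clustering treats 'wrongly'.

    Assumed convention: weights[i][j] > 0 means schemas i and j are similar
    (they should share a cluster); weights[i][j] < 0 means they are
    dissimilar (they should be separated).
    """
    label = {item: k for k, cluster in enumerate(clusters) for item in cluster}
    cost = 0.0
    for i, j in combinations(sorted(label), 2):
        w = weights[i][j]
        same = label[i] == label[j]
        if (w > 0 and not same) or (w < 0 and same):
            cost += abs(w)
    return cost


def greedy_bounded_clusters(weights, lower, upper):
    """Greedy illustration: grow one cluster at a time within [lower, upper].

    The last cluster may end up smaller than `lower` when the schemas run
    out; this sketch does not repair that case.
    """
    unassigned = list(range(len(weights)))
    clusters = []
    while unassigned:
        cluster = [unassigned.pop(0)]  # seed with the first free schema
        while unassigned and len(cluster) < upper:
            # Candidate with the largest total similarity to the cluster.
            best = max(unassigned,
                       key=lambda c: sum(weights[c][m] for m in cluster))
            gain = sum(weights[best][m] for m in cluster)
            # Stop once the lower bound is met and no candidate is attracted.
            if gain <= 0 and len(cluster) >= lower:
                break
            cluster.append(best)
            unassigned.remove(best)
        clusters.append(cluster)
    return clusters


# Toy input: two groups of three mutually similar schemas.
weights = [
    [0, 1, 1, -1, -1, -1],
    [1, 0, 1, -1, -1, -1],
    [1, 1, 0, -1, -1, -1],
    [-1, -1, -1, 0, 1, 1],
    [-1, -1, -1, 1, 0, 1],
    [-1, -1, -1, 1, 1, 0],
]
clusters = greedy_bounded_clusters(weights, lower=2, upper=4)
print(clusters, disagreement_cost(weights, clusters))
```

On this toy input, tightening the upper bound to 2 forces the two natural groups to be split, which raises the disagreement cost. This shows how the size bounds trade clustering quality against the complexity of each integration step.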
Copyright © Editorial Office of the Journal of Computer Science and Technology