
An Adaptive Approach to Schema Classification for Data Warehouse Modeling

Hong-Ding Wang¹,², Yun-Hai Tong¹,², Shao-Hua Tan¹,², Shi-Wei Tang¹,², Dong-Qing Yang¹, and Guo-Hui Sun³

¹School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
²National Laboratory on Machine Perception, Peking University, Beijing 100871, China
³Microsoft (China) Co., Ltd, Beijing 100027, China
Received: 2006-05-01; Revised: 2007-01-07; Online: 2007-03-10; Published: 2007-03-10

Data warehouse (DW) modeling is a complex task. It requires knowledge of business processes and familiarity with the structure and behavior of operational information systems. Existing DW modeling techniques have two major drawbacks. The data-driven approach requires a high level of expertise and neglects the requirements of end users. The demand-driven approach lacks an enterprise-wide vision and disregards the existing models of the underlying operational systems. To address these shortcomings, this paper proposes a method for classifying schema elements for DW modeling. We first propose vector space models for subjects and schema elements. We then present an adaptive approach, based on self-tuning theory, for constructing the context vectors of subjects. Finally, we classify the source schema elements into the subjects of the DW automatically. Using the results of this classification, designers can model and construct a DW more easily.
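The abstract describes the pipeline only at a high level. The following is a minimal sketch of the general idea under stated assumptions:

* Schema element names are split into lowercase terms.
* Each subject is represented by a hand-seeded term-weight context vector.
* An element is assigned to the subject with the highest cosine similarity.

The feedback step is a simplified, one-element-at-a-time Rocchio-style relevance-feedback update. It is a swapped-in technique for illustration and is not the paper's self-tuning method. The function names (`tokenize`, `to_vector`, `cosine`, `classify`, `rocchio_update`), the weights `alpha`, `beta` and `gamma`, and the example subjects are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(name):
    """Hypothetical helper: split a schema element name such as 'OrderAmount'
    or 'stock_qty' into lowercase terms."""
    # Insert a space at lowercase->uppercase boundaries, then split on non-letters.
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", name)
    return [t.lower() for t in re.split(r"[^A-Za-z]+", spaced) if t]


def to_vector(terms):
    """Raw term-frequency vector (assumption: a TF-IDF weighting could be used instead)."""
    return Counter(terms)


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dict-like)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def classify(element_name, subjects):
    """Assign an element to the subject whose context vector is most similar.
    `subjects` maps a subject name to its context vector."""
    vec = to_vector(tokenize(element_name))
    scores = {s: cosine(vec, ctx) for s, ctx in subjects.items()}
    best = max(scores, key=scores.get)
    return best, scores


def rocchio_update(ctx, element_name, relevant, alpha=1.0, beta=0.5, gamma=0.25):
    """Hypothetical Rocchio-style feedback: move a subject's context vector toward
    an element the designer confirmed (relevant=True) or away from one the
    designer rejected (relevant=False). Non-positive weights are dropped."""
    vec = to_vector(tokenize(element_name))
    step = beta if relevant else -gamma
    terms = set(ctx) | set(vec)
    updated = {t: alpha * ctx.get(t, 0.0) + step * vec.get(t, 0.0) for t in terms}
    return {t: w for t, w in updated.items() if w > 0}


if __name__ == "__main__":
    # Hypothetical seed context vectors for two DW subjects.
    subjects = {
        "Sales": {"order": 1.0, "sale": 1.0, "amount": 1.0, "customer": 0.5},
        "Inventory": {"stock": 1.0, "warehouse": 1.0, "quantity": 1.0, "product": 0.5},
    }
    print(classify("OrderAmount", subjects)[0])  # prints "Sales"
    # Designer confirms stock_qty belongs to Inventory; "qty" enters its context.
    subjects["Inventory"] = rocchio_update(subjects["Inventory"], "stock_qty", relevant=True)
```

The seed weights above are made up for illustration. In a real setting, the initial context vectors would come from the subject descriptions supplied by the designer. The feedback loop would then keep adjusting those vectors as more elements are confirmed or rejected.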



ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: jcst@ict.ac.cn
  Copyright ©2015 JCST, All Rights Reserved