2015, Vol. 30, Issue (2): 311-323. doi: 10.1007/s11390-015-1524-3

Special Issue: Data Management and Data Mining

• Special Section on Applications and Industry

A New ETL Approach Based on Data Virtualization

Shu-Sheng Guo1,2,3(郭树盛), Zi-Mu Yuan1,2,3(袁子牧), Ao-Bing Sun3(孙傲冰), Qiang Yue3(岳强)   

  1 State Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
  2 University of Chinese Academy of Sciences, Beijing 100190, China;
  3 Cloud Computing Center, Chinese Academy of Sciences, Dongguan 523808, China
  • Received: 2014-11-07; Revised: 2015-01-06; Online: 2015-03-05; Published: 2015-03-05
  • About author: Shu-Sheng Guo is a Ph.D. candidate at the Institute of Computing Technology, Chinese Academy of Sciences, Beijing. His research interests focus on cloud computing, big data, data virtualization, and cloud databases.
  • Supported by: The work was supported by the Guangdong Talents Program of China under Grant No. 201001D0104726115.

ETL (Extract-Transform-Load) usually comprises three phases: extraction, transformation, and loading. In building a data warehouse, it performs data injection and is the most time-consuming activity, so improving ETL performance is necessary. In this paper, a new ETL approach, TEL (Transform-Extract-Load), is proposed. The TEL approach uses virtual tables to carry out the transformation stage before the extraction and loading stages, without a data staging area or staging database to store the raw data extracted from each of the disparate source data systems. The TEL approach reduces the data transmission load and improves the performance of queries issued from the access layers. Experimental results based on our proposed benchmarks show that the TEL approach is feasible and practical.
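The transform-before-extract ordering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an SQL view can stand in for a "virtual table", uses Python's built-in sqlite3 module with two in-memory databases as the source system and the warehouse, and every table, view, and column name (raw_orders, v_orders, fact_orders) is hypothetical.

```python
import sqlite3


def run_tel():
    # Hypothetical source system holding raw, untransformed rows.
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount_cents INTEGER)"
    )
    source.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [(1, "  alice ", 1250), (2, "BOB", None), (3, "carol", 300)],
    )

    # Transform: the view plays the role of a virtual table. It cleans names,
    # converts cents to currency units, and filters out rows with no amount.
    # No raw data is copied when the view is defined.
    source.execute(
        """
        CREATE VIEW v_orders AS
        SELECT id,
               UPPER(TRIM(customer)) AS customer,
               amount_cents / 100.0  AS amount
        FROM raw_orders
        WHERE amount_cents IS NOT NULL
        """
    )

    # Extract: only already-transformed rows leave the source system.
    rows = source.execute(
        "SELECT id, customer, amount FROM v_orders ORDER BY id"
    ).fetchall()

    # Load: write straight into the warehouse table, with no staging database.
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE fact_orders (id INTEGER, customer TEXT, amount REAL)"
    )
    warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    warehouse.commit()
    return warehouse.execute(
        "SELECT id, customer, amount FROM fact_orders ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    print(run_tel())
```

In this sketch the row with a missing amount never crosses from the source to the warehouse, which is one way to picture the reduced transmission load the abstract claims; the paper's actual mechanism and measurements may differ.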


ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved