A New ETL Approach Based on Data Virtualization

Shu-Sheng Guo1,2,3(郭树盛), Zi-Mu Yuan1,2,3(袁子牧), Ao-Bing Sun3(孙傲冰), Qiang Yue3(岳强)   

    1 State Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
    2 University of Chinese Academy of Sciences, Beijing 100190, China;
    3 Cloud Computing Center, Chinese Academy of Sciences, Dongguan 523808, China
  • Received: 2014-11-07 Revised: 2015-01-06 Online: 2015-03-05 Published: 2015-03-05
  • About author: Shu-Sheng Guo is a Ph.D. candidate at the Institute of Computing Technology, Chinese Academy of Sciences, Beijing. His research interests include cloud computing, big data, data virtualization, and cloud databases.
  • Supported by:

    The work was supported by the Guangdong Talents Program of China under Grant No. 201001D0104726115.

ETL (Extract-Transform-Load) usually comprises three phases: extraction, transformation, and loading. In building a data warehouse, it performs data injection and is the most time-consuming activity, so improving ETL performance is necessary. In this paper, a new ETL approach, TEL (Transform-Extract-Load), is proposed. The TEL approach uses virtual tables to carry out the transformation stage before the extraction and loading stages, without a data staging area or staging database to store the raw data extracted from each of the disparate source data systems. The TEL approach reduces the data transmission load and improves the performance of queries from the access layers. Experimental results based on our proposed benchmarks show that the TEL approach is feasible and practical.
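
The transform-before-extract idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an ordinary SQL view can stand in for a virtual table, uses two in-memory SQLite databases as stand-ins for a source system and a warehouse, and all table, view, and column names (`raw_orders`, `v_customer_revenue`, `customer_revenue`) are hypothetical.

```python
import sqlite3

# Hypothetical source system holding raw, uncleaned rows.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE raw_orders "
    "(id INTEGER, customer TEXT, amount_cents INTEGER, status TEXT)"
)
source.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
    [
        (1, " alice ", 1250, "paid"),
        (2, "BOB", 300, "cancelled"),
        (3, "alice", 4750, "paid"),
    ],
)

# Transform: expressed as a view (assumed stand-in for a virtual table).
# The view stores no data; cleaning, filtering, and aggregation are
# evaluated at the source when the view is queried.
source.execute(
    """
    CREATE VIEW v_customer_revenue AS
    SELECT lower(trim(customer)) AS customer,
           SUM(amount_cents) / 100.0 AS revenue
    FROM raw_orders
    WHERE status = 'paid'
    GROUP BY lower(trim(customer))
    """
)

# Extract: only the already-transformed rows leave the source,
# so no staging copy of raw_orders is needed.
rows = source.execute(
    "SELECT customer, revenue FROM v_customer_revenue ORDER BY customer"
).fetchall()

# Load: write the extracted rows into the hypothetical warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customer_revenue (customer TEXT, revenue REAL)")
warehouse.executemany("INSERT INTO customer_revenue VALUES (?, ?)", rows)
warehouse.commit()

loaded = warehouse.execute(
    "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
).fetchall()
print(loaded)
```

In this toy setup three raw rows exist at the source, but only one aggregated row is transferred, which is the kind of reduction in transmitted data the abstract describes.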

