2010, Vol. 25, Issue (3): 537-547.

• Special Section on Trends Changing Data Management •

A Query Interface Matching Approach Based on Extended Evidence Theory for Deep Web

Yong-Quan Dong1,2 (董永权), Member, CCF, Qing-Zhong Li1,* (李庆忠), Senior Member, CCF, Yan-Hui Ding1 (丁艳辉), Member, CCF, and Zhao-Hui Peng1 (彭朝晖), Member, CCF

  1. School of Computer Science and Technology, Shandong University, Jinan 250014, China
  2. School of Computer Science and Technology, Xuzhou Normal University, Xuzhou 221000, China
  • Received: 2009-06-21 Revised: 2010-03-11 Online: 2010-05-05 Published: 2010-05-05
  • About author:
    Yong-Quan Dong is a Ph.D. candidate at the School of Computer Science and Technology, Shandong University. He is a member of CCF. His research interests include Web information integration and Web data management.
    Qing-Zhong Li is a professor at the School of Computer Science and Technology, Shandong University. He is a senior member of CCF. His research interests include Web information integration and enterprise information integration.
    Yan-Hui Ding is a Ph.D. candidate at the School of Computer Science and Technology, Shandong University. He is a member of CCF. His research interests include Web information integration and Web information extraction.
    Zhao-Hui Peng is a lecturer at the School of Computer Science and Technology, Shandong University. He received his Ph.D. degree from the School of Information, Renmin University. He is a member of CCF. His research interests include keyword search over databases and Web data management.
  • Supported by:

    Supported by the National Natural Science Foundation of China under Grant No. 90818001 and the Natural Science Foundation of Shandong Province of China under Grant No. Y2007G24.

Matching query interfaces is a crucial step in data integration across multiple Web databases. Different types of information about query interface schemas have been used to match attributes between schemas. Relying on a single type of information is insufficient, and the results of individual matchers are often inaccurate and uncertain. Evidence theory is a state-of-the-art approach for combining multiple sources of uncertain information. However, for query interface matching, traditional evidence theory is limited in that it treats individual matchers equally across different matching tasks, which reduces matching performance. This paper proposes a novel query interface matching approach for the Deep Web based on extended evidence theory. Our approach first introduces a procedure that dynamically predicts the credibility of each matcher. It then extends traditional evidence theory with these credibilities and uses exponentially weighted evidence theory to combine the results of multiple matchers. Finally, it makes the matching decision using a set of heuristics to obtain the final matches. Our approach overcomes the shortcomings of the traditional method and adapts to different matching tasks. Experimental results demonstrate the feasibility and effectiveness of the proposed approach.
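
To make the combination step concrete, the following is a minimal Python sketch of credibility-weighted evidence combination. It is an illustration under stated assumptions, not the paper's actual formulation: the abstract does not give the weighting formula, so the sketch assumes that each matcher's mass function is raised to the power of its credibility (a value in [0, 1]) and renormalized, after which the weighted masses are merged with Dempster's rule of combination. The names `weight_mass`, `dempster_combine`, `combine_matchers`, the example attributes, and all numeric values are hypothetical.

```python
from functools import reduce
from itertools import product


def weight_mass(mass, credibility):
    """Assumed exponential weighting: raise each mass to the matcher's
    credibility and renormalize. Credibility 1 leaves the mass unchanged;
    credibility 0 flattens it to a uniform assignment over its focal elements."""
    powered = {focal: value ** credibility
               for focal, value in mass.items() if value > 0}
    total = sum(powered.values())
    return {focal: value / total for focal, value in powered.items()}


def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions whose
    focal elements are frozensets over the same frame of discernment."""
    combined = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + x * y
        else:
            conflict += x * y  # mass assigned to contradictory hypotheses
    if conflict >= 1.0 - 1e-12:
        raise ValueError("sources are in total conflict")
    return {focal: value / (1.0 - conflict) for focal, value in combined.items()}


def combine_matchers(masses, credibilities):
    """Weight each matcher's mass by its credibility, then merge them all."""
    weighted = [weight_mass(m, w) for m, w in zip(masses, credibilities)]
    return reduce(dempster_combine, weighted)


# Hypothetical frame: candidate targets for one source attribute.
AUTHOR = frozenset({"author"})
TITLE = frozenset({"title"})
THETA = AUTHOR | TITLE  # the whole frame, i.e. "undecided"

# Hypothetical outputs of two matchers (for example label-based and type-based).
label_mass = {AUTHOR: 0.6, TITLE: 0.1, THETA: 0.3}
type_mass = {AUTHOR: 0.2, TITLE: 0.5, THETA: 0.3}

if __name__ == "__main__":
    equal = combine_matchers([label_mass, type_mass], [1.0, 1.0])
    weighted = combine_matchers([label_mass, type_mass], [1.0, 0.3])
    print("equal credibilities:   ", equal)
    print("type matcher discounted:", weighted)
```

With both credibilities set to 1 the sketch reduces to plain Dempster combination, which corresponds to the equal treatment of matchers that the abstract criticizes. Lowering the credibility of one matcher reduces its influence on the combined belief, which is the qualitative behavior the abstract describes.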


ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 