EntityManager: Managing Dirty Data Based on Entity Resolution
Journal of Computer Science and Technology
Journal of Computer Science and Technology 2017, Vol. 32 Issue (3) :644-661    DOI: 10.1007/s11390-017-1731-1
Regular Paper
Xue-Li Liu, Hong-Zhi Wang*, Member, CCF, Jian-Zhong Li, Fellow, CCF, Hong Gao, Senior Member, CCF
Massive Data Computing Laboratory, Harbin Institute of Technology, Harbin 150001, China

Abstract: Data quality is important in many data-driven applications, such as decision making, data analysis, and data mining. Recent studies focus on data cleaning techniques that delete or repair dirty data, which may cause information loss and introduce new inconsistencies. To avoid these problems, we propose EntityManager, a general system that manages dirty data without data cleaning. The system takes the real-world entity as its basic storage unit and retrieves query results according to users' quality requirements. It can handle all kinds of inconsistencies recognized by entity resolution. We describe the EntityManager system in detail, covering its architecture, data model, and query processing techniques. To process queries efficiently, the system adopts novel indices, a similarity operator, and query optimization techniques. Finally, we verify the efficiency and effectiveness of the system and present future research challenges.
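The idea of storing entities rather than cleaned records can be illustrated with a small sketch. This is not the EntityManager data model or query processor, which the abstract does not specify; the names `Entity`, `add_record`, `confidence`, and `select` are hypothetical, and the frequency-based confidence score is an assumption made only for illustration. The sketch keeps every conflicting attribute value of an entity and filters query results by a user-supplied quality threshold.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """One real-world entity, built from records that entity resolution grouped together."""
    entity_id: str
    # attribute name -> {candidate value: number of supporting records}
    # Conflicting values are kept side by side instead of being repaired or deleted.
    attributes: dict[str, dict[str, float]] = field(default_factory=dict)

    def add_record(self, record: dict[str, str]) -> None:
        """Merge one (possibly dirty) record into this entity."""
        for name, value in record.items():
            counts = self.attributes.setdefault(name, {})
            counts[value] = counts.get(value, 0.0) + 1.0

    def confidence(self, name: str, value: str) -> float:
        """Assumed quality score: share of records that support this value."""
        counts = self.attributes.get(name, {})
        total = sum(counts.values())
        return counts.get(value, 0.0) / total if total else 0.0


def select(entities: list[Entity], name: str, value: str, min_confidence: float) -> list[str]:
    """Return ids of entities whose attribute matches with at least the required quality."""
    return [e.entity_id for e in entities if e.confidence(name, value) >= min_confidence]


# Three records that entity resolution assigned to the same entity; one disagrees on city.
author = Entity("e1")
author.add_record({"name": "Hong-Zhi Wang", "city": "Harbin"})
author.add_record({"name": "Hongzhi Wang", "city": "Harbin"})
author.add_record({"name": "Hong-Zhi Wang", "city": "Beijing"})

loose = select([author], "city", "Harbin", min_confidence=0.6)
strict = select([author], "city", "Harbin", min_confidence=0.8)
```

Here two of the three records support `Harbin`, so the entity is returned under the looser threshold and excluded under the stricter one, while the conflicting `Beijing` value remains stored and queryable.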
Keywords: dirty data, entity resolution, uncertain attribute, query processing, query optimization
Received 2016-02-29;
Fund: This work was partially supported by the National Key Technology Research and Development Program of the Ministry of Science and Technology of China under Grant No. 2015BAH10F01, the National Natural Science Foundation of China under Grant Nos. U1509216, 61472099, and 61133002, the Scientific Research Foundation for the Returned Overseas Chinese Scholars of Heilongjiang Province of China under Grant No. LC2016026, and the Ministry of Education (MOE)-Microsoft Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology.

Corresponding Author: Hong-Zhi Wang, Email: wangzh@hit.edu.cn
About author: Xue-Li Liu is a Ph.D. candidate in computer technology and science, Harbin Institute of Technology, Harbin. Her research interests include data quality and massive data management.
Cite this article:   
Xue-Li Liu, Hong-Zhi Wang, Jian-Zhong Li, Hong Gao. EntityManager: Managing Dirty Data Based on Entity Resolution[J]. Journal of Computer Science and Technology, 2017, 32(3): 644-661.
URL:  
http://jcst.ict.ac.cn:8080/jcst/EN/10.1007/s11390-017-1731-1