We use cookies to improve your experience with our site.
Yue-Feng Du, De-Rong Shen, Tie-Zheng Nie, Yue Kou, Ge Yu. Content-Related Repairing of Inconsistencies in Distributed Data[J]. Journal of Computer Science and Technology, 2016, 31(4): 741-758. DOI: 10.1007/s11390-016-1660-4
Citation: Yue-Feng Du, De-Rong Shen, Tie-Zheng Nie, Yue Kou, Ge Yu. Content-Related Repairing of Inconsistencies in Distributed Data[J]. Journal of Computer Science and Technology, 2016, 31(4): 741-758. DOI: 10.1007/s11390-016-1660-4

Content-Related Repairing of Inconsistencies in Distributed Data

  • Conditional functional dependencies (CFDs) are a critical technique for detecting inconsistencies while they may ignore some potential inconsistencies without considering the content relationship of data. Content-related conditional functional dependencies (CCFDs) are a type of special CFDs, which combine content-related CFDs and detect potential inconsistencies by putting content-related data together. In the process of cleaning inconsistencies, detection and repairing are interactive:1) detection catches inconsistencies, 2) repairing corrects caught inconsistencies while may bring new inconsistencies. Besides, data are often fragmented and distributed into multiple sites. It consequently costs expensive shipment for inconsistencies cleaning. In this paper, our aim is to repair inconsistencies in distributed content-related data. We propose a framework consisting of an inconsistencies detection method and an inconsistencies repairing method, which work iteratively. The detection method marks the violated CCFDs for computing the inconsistencies which should be repaired preferentially. Based on the repairing-cost model presented in this paper, we prove that the minimum-cost repairing using CCFDs is NP-complete. Therefore, the repairing method heuristically repairs the inconsistencies with minimum cost. To improve the efficiency and accuracy of repairing, we propose distinct values and rules sequences. Distinct values make less data shipments than real data for communication. Rules sequences determine appropriate repairing sequences to avoid some incorrect repairs. Our solution is proved to be more effective than CFDs by empirical evaluation on two real-life datasets.
  • loading

Catalog

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return