Citation: Xue-Li Liu, Hong-Zhi Wang, Jian-Zhong Li, Hong Gao. EntityManager: Managing Dirty Data Based on Entity Resolution[J]. Journal of Computer Science and Technology, 2017, 32(3): 644-661. DOI: 10.1007/s11390-017-1731-1
[1] Andritsos P, Fuxman A, Miller R J. Clean answers over dirty databases: A probabilistic approach. In Proc. the 22nd ICDE, April 2006, Article No. 30.
[2] Fuxman A D, Miller R J. First-order query rewriting for inconsistent databases. In Proc. the 10th ICDT, January 2005, pp.337-351.
[3] Fuxman A, Fazli E, Miller R J. ConQuer: Efficient management of inconsistent databases. In Proc. SIGMOD, June 2005, pp.155-166.
[4] Boulos J, Dalvi N, Mandhani B, Mathur S, Ré C, Suciu D. MYSTIQ: A system for finding more answers by using probabilities. In Proc. SIGMOD, June 2005, pp.891-893.
[5] Hassanzadeh O, Miller R J. Creating probabilistic databases from duplicated data. VLDB J., 2009, 18(5): 1141-1166.
[6] Widom J. Trio: A system for integrated management of data, accuracy, and lineage. In Proc. CIDR, Jan. 2005, pp.262-276.
[7] Getoor L, Machanavajjhala A. Entity resolution: Theory, practice & open challenges. PVLDB, 2012, 5(12): 2018-2019.
[8] Waguih D A, Berti-Equille L. Truth discovery algorithms: An experimental evaluation. arXiv: 1409.6428, May 2014. https://arxiv.org/abs/1409.6428, Mar. 2017.
[9] Lipner S B, Balenson D M, Ellison C M, Walker S T. System and method for data recovery, September 1996. US Patent 5,557,765. https://www.google.com/patents/us5557765, Apr. 2017.
[10] Miles M B, Huberman A M. Qualitative Data Analysis: An Expanded Sourcebook. Sage Publications, Inc., 1994.
[11] Rahm E, Do H H. Data cleaning: Problems and current approaches. IEEE Data Eng. Bull., 2000, 23(4): 3-13.
[12] Arasu A, Ganti V, Kaushik R. Efficient exact set-similarity joins. In Proc. the 32nd VLDB, September 2006, pp.918-929.
[13] Behm A, Ji S, Li C, Lu J. Space-constrained gram-based indexing for efficient approximate string search. In Proc. ICDE, March 29-April 2, 2009, pp.604-615.
[14] Goemans M X, Williamson D P. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 1995, 42(6): 1115-1145.
[15] Hadjieleftheriou M, Chandel A, Koudas N, Srivastava D. Fast indexes and algorithms for set similarity selection queries. In Proc. the 24th ICDE, April 2008, pp.267-276.
[16] Hadjieleftheriou M, Koudas N, Srivastava D. Incremental maintenance of length normalized indexes for approximate string matching. In Proc. ACM SIGMOD, June 29-July 2, 2009, pp.429-440.
[17] Xiao C, Wang W, Lin X. Ed-Join: An efficient algorithm for similarity joins with edit distance constraints. PVLDB, 2008, 1(1): 933-944.
[18] Xiao C, Wang W, Lin X, Yu J X, Wang G. Efficient similarity joins for near-duplicate detection. ACM Transactions on Database Systems, 2011, 36(3): 15:1-15:15.
[19] Zhang Z, Hadjieleftheriou M, Ooi B C, Srivastava D. Bed-tree: An all-purpose index structure for string similarity search based on edit distance. In Proc. SIGMOD, June 2010, pp.915-926.
[20] Bayardo R J, Ma Y, Srikant R. Scaling up all pairs similarity search. In Proc. the 16th WWW, May 2007, pp.131-140.
[21] Wang J, Li G, Feng J. Trie-join: Efficient trie-based string similarity joins with edit-distance constraints. PVLDB, 2010, 3(1): 1219-1230.
[22] Sarawagi S, Kirpal A. Efficient set joins on similarity predicates. In Proc. ACM SIGMOD, June 2004, pp.743-754.
[23] Vernica R, Carey M J, Li C. Efficient parallel set-similarity joins using MapReduce. In Proc. ACM SIGMOD, June 2010, pp.495-506.
[24] Li C, Wang B, Yang X. VGRAM: Improving performance of approximate queries on string collections using variable-length grams. In Proc. the 33rd VLDB, September 2007, pp.303-314.
[25] Wang J, Li G, Feng J. Can we beat the prefix filtering?: An adaptive framework for similarity join and search. In Proc. ACM SIGMOD, May 2012, pp.85-96.
[26] Ioannidis Y E. The history of histograms (abridged). In Proc. the 29th VLDB, Sept. 2003, pp.19-30.
[27] Haas P J, Naughton J F, Seshadri S, Swami A N. Selectivity and cost estimation for joins based on random sampling. Journal of Computer and System Sciences, 1996, 52(3): 550-569.
[28] Hou W C, Ozsoyoglu G, Dogdu E. Error-constrained COUNT query evaluation in relational databases. ACM SIGMOD Record, 1991, 20(2): 278-287.
[29] Olken F. Random sampling from databases [Ph.D. Thesis]. University of California, 1993.
[30] Ngu A H, Harangsri B, Shepherd J. Query size estimation for joins using systematic sampling. Distributed and Parallel Databases, 2004, 15(3): 237-275.
[31] Lee H, Ng R T, Shim K. Similarity join size estimation using locality sensitive hashing. PVLDB, 2011, 4(6): 338-349.
[32] Tong X, Wang H. Fgram-Tree: An index structure based on feature grams for string approximate search. In Proc. the 13th WAIM, August 2012, pp.241-253.
[33] Liu X, Wang H, Li J, Gao H. Similarity join algorithm based on entity. Journal of Software, 2015, 26(6): 1421-1437. (in Chinese)
[34] Zhang Y, Yang L, Wang H. Range query estimation for dirty data management system. In Proc. the 13th WAIM, August 2012, pp.152-164.
[35] Liu X, Wang H, Li J, Gao H. Multi-similarity join order selection in entity database. Journal of Frontiers of Computer Science and Technology, 2012, 6(10): 865-876.
[36] Garcia-Molina H, Ullman J D, Widom J. Database System Implementation. Prentice-Hall, 2000.
[37] Abiteboul S, Hull R, Vianu V. Foundations of Databases. Addison-Wesley, 1995.
[38] Ilyas I F, Beskales G, Soliman M A. A survey of top-k query processing techniques in relational database systems. ACM Computing Surveys (CSUR), 2008, 40(4): 11:1-11:58.
[39] Zhang Y, Yang L, Wang H. Similarity join size estimation with threshold for dirty data. Chinese Journal of Computers, 2012, 35(10): 2159-2168. (in Chinese)
[40] Xu R, Wunsch D. Survey of clustering algorithms. IEEE Transactions on Neural Networks, 2005, 16(3): 645-678.
[41] Clauset A, Newman M E, Moore C. Finding community structure in very large networks. Physical Review E, 2004, 70(6): Article No. 066111.
[42] Li Y, Wang H, Gao H. Efficient entity resolution based on sequence rules. In Proc. CSIE, May 2011, pp.381-388.
[43] Kuang D, Li X, Ling C X. A new search engine integrating hierarchical browsing and keyword search. In Proc. the 22nd IJCAI, July 2011, pp.2464-2469.