计算机科学技术学报 ›› 2020,Vol. 35 ›› Issue (4): 769-793.doi: 10.1007/s11390-020-0350-4

所属专题: 综述 Data Management and Data Mining

• • 上一篇    下一篇

实体消解中分块技术的综述

Bo-Han Li1,2,3, Member, CCF, ACM, Yi Liu1, Student Member, CCF, An-Man Zhang1, Student Member, CCF Wen-Huan Wang1, Student Member, CCF, Shuo Wan1, Student Member, CCF   

  1. 1 College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China;
    2 Key Laboratory of Safety-Critical Software, Ministry of Industry and Information Technology, Nanjing 211106, China;
    3 Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
  • 收稿日期:2020-01-30 修回日期:2020-06-08 出版日期:2020-07-20 发布日期:2020-07-20
  • 作者简介:Bo-Han Li received his Ph.D. degree in computer application from Harbin University of Science and Technology, Harbin, in 2009. He is currently an associate professor at the College of Computer Science and Technology of Nanjing University of Aeronautics and Astronautics (NUAA), Nanjing. His current research interests include spatiotemporal database, recommendation system, sentiment analysis, etc. He is a member of CCF and ACM.
  • 基金资助:
    This work was partially supported by the National Natural Science Foundation of China under Grant No. 61772268 and the Fundamental Research Funds for the Central Universities of China under Grant Nos. NS2018057 and NJ2018014.

A Survey on Blocking Technology of Entity Resolution

Bo-Han Li1,2,3, Member, CCF, ACM, Yi Liu1, Student Member, CCF, An-Man Zhang1, Student Member, CCF Wen-Huan Wang1, Student Member, CCF, Shuo Wan1, Student Member, CCF        

  1. 1 College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China;
    2 Key Laboratory of Safety-Critical Software, Ministry of Industry and Information Technology, Nanjing 211106, China;
    3 Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
  • Received:2020-01-30 Revised:2020-06-08 Online:2020-07-20 Published:2020-07-20
  • About author:Bo-Han Li received his Ph.D. degree in computer application from Harbin University of Science and Technology, Harbin, in 2009. He is currently an associate professor at the College of Computer Science and Technology of Nanjing University of Aeronautics and Astronautics (NUAA), Nanjing. His current research interests include spatiotemporal database, recommendation system, sentiment analysis, etc. He is a member of CCF and ACM.
  • Supported by:
    This work was partially supported by the National Natural Science Foundation of China under Grant No. 61772268 and the Fundamental Research Funds for the Central Universities of China under Grant Nos. NS2018057 and NJ2018014.

1、研究背景(context):
当前越来越多的企业、政府以及研究机构从不同的领域收集各种描述实体的数据。然而,人们从不同数据源获取大量的实体信息在现实中可能会具有多重不同的实体描述信息的情况。因此,一项重要的研究工作就是对不同数据源中获取的实体信息进行合并和链接。实体消解(ER)可以实现识别指向现实世界中同一对象的不同实体。为了克服许多ER算法的时间复杂度高的问题,学者们提出了许多方法和技术,其中分块法是提高ER效率并确保算法结果准确性的重要方法之一。自1969年标准分块算法提出以来,出现了许多针对不同的数据结构的分块实体消解算法。本文尝试从不同角度对这些不同分支的算法进行分析、梳理和综述,对该领域的主要方法进行了详细的介绍和归纳。
2、目的(Objective):
本篇综述聚焦于实体消解中的分块技术,旨在以易于理解的方式,从分块构造方法和分块优化技术的角度切入,给出众多实体消解分块算法的概述、关系梳理以及优劣势分析。
3、方法(Method):
为了使文章的分析更加严谨,我们首先给出ER和分块技术的背景知识和相关定义。然后依据不同的分块构造方法对算法进行划分并总结。更具体地说,将算法分为基于规则的算法和基于机器学习的算法,在给出算法描述和优缺点的同时,还从算法所适用数据结构的角度对算法进行分析。接着我们从“块调优”和“比较调优”的角度切入,总结了常见的分块优化技术。最后,给出了一些有关分块ER以及部分广义ER的具有研究前景的工作。
4、结果(Result&Findings):
展示并分析了从经典到最近出现的分块ER算法以及分块优化技术,梳理了算法之间的关系。以表格的形式展示了各个算法的常见分类,并从算法适用的数据结构的角度给出了分析。最后,还给出了一些有关ER的具有研究前景的工作。
5、结论(Conclusions):
本篇综述聚焦于实体消解中的分块技术,使用易于理解的方法从多个角度展示和分析现有的ER分块算法。根据不同分块构造方法的本质特征对其进行分类,并从不同角度对其进行总结。为了使算法能够处理更复杂和更大的数据集,许多优化分块技术被提出。我们根据优化技术处理粒度的不同,将这些优化方法分为“块调优”和“比较调优”两类进行介绍和分析。最后,介绍了一些分块ER具有研究前景的工作。

关键词: 分块构建, 分块优化, 数据链接, 实体消解

Abstract: Entity resolution (ER) is a significant task in data integration, which aims to detect all entity profiles that correspond to the same real-world entity. Due to its inherently quadratic complexity, blocking was proposed to ameliorate ER, and it offers an approximate solution which clusters similar entity profiles into blocks so that it suffices to perform pairwise comparisons inside each block in order to reduce the computational cost of ER. This paper presents a comprehensive survey on existing blocking technologies. We summarize and analyze all classic blocking methods with emphasis on different blocking construction and optimization techniques. We find that traditional blocking ER methods which depend on the fixed schema may not work in the context of highly heterogeneous information spaces. How to use schema information flexibly is of great significance to efficiently process data with the new features of this era. Machine learning is an important tool for ER, but end-to-end and efficient machine learning methods still need to be explored. We also sum up and provide the most promising trend for future work from the directions of real-time blocking ER, incremental blocking ER, deep learning with ER, etc.

Key words: blocking construction, blocking optimization, data linkage, entity resolution

[1] Bhattacharya I, Getoor L. A Latent Dirichlet model for unsupervised entity resolution. In Proc. the 2006 SIAM International Conference on Data Mining, Apr. 2006, pp.47-58.
[2] Liu X L, Wang H Z, Li J Z, Gao H. EntityManager:Managing dirty data based on entity resolution. J. Comput. Sci. Technol., 2017, 32(3):644-662.
[3] Winkler W E. Methods for evaluating and creating data quality. Inf. Syst., 2004, 29(7):531-550.
[4] Winkler W E. Overview of record linkage and current research directions. Technical Report, U.S. Bureau of the Census, 2006. http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=7D191FC85CD0D418884ACD2CECC2C190?doi=1-0.1.1.79.1519&rep=rep1&type=pdf, March 2020.
[5] Newcombe H B, Kennedy J M, Axford S J, James A P. Automatic linkage of vital records. Science, 1959, 130(3381):954-959.
[6] Bhattacharya I, Getoor L. Iterative record linkage for cleaning and integration. In Proc. the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, Jun. 2004, pp.11-18.
[7] Pasula H, Marthi B, Milch B, Russell S J, Shpitser I. Identity uncertainty and citation matching. In Proc. the 2002 Annual Conference on Neural Information Processing Systems, Dec. 2002, pp.1401-1408.
[8] Fan W F, Jia X B, Li J Z, Ma S. Reasoning about record matching rules. Proc. the VLDB Endowment, 2009, 2(1):407-418.
[9] Bilenko M, Mooney R J. Adaptive duplicate detection using learnable string similarity measures. In Proc. the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2003, pp.39-48.
[10] Bhattacharya I, Getoor L. Deduplication and group detection using links. In Proc. the 2004 ACM SIGKDD Workshop on Link Analysis and Group Detection, August 2004.
[11] Getoor L, Machanavajjhala A. Entity resolution:Theory, practice & open challenges. Proc. the VLDB Endowment, 2012, 5(12):2018-2019.
[12] Christen P. A survey of indexing techniques for scalable record linkage and deduplication. IEEE Transactions on Knowledge and Data Engineering, 2011, 24(9):1537-1555.
[13] Christen P. Data Matching:Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Springer, 2012.
[14] Dong X L, Srivastava D. Big data integration. In Proc. the 29th International Conference on Data Engineering, Apr. 2013, pp.1245-1248.
[15] Papadakis G, Skoutas D, Thanos E, Palpanas T. A survey of blocking and filtering techniques for entity resolution. arXiv:1905.06167, 2019. https://arxiv.org/abs/1905.06167, March 2020.
[16] Dunn H L. Record linkage. American Journal of Public Health and the Nation's Health, 1946, 36(12):1412-1416.
[17] Christophides V, Efthymiou V, Stefanidis K. Entity resolution in the web of data. Synthesis Lectures on the Semantic Web, 2015, 5(3):1-122.
[18] Papadakis G, Ioannou E, Palpanas T, Niederée C, Nejdl W. A blocking framework for entity resolution in highly heterogeneous information spaces. IEEE Transactions on Knowledge and Data Engineering, 2012, 25(12):2665-2682.
[19] Papadakis G, Ioannou E, Niederée C, Palpanas T, Nejdl W. Eliminating the redundancy in blocking-based entity resolution methods. In Proc. the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, Jun. 2011, pp.85-94.
[20] Papadakis G, Alexiou G, Papastefanatos G, Koutrika G. Schema-agnostic vs schema-based configurations for blocking methods on homogeneous data. Proc. the VLDB Endowment, 2015, 9(4):312-323.
[21] Fellegi I P, Sunter A B. A theory for record linkage. Journal of the American Statistical Association, 1969, 64(328):1183-1210.
[22] Aizawa A, Oyama K. A fast linkage detection scheme for multi-source information integration. In Proc. the 2005 International Workshop on Challenges in Web Information Retrieval and Integration, Apr. 2005, pp.30-39.
[23] de Vries T, Ke H, Chawla S, Christen P. Robust record linkage blocking using suffix arrays. In Proc. the 18th ACM Conference on Information and Knowledge Management, Nov. 2009, pp.305-314.
[24] Allam A, Skiadopoulos S, Kalnis P. Improved suffix blocking for record linkage and entity resolution. Data & Knowledge Engineering, 2018, 117:98-113.
[25] Gravano L, Ipeirotis P G, Jagadish H V, Koudas N, Muthukrishnan S, Srivastava D. Approximate string joins in a database (almost) for free. In Proc. the 27th International Conference on Very Large Data Bases, Sept. 2001, pp.491-500.
[26] Baxter R, Christen P, Churches T. A comparison of fast blocking methods for record linkage. In Proc. the ACM SIGKDD 2003 Workshop on Data Cleaning, Record Linkage and Object Consolidation, Aug. 2003.
[27] Kenig B, Gal A. MFIBlocks:An effective blocking algorithm for entity resolution. Information Systems, 2013, 38(6):908-926.
[28] McCallum A, Nigam K, Ungar L H. Efficient clustering of high-dimensional data sets with application to reference matching. In Proc. the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2000, pp.169-178.
[29] Hernández M A, Stolfo S J. The merge/purge problem for large databases. ACM SIGMOD Record, 1995, 24(2):127-138.
[30] Yan S, Lee D, Kan M Y, Giles L C. Adaptive sorted neighborhood methods for efficient record linkage. In Proc. the 7th ACM/IEEE Joint Conference on Digital Libraries, Jun. 2007, pp.185-194.
[31] Draisbach U, Naumann F. A comparison and generalization of blocking and windowing algorithms for duplicate detection. In Proc. the International Workshop on Quality in Databases, Aug. 2009, pp.51-56.
[32] Draisbach U, Naumann F, Szott S, Wonneberg O. Adaptive windows for duplicate detection. In Proc. the 28th IEEE International Conference on Data Engineering, Apr. 2012, pp.1073-1083.
[33] Draisbach U, Naumann F. A generalization of blocking and windowing algorithms for duplicate detection. In Proc. the 2011 International Conference on Data and Knowledge Engineering, Sept. 2011, pp.18-24.
[34] Papadakis G, Ioannou E, Niederée C, Fankhauser P. Efficient entity resolution for large heterogeneous information spaces. In Proc. the 4th International Conference on Web Search and Web Data Mining, Feb. 2011, pp.535-544.
[35] Papadakis G, Ioannou E, Niederée C, Palpanas T, Nejdl W. Beyond 100 million entities:Large-scale blocking-based resolution for heterogeneous data. In Proc. the 5th International Conference on Web Search and Web Data Mining, Feb. 2012, pp.53-62.
[36] Song D, Heflin J. Automatically generating data linkages using a domain-independent candidate selection approach. In Proc. the 10th International Semantic Web Conference, Oct. 2011, pp.649-664.
[37] Nin J, Muntes-Mulero V, Martínez-Bazan N, Larriba-Pey J. On the use of semantic blocking techniques for data cleansing and integration. In Proc. the 11th International Database Engineering and Applications Symp., Sept. 2007, pp.190-198.
[38] Bilenko M, Kamath B, Mooney R J. Adaptive blocking:Learning to scale up record linkage. In Proc. the 6th IEEE International Conference on Data Mining, Dec. 2006, pp.87-96.
[39] Michelson M, Knoblock C A. Learning blocking schemes for record linkage. In Proc. the 21st National Conference on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference, Jul. 2006, pp.440-445.
[40] Evangelista L O, Cortez E, da Silva A S, Meira W. Adaptive and flexible blocking for record linkage tasks. Journal of Information and Data Management, 2010, 1(2):167-167.
[41] Sarma A D, Jain A, Machanavajjhala A, Bohannon P. An automatic blocking mechanism for large-scale deduplication tasks. In Proc. the 21st ACM International Conference on Information and Knowledge Management, Oct. 2012, pp.1055-1064.
[42] Giang P H. A machine learning approach to create blocking criteria for record linkage. Health Care Management Science, 2015, 18(1):93-105.
[43] Cao Y, Chen Z, Zhu J, Yue P, Lin C, Yu Y. Leveraging unlabeled data to scale blocking for record linkage. In Proc. the 22nd International Joint Conference on Artificial Intelligence, Jul. 2011, pp.2211-2217.
[44] Mayank K, Daniel P M. An unsupervised algorithm for learning blocking schemes. In Proc. the 13th International Conference on Data Mining, Dec. 2019, pp.340-349.
[45] O'Hare K, Jurek-Loughrey A, de Campos C. An unsupervised blocking technique for more efficient record linkage. Data & Knowledge Engineering, 2019, 122:181-195.
[46] Ma Y T, Tran T. TYPiMatch:Type-specific unsupervised learning of keys and key values for heterogeneous web data integration. In Proc. the 6th ACM International Conference on Web Search and Data Mining, Feb. 2013, pp.325-334.
[47] Kejriwal M, Miranker D P. A two-step blocking scheme learner for scalable link discovery. In Proc. the 9th International Workshop on Ontology Matching Collocated with the 13th International Semantic Web Conference, Oct. 2014, pp.49-60.
[48] Kejriwal M, Miranker D P. A DNF blocking scheme learner for heterogeneous datasets. arXiv:1501.01694. https://arxiv.org/abs/1501.01694, Jan. 2020.
[49] Papadakis G, Ioannou E, Niederée C, Palpanas T, Nejdl W. To compare or not to compare:Making entity resolution more efficient. In Proc. the International Workshop on Semantic Web Information Management, Jun. 2011, Article No. 3.
[50] Papadakis G, Koutrika G, Palpanas T, Nejdl W. Metablocking:Taking entity resolution to the next level. IEEE Transactions on Knowledge and Data Engineering, 2013, 26(8):1946-1960.
[51] Ferragina P, Grossi R. The string B-tree:A new data structure for string search in external memory and its applications. Journal of the ACM, 1999, 46(2):236-280.
[52] Christen P. Towards parameter-free blocking for scalable record linkage. Technical Report, Faculty of Engineering and Information Technology, 2007. http://users.cecs.anu.edu.au/~Peter.Christen/publications/tr-cs-07-03.pdf, March 2020.
[53] Madhavan J, Jeffery S R, Cohen S, Dong X L, Ko D, Yu C, Halevy A. Web-scale data integration:You can only afford to pay as you go. In Proc. the 3rd Biennial Conference on Innovative Data Systems Research, Jan. 2007, pp.342-350.
[54] Papadakis G, Palpanas T. Blocking for large-scale entity resolution:Challenges, algorithms, and practical examples. In Proc. the 32nd International Conference on Data Engineering, May 2016, pp.1436-1439.
[55] Chaudhuri S, Chen B, Ganti V, Kaushik R. Example-driven design of efficient record matching queries. In Proc. the 33rd International Conference on Very Large Data Bases, Sept. 2007, pp.327-338.
[56] Papadakis G, Demartini G, Fankhauser P, Kärger P. The missing links:Discovering hidden same-as links among a billion of triples. In Proc. the 12th International Conference on Information Integration and Web-Based Applications and Services, Nov. 2010, pp.453-460.
[57] Winkler W E. Approximate string comparator search strategies for very large administrative lists. Technical Report, U.S. Census Bureau, 2005. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.79.402&rep=rep1&type=pdf, March 2020.
[58] Valiant L G. A theory of the learnable. Communications of the ACM, 1984, 27(11):1134-1142.
[59] Suchanek F M, Abiteboul S, Senellart P. PARIS:Probabilistic alignment of relations, instances, and schema. Proceedings the VLDB Endowment, 2011, 5(3):157-168.
[60] Papadakis G, Svirsky J, Gal A, Palpanas T. Comparative analysis of approximate blocking techniques for entity resolution. Proceedings the VLDB Endowment, 2016, 9(9):684-695.
[61] Jonker R, Volgenant T. Improving the Hungarian assignment algorithm. Operations Research Letters, 1986, 5(4):171-175.
[62] Verikas A, Gelzinis A, Bacauskiene M. Mining data with random forests:A survey and results of new tests. Pattern Recognition, 2011, 44(2):330-349.
[63] Bilke A, Naumann F. Schema matching using duplicates. In Proc. the 21st International Conference on Data Engineering, Apr. 2005, pp.69-80.
[64] Papadakis G, Papastefanatos G, Palpanas T, Koubarakis M. Scaling entity resolution to large, heterogeneous data with enhanced meta-blocking. In Proc. the 19th International Conference on Extending Database Technology, Mar. 2016, pp.221-232.
[65] Fisher J, Christen P, Wang Q, Rahm E. A clustering-based framework to control block sizes for entity resolution. In Proc. the 21st International Conference on Knowledge Discovery and Data Mining, Aug. 2015, pp.279-288.
[66] Whang S E, Menestrina D, Koutrika G, Theobald M, Garcia-Molina H. Entity resolution with iterative blocking. In Proc. the International Conference on Management of Data, Jun. 2009, pp.219-232.
[67] Shu L, Chen A, Xiong M, Meng W. Efficient SPectrAl Neighborhood blocking for entity resolution. In Proc. the 27th International Conference on Data Engineering, Apr. 2011, pp.1067-1078.
[68] Shi J, Malik J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(8):888-905.
[69] Christen P, Gayler R, Hawking D. Similarity-aware indexing for real-time entity resolution. In Proc. the 18th ACM Conference on Information and Knowledge Management, Nov. 2009, pp.1565-1568.
[70] Ramadan B, Christen P, Liang H, Gayler R W, Hawking D. Dynamic similarity-aware inverted indexing for real-time entity resolution. In Proc. the 2013 PAKDD Workshop on Data Mining Applications in Industry and Government, Apr. 2013, pp.47-58.
[71] Ramadan B, Christen P. Forest-based dynamic sorted neighborhood indexing for real-time entity resolution. In Proc. the 23rd International Conference on Information and Knowledge Management, Nov. 2014, pp.1787-1790.
[72] Ramadan B, Christen P, Liang H, Gayler R W. Dynamic sorted neighborhood indexing for real-time entity resolution. Journal of Data and Information Quality, 2015, 6(4):Article No. 15.
[73] Rice S V. Braided AVL trees for efficient event sets and ranked sets in the SIMSCRIPT III simulation programming language. In Proc. the 2007 Western Multiconference on Computer Simulation, Jan. 2007, pp.150-155.
[74] Ramadan B, Christen P. Unsupervised blocking key selection for real-time entity resolution. In Proc. the 19th Pacific-Asia Conference on Knowledge Discovery and Data Mining, May 2015, pp.574-585.
[75] Liang H, Wang Y, Christen P, Gayler R. Noise-tolerant approximate blocking for dynamic real-time entity resolution. In Proc. the 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining, May 2014, pp.449-460.
[76] Benjelloun O, Garcia-Molina H, Menestrina D, Su Q, Whang S E, Widom J. Swoosh:A generic approach to entity resolution. The VLDB Journal, 2009, 18(1):255-276.
[77] Araújo B T, Stefanidis K, Pires C E S, Nummenmaa J, da Nóbrega P T. Incremental blocking for entity resolution over web streaming data. In Proc. the 2019 IEEE/WIC/ACM International Conference on Web Intelligence, Oct. 2019, pp.332-336.
[78] Ebraheem M, Thirumuruganathan S, Joty S, Ouzzani M, Tang N. Distributed representations of tuples for entity resolution. Proc. the VLDB Endowment, 2018, 11(11):1454-1467.
[79] Mudgal S, Li H, Rekatsinas T, Doan A, Park Y, Krishnan G, Deep R, Arcaute E, Raghavendra V. Deep learning for entity matching:A design space exploration. In Proc. the 2018 International Conference on Management of Data, Jun. 2018, pp.19-34.
[80] di Cicco V, Firmani D, Koudas N, Merialdo P, Srivastava D. Interpreting deep learning models for entity resolution:An experience report using LIME. In Proc. the 2nd International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, Jul. 2019, Article No. 8.
[81] Papenbrock T, Heise A, Naumann F. Progressive duplicate detection. IEEE Transactions on Knowledge and Data Engineering, 2014, 27(5):1316-1329.
[82] Whang S E, Marmaros D, Garcia-Molina H. Pay-as-yougo entity resolution. IEEE Transactions on Knowledge and Data Engineering, 2012, 25(5):1111-1124.
[83] Altowim Y, Kalashnikov D V, Mehrotra S. Progressive approach to relational entity resolution. Proc. the VLDB Endowment, 2014, 7(11):999-1010.
[84] Altowim Y, Mehrotra S. Parallel progressive approach to entity resolution using MapReduce. In Proc. the 33rd IEEE International Conference on Data Engineering, Apr. 2017, pp.909-920.
[85] Bhattacharya I, Getoor L. Collective entity resolution in relational data. ACM Transactions on Knowledge Discovery from Data, 2007, 1(1):Article No. 5.
[86] Globerson A, Lazic N, Chakrabarti S, Subramanya A, Ringaard M, Pereira F. Collective entity resolution with multi-focal attention. In Proc. the 54th Annual Meeting of the Association for Computational Linguistics, Aug. 2016.
[87] Kouki P, Pujara J, Marcum C, Koehly L, Getoor L. Collective entity resolution in familial networks. In Proc. the 2017 IEEE International Conference on Data Mining, Nov. 2017, pp.227-236.
[88] de Assis Costa G, de Oliveira J M P. A relational learning approach for collective entity resolution in the web of data. In Proc. the 5th International Workshop on Consuming Linked Data Co-Located with the 13th International Semantic Web Conference, Oct. 2014.
[89] Kouki P, Pujara J, Marcum C, Koehly L, Getoor L. Collective entity resolution in multi-relational familial networks. Knowledge and Information Systems, 2019, 61(3):1547-1581.
[90] Wang J, Kraska T, Franklin M J, Feng J. CrowdER:Crowdsourcing entity resolution. arXiv:1208.1927, 2012. http://arxiv.org/abs/1208.1927, Aug. 2018.
[91] Vesdapunt N, Bellare K, Dalvi N. Crowdsourcing algorithms for entity resolution. Proc. the VLDB Endowment, 2014, 7(12):1071-1082.
[92] Gong S S, Hu W, Ge W Y, Qu Y Z. Modeling topic-based human expertise for crowd entity resolution. Journal of Computer Science and Technology, 2018, 33(6):1204-1218.
[93] Zhang A Z, Li J Z, Gao H, Chen Y B, Ma H Z, Bah M J. CrowdOLA:Online aggregation on duplicate data powered by crowdsourcing. Journal of Computer Science and Technology, 2018, 33(2):366-379.
[94] Mazumdar A, Saha B. A theoretical analysis of first heuristics of crowdsourced entity resolution. In Proc. the 31st AAAI Conference on Artificial Intelligence, Feb. 2017, pp.970-976.
[95] Chai C, Li G, Li J, Deng D, Feng J. A partial-order-based framework for cost-effective crowdsourced entity resolution. The VLDB Journal, 2018, 27(6):745-770.
[96] Maskat R, Paton N W, Embury S M. Pay-as-you-go configuration of entity resolution. Transactions on Large-Scale Data-and Knowledge-Centered Systems, 2016, 29:40-65.
[97] Li H, Konda P, G.C. P S, Doan A, Snyder B, Park, Y, Krishnan G, Deep R, Raghavendra V. MatchCatcher:A debugger for blocking in entity matching. In Proc. the 21st International Conference on Extending Database Technology, Mar. 2018, pp.193-204.
[98] Papadakis G, Giannakopoulos G, Niederée C, Palpanas T, Nejdl W. Detecting and exploiting stability in evolving heterogeneous information spaces. In Proc. the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, Jun. 2011, pp.95-104.
[1] Sai-Sai Gong, Wei Hu, Wei-Yi Ge, Yu-Zhong Qu. 面向众包实体消解的用户专长主题建模[J]. 计算机科学技术学报, 2018, 33(6): 1204-1218.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 周笛;. A Recovery Technique for Distributed Communicating Process Systems[J]. , 1986, 1(2): 34 -43 .
[2] 陈世华;. On the Structure of Finite Automata of Which M Is an(Weak)Inverse with Delay τ[J]. , 1986, 1(2): 54 -59 .
[3] 李万学;. Almost Optimal Dynamic 2-3 Trees[J]. , 1986, 1(2): 60 -71 .
[4] C.Y.Chung; 华宣仁;. A Chinese Information Processing System[J]. , 1986, 1(2): 15 -24 .
[5] 孙钟秀; 商陆军;. DMODULA:A Distributed Programming Language[J]. , 1986, 1(2): 25 -31 .
[6] 陈世华;. On the Structure of (Weak) Inverses of an (Weakly) Invertible Finite Automaton[J]. , 1986, 1(3): 92 -100 .
[7] 金兰; 杨元元;. A Modified Version of Chordal Ring[J]. , 1986, 1(3): 15 -32 .
[8] 章萃; 赵沁平; 徐家福;. Kernel Language KLND[J]. , 1986, 1(3): 65 -79 .
[9] 王建潮; 魏道政;. An Effective Test Generation Algorithm for Combinational Circuits[J]. , 1986, 1(4): 1 -16 .
[10] 陈肇雄; 高庆狮;. A Substitution Based Model for the Implementation of PROLOG——The Design and Implementation of LPROLOG[J]. , 1986, 1(4): 17 -26 .
版权所有 © 《计算机科学技术学报》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn
总访问量: