? Parallel Incremental Frequent Itemset Mining for Large Data
Journal of Computer Science and Technology
Quick Search in JCST
 Advanced Search 
      Home | PrePrint | SiteMap | Contact Us | FAQ
 
Indexed by   SCIE, EI ...
Bimonthly    Since 1986
Journal of Computer Science and Technology 2017, Vol. 32 Issue (2) :368-385    DOI: 10.1007/s11390-017-1726-y
Regular Paper Current Issue | Archive | Adv Search << Previous Articles | Next Articles >>
Parallel Incremental Frequent Itemset Mining for Large Data
Yu-Geng Song1,2, Hui-Min Cui1,2, Member, CCF, Xiao-Bing Feng1, Member, CCF, ACM, IEEE
1 State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
2 University of Chinese Academy of Sciences, Beijing 100049, China

Abstract
Reference
Related Articles
Download: [PDF 2209KB]     Export: BibTeX or EndNote (RIS)  
Abstract Frequent itemset mining (FIM) is a popular data mining issue adopted in many fields, such as commodity recommendation in the retail industry, log analysis in web searching, and query recommendation (or related search). A large number of FIM algorithms have been proposed to obtain better performance, including parallelized algorithms for processing large data volumes. Besides, incremental FIM algorithms are also proposed to deal with incremental database updates. However, most of these incremental algorithms have low parallelism, causing low efficiency on huge databases. This paper presents two parallel incremental FIM algorithms called IncMiningPFP and IncBuildingPFP, implemented on the MapReduce framework. IncMiningPFP preserves the FP-tree mining results of the original pass, and utilizes them for incremental calculations. In particular, we propose a method to generate a partial FP-tree in the incremental pass, in order to avoid unnecessary mining work. Further, some of the incremental parallel tasks can be omitted when the inserted transactions include fewer items. IncbuildingPFP preserves the CanTrees built in the original pass, and then adds new transactions to them during the incremental passes. Our experimental results show that IncMiningPFP can achieve significant speedup over PFP (Parallel FPGrowth) and a sequential incremental algorithm (CanTree) in most cases of incremental input database, and in other cases IncBuildingPFP can achieve it.
Articles by authors
Yu-Geng Song
Hui-Min Cui
Xiao-Bing Feng
Keywordsincremental parallel FPGrowth   data mining   frequent itemset mining   MapReduce     
Received 2015-11-18;
Fund:

This work was supported by the National High Technology Research and Development 863 Program of China under Grant Nos. 2015AA011505, 2015AA015306, and 2012AA010902, the National Natural Science Foundation of China under Grant Nos. 61202055, 61221062, 61521092, 61303053, 61432016, 61402445, and 61672492, and the National Key Research and Development Program of China under Grant No. 2016YFB1000402.

About author: Yu-Geng Song received his B.S. degree in automation from Tsinghua University, Beijing, in 2012. He is currently working toward his Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing. His research interests include parallel computing and parallel compiling.
Cite this article:   
Yu-Geng Song, Hui-Min Cui, Xiao-Bing Feng.Parallel Incremental Frequent Itemset Mining for Large Data[J]  Journal of Computer Science and Technology, 2017,V32(2): 368-385
URL:  
http://jcst.ict.ac.cn:8080/jcst/EN/10.1007/s11390-017-1726-y
Copyright 2010 by Journal of Computer Science and Technology