We use cookies to improve your experience with our site.

Indexed in:

SCIE, EI, Scopus, INSPEC, DBLP, CSCD, etc.

Submission System
(Author / Reviewer / Editor)
Yu-Geng Song, Hui-Min Cui, Xiao-Bing Feng. Parallel Incremental Frequent Itemset Mining for Large Data[J]. Journal of Computer Science and Technology, 2017, 32(2): 368-385. DOI: 10.1007/s11390-017-1726-y
Citation: Yu-Geng Song, Hui-Min Cui, Xiao-Bing Feng. Parallel Incremental Frequent Itemset Mining for Large Data[J]. Journal of Computer Science and Technology, 2017, 32(2): 368-385. DOI: 10.1007/s11390-017-1726-y

Parallel Incremental Frequent Itemset Mining for Large Data

Funds: This work was supported by the National High Technology Research and Development 863 Program of China under Grant Nos. 2015AA011505, 2015AA015306, and 2012AA010902, the National Natural Science Foundation of China under Grant Nos. 61202055, 61221062, 61521092, 61303053, 61432016, 61402445, and 61672492, and the National Key Research and Development Program of China under Grant No. 2016YFB1000402.
More Information
  • Author Bio:

    Yu-Geng Song received his B.S. degree in automation from Tsinghua University, Beijing, in 2012. He is currently working toward his Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing. His research interests include parallel computing and parallel compiling.

  • Received Date: November 17, 2015
  • Revised Date: November 27, 2016
  • Published Date: March 04, 2017
  • Frequent itemset mining (FIM) is a popular data mining issue adopted in many fields, such as commodity recommendation in the retail industry, log analysis in web searching, and query recommendation (or related search). A large number of FIM algorithms have been proposed to obtain better performance, including parallelized algorithms for processing large data volumes. Besides, incremental FIM algorithms are also proposed to deal with incremental database updates. However, most of these incremental algorithms have low parallelism, causing low efficiency on huge databases. This paper presents two parallel incremental FIM algorithms called IncMiningPFP and IncBuildingPFP, implemented on the MapReduce framework. IncMiningPFP preserves the FP-tree mining results of the original pass, and utilizes them for incremental calculations. In particular, we propose a method to generate a partial FP-tree in the incremental pass, in order to avoid unnecessary mining work. Further, some of the incremental parallel tasks can be omitted when the inserted transactions include fewer items. IncbuildingPFP preserves the CanTrees built in the original pass, and then adds new transactions to them during the incremental passes. Our experimental results show that IncMiningPFP can achieve significant speedup over PFP (Parallel FPGrowth) and a sequential incremental algorithm (CanTree) in most cases of incremental input database, and in other cases IncBuildingPFP can achieve it.
  • [1]
    Liu D, Shih Y. Integrating AHP and data mining for product recommendation based on customer lifetime value. Information & Management, 2005, 42(3):387-400.
    [2]
    Iváncsy R, Vajk I. Frequent pattern mining in web log data. Acta Polytechnica Hungarica, 2006, 3(1):77-90.
    [3]
    Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. ACM SIGMOD Record, 2000, 29(2):1-12.
    [4]
    Agrawal R, Srikant R. Fast algorithms for mining association rules. In Proc. the 20th Int. Conf. Very Large Data Bases, Sept. 1994, pp.487-499.
    [5]
    Pietracaprina A, Zandolin D. Mining frequent itemsets using patricia tries. http://citeseerx.ist.psu.edu/viewdoc/summary? doi=10.1.1.11.437, Jan. 2017.
    [6]
    Inokuchi A, Washio T, Motoda H. An Apriori-based algorithm for mining frequent substructures from graph data. In Proc. the 4th European Conference on Principles of Data Mining and Knowledge Discovery, Sept. 2000, pp.13-23.
    [7]
    Li H, Wang Y, Zhang D et al. PFP:Parallel FP-Growth for query recommendation. In Proc. the ACM Conference on Recommender Systems, Oct. 2008, pp.107-114.
    [8]
    Aouad L, Le-Khac N, Kechadi T. Distributed frequent itemsets mining in heterogeneous platforms. Journal of Engineering, Computing and Architecture, 2007, 1(2):1-12.
    [9]
    Liu L, Li E, Zhang Y. Optimization of frequent itemset mining on multiple-core processor. In Proc. the 33rd Int. Conf. Very Large Data Bases, Sept. 2007, pp.1275-1285.
    [10]
    Patel S, Kotecha K. Frequent pattern mining using parallel architecture of artificial bee colony. International Journal of Advance Research in Computer Science and Management Studies, 2015, 3(11):287-293.
    [11]
    Chang H Y, Tzang Y J, Lin J C, Hong Z H, Chi T Y, Huang C Y. A hybrid algorithm for frequent pattern mining using MapReduce framework. In Proc. the 1st IEEE Int. Conf. Computational Intelligence Theory, Systems and Applications (CCITSA), Dec. 2015, pp.19-22.
    [12]
    Gole S, Tidke B. Frequent itemset mining for big data in social media using ClustBigFIM algorithm. In Proc. Int. Conf. Pervasive Computing (ICPC), Jan. 2015.
    [13]
    Dean J, Ghemawat S. MapReduce:Simplified data processing on large clusters. Communications of the ACM, 2008, 51(1):107-113.
    [14]
    Leung C, Khan Q, Li Z et al. CanTree:A canonical-order tree for incremental frequent-pattern mining. Knowledge and Information Systems, 2007, 11(3):287-311.
    [15]
    Cheng H, Yan X, Han J. IncSpan:Incremental mining of sequential patterns in large database. In Proc. the 10th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Aug. 2004, pp.527-532.
    [16]
    Leung C, Khan Q, Hoque T. CanTree:A tree structure for efficient incremental mining of frequent patterns. In Proc. the 5th IEEE Int. Conf. Data Mining, Nov. 2005.
    [17]
    Basak P, Sedamkar R R, Thakur R. Fast mining of finding frequent patterns in transactional database using incremental approach. International Journal of Applied Information Systems (IJAIS), 2015, 9(2):6-10.
    [18]
    Cheung D W, Han J, Ng V T, Wong C Y. Maintenance of discovered association rules in large databases:An incremental updating technique. In Proc. the 12th Int. Conf. Data Engineering, Feb.26-Mar.1, 1996, pp.106-114.
    [19]
    Ayan N F, Tansel A U, Arkun E. An efficient algorithm to update large itemsets with early pruning. In Proc. the 5th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Aug. 1999, pp.287-291.
    [20]
    Li Y, Zhang Z H, Chen W B, Min F. TDUP:An approach to incremental mining of frequent itemsets with three-waydecision pattern updating. International Journal of Machine Learning and Cybernetics, 2015, pp.1-13.
    [21]
    Huo W, Feng X, Zhang Z. An efficient approach for incremental mining fuzzy frequent itemsets with FP-tree. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2016, 24(03):367-386.
    [22]
    Agrawal R, Imieliński T, Swami A. Mining association rules between sets of items in large databases. In Proc. the ACM SIGMOD Conference on Management of Data, May 1993, pp.207-216.
    [23]
    Houtsma M, Swami A. Set-oriented mining for association rules in relational databases. In Proc. the 11th Int. Conf. Data Engineering, March 1995, pp.25-33.
    [24]
    Shvachko K, Kuang H, Radia S, Chansler R. The hadoop distributed file system. In Proc. the 26th IEEE Symp. Mass Storage Systems and Technologies (MSST), May 2010. http://storagecnference.us/2010/Papers/MSST/Shvachko. pdf, Feb. 2017.
    [25]
    Borthakur D. The Hadoop Distributed File system:Architecture and Design. Hadoop Project Website, 2007.
    [26]
    White T. Hadoop:The Definitive Guide (3rd edition). O'Reilly Media, California, U.S., 2012.
  • Related Articles

    [1]Dan Zhao, Su-Yun Zhao, Hong Chen, Rui-Xuan Liu, Cui-Ping Li, Xiao-Ying Zhang. Hadamard Encoding Based Frequent Itemset Mining under Local Differential Privacy[J]. Journal of Computer Science and Technology, 2023, 38(6): 1403-1422. DOI: 10.1007/s11390-023-1346-7
    [2]Jia-Qing Dong, Ze-Hao He, Yuan-Yuan Gong, Pei-Wen Yu, Chen Tian, Wan-Chun Dou, Gui-Hai Chen, Nai Xia, Hao-Ran Guan. SMART: Speedup Job Completion Time by Scheduling Reduce Tasks[J]. Journal of Computer Science and Technology, 2022, 37(4): 763-778. DOI: 10.1007/s11390-022-2118-5
    [3]Shi-Ming Guo, Hong Gao. HUITWU: An Efficient Algorithm for High-Utility Itemset Mining in Transaction Databases[J]. Journal of Computer Science and Technology, 2016, 31(4): 776-786. DOI: 10.1007/s11390-016-1662-2
    [4]Yong-Xin Tong, Lei Chen, Jieying She. Mining Frequent Itemsets in Correlated Uncertain Databases[J]. Journal of Computer Science and Technology, 2015, 30(4): 696-712. DOI: 10.1007/s11390-015-1555-9
    [5]Yu-Xiang Wang, Jun-Zhou Luo, Ai-Bo Song, Fang Dong. Partition-Based Online Aggregation with Shared Sampling in the Cloud[J]. Journal of Computer Science and Technology, 2013, 28(6): 989-1011. DOI: 10.1007/s11390-013-1393-6
    [6]Ying-Jie Shi, Xiao-Feng Meng, Fusheng Wang, Yan-Tao Gan. HEDC++:An Extended Histogram Estimator for Data in the Cloud[J]. Journal of Computer Science and Technology, 2013, 28(6): 973-988. DOI: 10.1007/s11390-013-1392-7
    [7]Jian-Hua Feng, Qian Qian, Jian-Yong Wang, Li-Zhu Zhou. Efficient Mining of Frequent Closed XML Query Pattern[J]. Journal of Computer Science and Technology, 2007, 22(5): 725-735.
    [8]TAO Xiaopeng, ZHOU Aoying, HU Yunfa. Fast Algorithms of Mining Probability Functional Dependency Rules in Relational Database[J]. Journal of Computer Science and Technology, 2000, 15(3): 261-270.
    [9]ZHOU Aoying, JIN Wen, ZHOU Shuigeng, QIAN Weining, TIAN Zenping. Incremental Mining of the Schema of Semistructured Data[J]. Journal of Computer Science and Technology, 2000, 15(3): 241-248.
    [10]Fan Jianhua, Li Deyi. An Overview of Data Mining and Knowledge Discovery[J]. Journal of Computer Science and Technology, 1998, 13(4): 348-368.

Catalog

    Article views (51) PDF downloads (998) Cited by()
    Related

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return