We use cookies to improve your experience with our site.

Indexed in:

SCIE, EI, Scopus, INSPEC, DBLP, CSCD, etc.

Submission System
(Author / Reviewer / Editor)
Shi-Ming Guo, Hong Gao. HUITWU: An Efficient Algorithm for High-Utility Itemset Mining in Transaction Databases[J]. Journal of Computer Science and Technology, 2016, 31(4): 776-786. DOI: 10.1007/s11390-016-1662-2
Citation: Shi-Ming Guo, Hong Gao. HUITWU: An Efficient Algorithm for High-Utility Itemset Mining in Transaction Databases[J]. Journal of Computer Science and Technology, 2016, 31(4): 776-786. DOI: 10.1007/s11390-016-1662-2

HUITWU: An Efficient Algorithm for High-Utility Itemset Mining in Transaction Databases

Funds: This work is partly supported by the National Basic Research 973 Program of China under Grant No. 2012CB316200 and the National Natural Science Foundation of China under Grant Nos. 61190115 and 61173022.
More Information
  • Author Bio:

    Shi-Ming Guo received his B.S. degree in software engineering from Harbin Institute of Technology, Harbin, in 2004. He is currently a Ph.D. candidate in the School of Computer Science and Technology at Harbin Institute of Technology, Harbin. His current research interests include high-utility pattern mining and massive data management.

  • Received Date: February 28, 2016
  • Revised Date: June 05, 2016
  • Published Date: July 04, 2016
  • Mining high-utility itemsets (HUIs) from a transaction database refers to the discovery of itemsets with high utilities like profits. Most of existing studies discover HUIs from a transaction database in two phases. In phase 1, different overestimation methods are applied to calculate the upper bounds of the utilities of itemsets. Since the overestimated utilities of itemsets are adopted, the itemsets whose overestimated utilities are no less than a user-specified threshold are selected as candidate HUIs, and they are verified by scanning the database one more time in phase 2. However, a large number of candidate HUIs incur two problems:1) it requires excessive memory to store these candidates; 2) it needs a large amount of running time to calculate their exact utilities. Vertical data format has been applied to mine HUIs recently. However this kind of method cannot deal with transactions with the same items effectively so that the size of database cannot be reduced sufficiently. The overall performance of algorithms is degraded consequently. Thus an algorithm HUITWU is proposed in this paper for mining HUIs. A novel data structure HUITWU-Tree is adopted to efficiently calculate the utilities of itemsets in a database. Extensive studies with both sparse and dense datasets have demonstrated that our proposed algorithm is more than an order of magnitude faster and consumes less memory than the state-of-the-art algorithms.
  • [1]
    Yao H, Hamilton H J, Butz C J. A foundational approach to mining itemset utilities from databases. In Proc. the 4th SIAM Int. Conf. Data Min., April 2004, pp.482-486.
    [2]
    Ahmed C F, Tanbeer S K, Jeong B S et al. Efficient tree structures for high utility pattern mining in incremental databases. IEEE Trans. Knowl. Data Eng., 2009, 21(12):1708-1721.
    [3]
    Tseng V A, Shie B E, Wu C W, Yu P S. Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans. Knowl. Data Eng., 2013, 25(8):1772-1786.
    [4]
    Pei J, Han J, Lakshmanan L V S. Mining frequent itemsets with convertible constraints. In Proc. the 17th Int. Conf. Data Eng., April 2001, pp.433-442.
    [5]
    Li Y X, Yeh J S, Chang C C. Isolated items discarding strategy for discovering high utility itemsets. Data Knowl. Eng., 2008, 64(1):198-217.
    [6]
    Liu Y, Liao W, Choudhary A. A two-phase algorithm for fast discovery of high utility of itemsets. In Proc. the 9th Pacific-Asia Conf. Knowl. Discov. Data Min., May 2005, pp.689-695.
    [7]
    Liu M, Qu J. Mining high utility itemsets without candidate generation. In Proc. the 21st ACM Int. Conf. Inf. Knowl. Man., October 29-November 2, 2012, pp.55-64.
    [8]
    Fournier-Viger P, Wu C W, Zida S, Tseng V S. FHM:Faster high-utility itemset mining using estimated utility co-occurrence pruning. In Proc. the 21th Int. Symp. Meth. Intel. Sys., June 2014, pp.83-92.
    [9]
    Han J, Pei J, Yin Y, Mao R. Mining frequent patterns without candidate generation:A frequent-pattern tree approach. Data Min. Knowl. Discov., 2004, 8(1):53-87.
    [10]
    Wang J, Han J, Pei J. CLOSET+:Searching for the best strategies for mining frequent closed itemsets. In Proc. the 9th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., August 2003, pp.236-245.
  • Related Articles

    [1]Guochen Cai, Kyungmi Lee, Ickjai Lee. Mining Semantic Trajectory Patterns from Geo-Tagged Data[J]. Journal of Computer Science and Technology, 2018, 33(4): 849-862. DOI: 10.1007/s11390-018-1860-1
    [2]Yu-Geng Song, Hui-Min Cui, Xiao-Bing Feng. Parallel Incremental Frequent Itemset Mining for Large Data[J]. Journal of Computer Science and Technology, 2017, 32(2): 368-385. DOI: 10.1007/s11390-017-1726-y
    [3]Jaehoon Choi, Jaewoo Kang, Jinseung Lee, Chihwan Song, Qingsong Jin, Sunwon Lee, Jinsun Uh. Mining Botnets and Their Evolution Patterns[J]. Journal of Computer Science and Technology, 2013, 28(4): 605-615. DOI: 10.1007/s11390-013-1361-1
    [4]Jian-Hua Feng, Qian Qian, Jian-Yong Wang, Li-Zhu Zhou. Efficient Mining of Frequent Closed XML Query Pattern[J]. Journal of Computer Science and Technology, 2007, 22(5): 725-735.
    [5]Jia-Wei Han, Jian Pei, Xi-Feng Yan. From Sequential Pattern Mining to Structured Pattern Mining: APattern-Growth Approach[J]. Journal of Computer Science and Technology, 2004, 19(3).
    [6]CHEN Ning, CHEN An, ZHOU Longxiang, LIU Lu. A Fast Algorithm for Mining Sequential Patterns from Large Databases[J]. Journal of Computer Science and Technology, 2001, 16(4).
    [7]MENG Bo, GUO Lei, CHEN Shifu. DPAL: Deductive Language for Embroidery Pattern Assembling[J]. Journal of Computer Science and Technology, 2000, 15(6): 625-628.
    [8]ZHOU Aoying, JIN Wen, ZHOU Shuigeng, QIAN Weining, TIAN Zenping. Incremental Mining of the Schema of Semistructured Data[J]. Journal of Computer Science and Technology, 2000, 15(3): 241-248.
    [9]Fan Jianhua, Li Deyi. An Overview of Data Mining and Knowledge Discovery[J]. Journal of Computer Science and Technology, 1998, 13(4): 348-368.
    [10]Min Yinghua, Han Zhide. A Built-in Test Pattern Generator[J]. Journal of Computer Science and Technology, 1986, 1(4): 62-74.

Catalog

    Article views (28) PDF downloads (1028) Cited by()
    Related

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return