2009, Vol. 24, Issue (6): 1000-1009.

• Special Section on International Partnership Programs Supported by CAS

Customer Activity Sequence Classification for Debt Prevention in Social Security

Huaifeng Zhang¹ (张淮风), Member, IEEE, Yanchang Zhao², Member, IEEE, Longbing Cao² (操龙兵), Senior Member, IEEE, Chengqi Zhang² (张成奇), Senior Member, IEEE, and Hans Bohlscheid¹

  ¹ Payment Reviews Branch, Business Integrity Division, Centrelink, Canberra, Australia
  ² Centre for Quantum Computation and Intelligent Systems (QCIS), University of Technology, Sydney, Australia
  • Received: 2009-02-28; Revised: 2009-07-26; Online: 2009-11-05; Published: 2009-11-05
  • About author:
    Huaifeng Zhang is a senior data mining specialist in the Data Mining Section of Centrelink, Australia. Dr. Zhang received his Ph.D. degree from the Chinese Academy of Sciences (CAS) in 2004. He has more than 40 publications in the previous five years, including one book published by Springer, eight journal articles, and three chapters in edited books. His research interests include combined pattern mining, sequence classification, and behaviour analysis and modeling.
    Yanchang Zhao is a postdoctoral research fellow in the Data Sciences & Knowledge Discovery Research Lab, Faculty of Engineering & IT, University of Technology, Sydney, Australia. His research interests are association rules, sequential patterns, clustering and post-mining. He has published more than 30 papers on these topics, including six journal articles, one edited book and three book chapters. He has served as chair of two international workshops, program committee member for 14 international conferences, and reviewer for 9 international journals and over a dozen other international conferences.
    Longbing Cao is an associate professor in the Faculty of Engineering & IT, University of Technology, Sydney, Australia. He is the director of the Data Sciences & Knowledge Discovery Research Lab. His research interests focus on domain driven data mining, multi-agents, and the integration of agent and data mining. He is a chief investigator of three ARC (Australian Research Council) Discovery projects and two ARC Linkage projects. He has over 50 publications, including one monograph, two edited books and 10 journal articles. He is a program co-chair of 11 international conferences.
    Chengqi Zhang is a research professor in the Faculty of Engineering & IT, University of Technology, Sydney, Australia. He is the director of the UTS Research Centre for Quantum Computation and Intelligent Systems and a chief investigator in Data Mining Program for Australian Capital Markets on Cooperative Research Centre. He has been a chief investigator of eight research projects. His research interests include data mining and multi-agent systems. He is a co-author of three monographs, a co-editor of nine books, and an author or co-author of more than 150 research papers. He is the chair of the ACS (Australian Computer Society) National Committee for Artificial Intelligence and Expert Systems, and a chair/member of the steering committee for three international conferences.
    Hans Bohlscheid is an executive in the Australian Public Service. His present role as business manager for data mining was preceded by a long career in education, where he held a number of teaching and principal positions. For the last four years he has been responsible for the development and implementation of Commonwealth Budget initiatives based on changes to legislation and policy. During this period he has managed a considerable suite of projects; however, it is his recent involvement in a pilot that sought to determine the effectiveness of data mining as a predictive and debt prevention tool that has shifted his focus to research and analysis. In addition to his government responsibilities, Hans is currently managing a 3-year University of Technology Sydney research project, which is funded through an Australian Research Council Linkage Grant in partnership with the Commonwealth. He is a partnership investigator and industry advisor to the University's Data Sciences and Knowledge Discovery Laboratory, and he has co-authored a number of publications and book chapters relating to data mining. His personal research involves an examination of project management methodology for actionable knowledge delivery.
  • Supported by:

    This work is supported by the Australian Research Council Linkage Project under Grant No. LP0775041 and the Early Career Researcher Grant under Grant No. 2007002448 from the University of Technology, Sydney, Australia.

From a data mining perspective, sequence classification builds a classifier from frequent sequential patterns. However, mining a complete set of sequential patterns on a large dataset can be extremely time-consuming, and the large number of patterns discovered also makes pattern selection and classifier building very time-consuming. In sequence classification, discovering discriminative patterns is much more important than discovering a complete pattern set. In this paper, we propose a novel hierarchical algorithm to build sequential classifiers using discriminative sequential patterns. Firstly, we mine the sequential patterns that are most strongly correlated with each target class. In this step, an aggressive strategy is employed to select a small set of sequential patterns. Secondly, pattern pruning and a serial coverage test are performed on the mined patterns. The patterns that pass the serial test are used to build the sub-classifier at the first level of the final classifier. Thirdly, the training samples that cannot be covered are fed back to the sequential pattern mining stage with updated parameters. This process continues until predefined interestingness measure thresholds are reached or all samples are covered. The patterns generated in each loop form the sub-classifier at the corresponding level of the final classifier. Within this framework, the search space can be reduced dramatically while good classification performance is achieved. The proposed algorithm is tested in a real-world business application for debt prevention in the social security area. The results show that the novel sequence classification algorithm is effective and efficient at predicting debt occurrences from customer activity sequence data.
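
The three-step loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the mining algorithm or the correlation measure, so this sketch assumes brute-force subsequence enumeration and uses rule confidence as a stand-in interestingness measure. All function names, the threshold schedule, the activity codes and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def contains(sequence, pattern):
    """True if `pattern` is a (not necessarily contiguous) subsequence of `sequence`."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def candidate_patterns(sequences, max_len):
    """Enumerate all subsequences up to max_len (brute force, for illustration only)."""
    found = set()
    for seq in sequences:
        for n in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), n):
                found.add(tuple(seq[i] for i in idx))
    return found


def mine_discriminative(samples, min_support, min_confidence, max_len=3, top_k=5):
    """Step 1: keep only a small set of patterns strongly tied to one class.
    Confidence is an assumed stand-in for the paper's correlation measure."""
    rules = []
    for pat in candidate_patterns([s for s, _ in samples], max_len):
        labels = Counter(y for s, y in samples if contains(s, pat))
        covered = sum(labels.values())
        if covered / len(samples) < min_support:
            continue
        label, hits = max(labels.items(), key=lambda kv: (kv[1], str(kv[0])))
        confidence = hits / covered
        if confidence >= min_confidence:
            rules.append((pat, label, confidence))
    # Strongest rules first; ties broken by longer pattern, then lexicographically.
    rules.sort(key=lambda r: (-r[2], -len(r[0]), r[0]))
    return rules[:top_k]


def serial_coverage(rules, samples):
    """Step 2: keep a rule only if it correctly covers a not-yet-covered sample."""
    kept, remaining = [], list(samples)
    for pat, label, confidence in rules:
        matched = [y for s, y in remaining if contains(s, pat)]
        if any(y == label for y in matched):
            kept.append((pat, label, confidence))
            remaining = [(s, y) for s, y in remaining if not contains(s, pat)]
    return kept, remaining


def build_hierarchical(samples, schedule=((0.3, 1.0), (0.2, 0.8), (0.1, 0.6))):
    """Step 3: feed uncovered samples back with relaxed (support, confidence)
    thresholds until the schedule is exhausted or every sample is covered."""
    levels, remaining = [], list(samples)
    for min_support, min_confidence in schedule:
        if not remaining:
            break
        rules = mine_discriminative(remaining, min_support, min_confidence)
        kept, remaining = serial_coverage(rules, remaining)
        if kept:
            levels.append(kept)  # one sub-classifier per loop
    return levels


def predict(levels, sequence, default=None):
    """Apply sub-classifiers level by level; the first matching rule decides."""
    for rules in levels:
        for pat, label, _ in rules:
            if contains(sequence, pat):
                return label
    return default


# Toy activity sequences with made-up activity codes and labels.
samples = [
    (("ADDR", "INCOME", "SUSPEND"), "debt"),
    (("INCOME", "SUSPEND"), "debt"),
    (("ADDR", "REVIEW"), "no_debt"),
    (("REVIEW", "ADDR"), "no_debt"),
    (("INCOME", "REVIEW", "SUSPEND"), "debt"),
    (("REVIEW",), "no_debt"),
]
levels = build_hierarchical(samples)
```

On this toy data the first loop keeps a single high-confidence rule for the `debt` class, and the remaining `no_debt` samples are covered in the second loop under relaxed thresholds, so the classifier has two levels. Enumerating every subsequence is exponential in `max_len`; a practical version would replace `candidate_patterns` with a proper sequential pattern miner.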



ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences