2016, Vol. 31, Issue (3): 561-576. doi: 10.1007/s11390-016-1647-1

Special Issue: Surveys; Artificial Intelligence and Pattern Recognition

• Data Management and Data Mining •

Subgroup Discovery Algorithms: A Survey and Empirical Evaluation

Sumyea Helal   

  1. School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, SA 5001, Australia
  • Received: 2015-02-12; Revised: 2016-03-19; Online: 2016-05-05; Published: 2016-05-05

Subgroup discovery is a data mining technique that discovers interesting associations among variables with respect to a property of interest. Existing subgroup discovery methods employ different strategies for searching, pruning and ranking subgroups. It is crucial to understand which features of a subgroup discovery algorithm matter for generating quality subgroups. A number of reviews of subgroup discovery have been conducted in this regard. Although they provide a broad overview of some popular subgroup discovery methods, they employ few datasets and few measures for subgroup evaluation, and the existing measures do not allow subgroups to be appraised from all perspectives. Our work performs an extensive analysis of some popular subgroup discovery methods by using a wide range of datasets and by defining new measures for subgroup evaluation. The results of the analysis will help in understanding the major subgroup discovery methods, uncovering gaps for further improvement, and selecting a suitable category of algorithms for specific application domains.
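As a rough illustration of the searching, pruning and ranking steps mentioned in the abstract, the following minimal sketch enumerates small attribute-value descriptions over a toy dataset, prunes those with low support, and ranks the rest. It is not taken from the paper: the toy records, the function names, the `min_support` threshold, and the choice of weighted relative accuracy (WRAcc) as the quality measure are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical toy records (not from the paper): attribute values plus a boolean target.
ROWS = [
    {"smoker": "yes", "age": "old", "target": True},
    {"smoker": "yes", "age": "old", "target": True},
    {"smoker": "yes", "age": "young", "target": True},
    {"smoker": "yes", "age": "young", "target": False},
    {"smoker": "no", "age": "old", "target": True},
    {"smoker": "no", "age": "old", "target": False},
    {"smoker": "no", "age": "young", "target": False},
    {"smoker": "no", "age": "young", "target": False},
]


def covers(description, row):
    """True if the row satisfies every (attribute, value) selector."""
    return all(row[attr] == value for attr, value in description)


def wracc(description, rows):
    """Weighted relative accuracy: coverage * (target rate in subgroup - overall target rate)."""
    covered = [r for r in rows if covers(description, r)]
    if not covered:
        return 0.0
    p_sub = sum(r["target"] for r in covered) / len(covered)
    p_all = sum(r["target"] for r in rows) / len(rows)
    return (len(covered) / len(rows)) * (p_sub - p_all)


def discover(rows, max_depth=2, min_support=2, top_k=3):
    """Exhaustive search over conjunctions of up to max_depth selectors."""
    selectors = sorted({(a, v) for r in rows for a, v in r.items() if a != "target"})
    results = []
    for depth in range(1, max_depth + 1):
        for description in combinations(selectors, depth):
            attrs = [a for a, _ in description]
            if len(set(attrs)) < len(attrs):
                continue  # same attribute used twice: contradictory description
            support = sum(covers(description, r) for r in rows)
            if support < min_support:
                continue  # pruning: too few rows covered (assumed threshold)
            results.append((wracc(description, rows), description))
    # Ranking: highest quality first; ties keep enumeration order.
    results.sort(key=lambda item: item[0], reverse=True)
    return results[:top_k]


if __name__ == "__main__":
    for score, description in discover(ROWS):
        print(round(score, 3), description)
```

Exhaustive enumeration is only practical for very small search spaces. Real subgroup discovery methods differ precisely in how they search (for example beam, exhaustive or evolutionary strategies), how they prune, and which quality measure they use for ranking.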

ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences