Mining Frequent Generalized Itemsets and Generalized Association Rules Without Redundancy

Daniel Kunkle; Donghui Zhang; Gene Cooperman

Daniel Kunkle, Donghui Zhang, Gene Cooperman. Mining Frequent Generalized Itemsets and Generalized Association Rules Without Redundancy[J]. Journal of Computer Science and Technology, 2008, 23(1): 77-02.

Citation:

Mining Frequent Generalized Itemsets and Generalized Association Rules Without Redundancy

Abstract

Abstract

This paper presents some new algorithms to efficiently mine maxfrequent generalized itemsets (g-itemsets) and essential generalizedassociation rules (g-rules). These are compact and generalrepresentations for all frequent patterns and all strong associationrules in the generalized environment. Our results fill an importantgap among algorithms for frequent patterns and association rules bycombining two concepts. First, generalized itemsets employ a taxonomyof items, rather than a flat list of items. This produces morenatural frequent itemsets and associations such as (\em meat, milk)instead of (beef, milk), (chicken, milk), etc. Second, compactrepresentations of frequent itemsets and strong rules, whose resultsize is exponentially smaller, can solve a standard dilemma in miningpatterns: with small threshold values for support and confidence, theuser is overwhelmed by the extraordinary number of identifiedpatterns and associations; but with large threshold values, someinteresting patterns and associations fail to be identified.Our algorithms can also expand those max frequent g-itemsets andessential g-rules into the much larger set of ordinary frequentg-itemsets and strong g-rules. While that expansion is notrecommended in most practical cases, we do so in order to present acomparison with existing algorithms that only handle ordinaryfrequent g-itemsets. In this case, the new algorithm is shown to bethousands, and in some cases millions, of the time faster than previousalgorithms. Further, the new algorithm succeeds in analyzing deepertaxonomies, with the depths of seven or more. Experimental results forprevious algorithms limited themselves to taxonomies with depth atmost three or four.In each of the two problems, a straightforward lattice-based approach isbriefly discussed and then a classification-based algorithm is developed. Inparticular, the two classification-based algorithms are MFGI\_class formining max frequent g-itemsets and EGR\_class for mining essentialg-rules. The classification-based algorithms are featured with conceptualclassification trees and dynamic generation and pruning algorithms.

FullText(HTML)

References (32)

Relative Articles

Supplements (0)

Cited By

Mining Frequent Generalized Itemsets and Generalized Association Rules Without Redundancy

Abstract

Catalog

Export File

Citation

Format

Content