From Sequential Pattern Mining to Structured Pattern Mining: APattern-Growth Approach
-
Abstract
Sequential pattern mining is an important data mining problem withbroad applications. However, it is also a challenging problemsince the mining may have to generate or examine a combinatoriallyexplosive number of intermediate subsequences. Recent studieshave developed two major classes of sequential pattern miningmethods: (1) a candidate generation-and-test approach,represented by (I) GSP, a horizontal format-basedsequential pattern mining method, and (ii) SPADE,a vertical format-based method; and (2) a pattern-growthmethod, represented by PrefixSpan and its furtherextensions, such as gSpan for mining structured patterns.In this study, we perform a systematic introduction andpresentation of the pattern-growth methodology and study itsprinciples and extensions. We first introduce two interestingpattern-growth algorithms, FreeSpan and PrefixSpan, forefficient sequential pattern mining. Then we introduce gSpan formining structured patterns using the same methodology. Their relativeperformance in large databases is presented and analyzed. Severalextensions of these methods are also discussed in the paper, includingmining multi-level, multi-dimensional patterns and miningconstraint-based patterns.
-
-