Multi-Scaling Sampling: An Adaptive Sampling Method for Discovering Approximate Association Rules

Cai-Yan Jia; Xie-Ping Gao

Cai-Yan Jia, Xie-Ping Gao. Multi-Scaling Sampling: An Adaptive Sampling Method for Discovering Approximate Association Rules[J]. Journal of Computer Science and Technology, 2005, 20(3): 309-318.

Citation:

Cai-Yan Jia, Xie-Ping Gao. Multi-Scaling Sampling: An Adaptive Sampling Method for Discovering Approximate Association Rules[J]. Journal of Computer Science and Technology, 2005, 20(3): 309-318.

Citation:

Cai-Yan Jia, Xie-Ping Gao. Multi-Scaling Sampling: An Adaptive Sampling Method for Discovering Approximate Association Rules[J]. Journal of Computer Science and Technology, 2005, 20(3): 309-318.

Multi-Scaling Sampling: An Adaptive Sampling Method for Discovering Approximate Association Rules

Abstract

Abstract

One of the obstacles of the efficient association rule mining is theexplosive expansion of data sets since it is costly or impossible toscan large databases, esp., for multiple times. A popular solution toimprove the speed and scalability of the association rule mining is todo the algorithm on a random sample instead of the entire database. Buthow to effectively define and efficiently estimate the degree of errorwith respect to the outcome of the algorithm, and how to determine the samplesize needed are entangling researches until now. In this paper,an effective and efficient algorithm is given based on the PAC(Probably Approximate Correct) learning theory to measure and estimatesample error. Then, a new adaptive, on-line, fast samplingstrategy --- multi-scaling sampling --- is presented inspired by MRA(Multi-Resolution Analysis) and Shannon sampling theorem, for quicklyobtaining acceptably approximate association rules at appropriate samplesize. Both theoretical analysis and empirical study have showed that thesampling strategy can achieve a very good speed-accuracy trade-off.

FullText(HTML)

References (23)

Relative Articles

Supplements (0)

Cited By

Multi-Scaling Sampling: An Adaptive Sampling Method for Discovering Approximate Association Rules

Abstract

Catalog

Export File

Citation

Format

Content