Matti Pöllä, Timo Honkela. Negative Selection of Written Language Using Character Multiset Statistics[J]. Journal of Computer Science and Technology, 2010, 25(6): 1256-1266. DOI: 10.1007/s11390-010-1099-y

Negative Selection of Written Language Using Character Multiset Statistics

  • We study the combination of symbol frequency analysis and negative selection for anomaly detection in discrete sequences, where conventional negative selection algorithms are impractical because of data sparsity. A theoretical analysis based on ergodic Markov chains is used to outline the properties of the presented anomaly detection algorithm and to predict the probability of successful detection. Simulations are used to evaluate the detection sensitivity and the resolution of the analysis on both artificially generated data and real-world language data, including the English Wikipedia. Simulation results on large reference corpora are used to study how the assumptions made in the theoretical model hold up against real-world data.
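
The general idea of combining character multisets with negative selection can be illustrated with a small sketch. This is not the authors' algorithm: it is a simplified illustration under stated assumptions. The function names (`window_multisets`, `build_self_set`, `anomaly_rate`) and the window length `n` are hypothetical. Instead of generating explicit detectors, the sketch stores the "self" set of multisets seen in a reference text and flags any window whose multiset is absent from it.

```python
def window_multisets(text, n):
    """Yield the character multiset of every length-n window as a hashable key."""
    for i in range(len(text) - n + 1):
        # Sorting the characters discards their order, leaving only the multiset.
        yield "".join(sorted(text[i:i + n]))


def build_self_set(reference, n):
    """Collect every multiset observed in the reference ("self") text."""
    return set(window_multisets(reference, n))


def anomaly_rate(sample, self_set, n):
    """Return the fraction of windows in `sample` whose multiset is not in the self set."""
    keys = list(window_multisets(sample, n))
    if not keys:
        # A sample shorter than the window has nothing to test.
        return 0.0
    unseen = sum(1 for key in keys if key not in self_set)
    return unseen / len(keys)


# Hypothetical toy reference text; a real reference corpus would be far larger.
reference = "the quick brown fox jumps over the lazy dog " * 3
self_set = build_self_set(reference, 3)

normal_score = anomaly_rate("the lazy dog", self_set, 3)
anomalous_score = anomaly_rate("zzzzqqqq", self_set, 3)
```

Because a multiset ignores character order, the window "eht" matches the reference window "the". This makes the representation coarser than exact n-gram matching, but it also means fewer distinct patterns have to be observed in the reference data.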