Geun Yong Kim, Joon-Young Paik, Yeongcheol Kim, Eun-Sun Cho. Byte Frequency Based Indicators for Crypto-Ransomware Detection from Empirical Analysis[J]. Journal of Computer Science and Technology, 2022, 37(2): 423-442. DOI: 10.1007/s11390-021-0263-x

Byte Frequency Based Indicators for Crypto-Ransomware Detection from Empirical Analysis

  • File entropy is one of the major indicators of crypto-ransomware because encryption by ransomware increases the randomness of file contents. However, entropy-based ransomware detection has certain limitations; for example, misclassification is likely when distinguishing ransomware-encrypted files from normal files whose entropy is inherently high. In addition, the cost of evaluating entropy over an entire file makes entropy-based detection impractical for large files. In this paper, we propose two byte-frequency-based indicators for ransomware detection, termed EntropySA and DistSA, both of which exploit characteristic properties of certain file subareas termed "sample areas" (SAs). For an encrypted file, both the sample area and the whole file exhibit high-level randomness; for a plain file, however, the sample area embeds informative structures such as a file header and thus exhibits relatively low-level randomness, even when the entire file exhibits high-level randomness. EntropySA and DistSA use "byte frequency" and a variation of byte frequency, respectively, derived from sample areas. Both indicators incur less overhead than other entropy-based detection methods, as shown experimentally using realistic ransomware samples. To evaluate the effectiveness and feasibility of our indicators, we also employ three expensive but elaborate classification models (neural network, support vector machine, and threshold-based approaches). Using these models, our indicators yielded an average F1-measure of 0.994 and an average detection rate of 99.46% against file encryption attacks by realistic ransomware samples.
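To make the sample-area idea concrete, the following is a minimal Python sketch that computes Shannon entropy from byte frequencies for a whole file and for a small sample area. It is not the authors' EntropySA or DistSA definition. The sample area is assumed here to be the leading 4,096 bytes (`SAMPLE_SIZE` is a hypothetical parameter), and the function names and synthetic data are illustrative only.

```python
import math
import random
from collections import Counter

# Hypothetical sample-area size; the paper's actual SA location and size may differ.
SAMPLE_SIZE = 4096


def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0), computed from byte frequencies."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def sample_area_entropy(data: bytes, size: int = SAMPLE_SIZE) -> float:
    """Entropy of a sample area, assumed here to be the leading `size` bytes."""
    return byte_entropy(data[:size])


def make_demo_files(seed: int = 0) -> tuple[bytes, bytes]:
    """Synthetic data: a header-bearing high-entropy file and a fully random one."""
    rng = random.Random(seed)
    # Low-entropy structured header: ZIP-style magic bytes, padding, repeated metadata.
    header = b"PK\x03\x04" + b"\x00" * 60 + b"name=report.txt;" * 32
    plain_like = header + rng.randbytes(1 << 20)      # e.g. a compressed archive
    encrypted_like = rng.randbytes(len(plain_like))   # ciphertext-like bytes
    return plain_like, encrypted_like


if __name__ == "__main__":
    plain, enc = make_demo_files()
    for label, blob in (("plain-like", plain), ("encrypted-like", enc)):
        print(f"{label}: whole={byte_entropy(blob):.4f} "
              f"sample={sample_area_entropy(blob):.4f}")
```

In this sketch the plain-like blob has a whole-file entropy close to 8 bits per byte, yet its sample area scores noticeably lower because it contains the structured header; the encrypted-like blob stays high in both. Where the sample area sits and how large it is are choices the paper itself defines; the sketch only illustrates why a small, header-bearing region can separate the two cases while reading far fewer bytes than a whole-file pass.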
