A Ransomware Detection Method Based on Byte-Frequency Indicators

doi:10.1007/s11390-021-0263-x

Byte Frequency Based Indicators for Crypto-Ransomware Detection from Empirical Analysis

Abstract

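Summary: 1. Context: Encryption usually raises the entropy of file content, so file entropy has become a primary feature for deciding whether a given file has been attacked by ransomware. However, traditional entropy-based ransomware detection leaves much room for improvement. For example, it cannot reliably distinguish encrypted files from normal files with inherently high entropy, and computing entropy over an entire file is expensive, which limits its practical value.
2. Objective: Targeting the large entropy gap that arises between a file's original content and its encrypted copies after multi-processor encryption, this paper designs a ransomware encryption detection method with a fixed time cost.
3. Method: Through experimental analysis of byte randomness in files in their normal state and after encryption, the paper designs a method for selecting specific file subareas, termed sample areas (SAs). By computing the byte frequency and the byte-distribution deviation of these sample areas, it proposes two ransomware encryption detection methods, EntropySA and DistSA, respectively. The proposed methods shrink the region over which file entropy is computed while remaining effective, which greatly reduces detection time.
4. Results and Findings: The methods were validated with neural network, SVM, and threshold-based classification models. On a dataset collected from real application scenarios, they achieved an average F1-score of 0.994 and an average detection rate of 99.46%, and their computation time was the same across the tested file-length settings.
5. Conclusions: This paper proposes two indicator computation methods, EntropySA and DistSA. By computing the entropy of file sample areas, they detect ransomware in real time and avoid a time-consuming direct comparison between normal and encrypted files. The proposed methods have polynomial worst-case time complexity and achieved a detection effectiveness of 99.6% in detection cases involving five common kinds of ransomware attack. The paper also discusses how to combine the proposed indicators with existing methods in practical deployments.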
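
To make the sample-area idea concrete, the following is a minimal Python sketch of an entropy computed over a bounded leading region of a file rather than the whole file. It is not the authors' EntropySA implementation: the choice of the first bytes as the sample area, the `SAMPLE_SIZE` value, and the function names are all assumptions made for illustration.

```python
import math
import os
from collections import Counter

SAMPLE_SIZE = 4096  # hypothetical sample-area length in bytes (assumption)


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # p * log2(1/p) summed over the byte values that actually occur
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def sample_area_entropy(content: bytes, sample_size: int = SAMPLE_SIZE) -> float:
    """Entropy of the leading sample area only.

    The cost depends on sample_size, not on the length of the whole file.
    """
    return shannon_entropy(content[:sample_size])


# A structured (here: trivially repetitive) head followed by a random body,
# compared with content that is random throughout.
plain_like = b"A" * SAMPLE_SIZE + os.urandom(1 << 16)
encrypted_like = os.urandom(SAMPLE_SIZE + (1 << 16))

print(sample_area_entropy(plain_like))      # low: the head is structured
print(sample_area_entropy(encrypted_like))  # high: the head is random too
```

Because only `sample_size` bytes are read, the running time stays the same as files grow, which matches the fixed-cost property claimed in the summary.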

Abstract: File entropy is one of the major indicators of crypto-ransomware because the encryption by ransomware increases the randomness of file contents. However, entropy-based ransomware detection has certain limitations; for example, ransomware-encrypted files are easily confused with normal files that have inherently high entropy, so misclassification is likely. In addition, the cost of evaluating entropy over an entire file renders entropy-based detection impractical for large files. In this paper, we propose two indicators based on byte frequency for use in ransomware detection; these are termed EntropySA and DistSA, and both exploit the characteristics of certain file subareas termed "sample areas" (SAs). For an encrypted file, both the sampled area and the whole file exhibit high-level randomness, but for a plain file, the sampled area embeds informative structures such as a file header and thus exhibits relatively low-level randomness even though the entire file exhibits high-level randomness. EntropySA and DistSA use "byte frequency" and a variation of byte frequency, respectively, derived from sampled areas. Both indicators cause less overhead than other entropy-based detection methods, as shown experimentally using realistic ransomware samples. To evaluate the effectiveness and feasibility of our indicators, we also employ three expensive but elaborate classification models (neural network, support vector machine and threshold-based approaches). Using these models, our experimental indicators yielded an average F1-measure of 0.994 and an average detection rate of 99.46% for file encryption attacks by realistic ransomware samples.
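
The abstract describes DistSA only as "a variation of byte frequency", so its exact definition is not available here. As a rough illustration of a deviation-style indicator, the sketch below measures how far a sample area's byte histogram lies from the uniform distribution and applies a threshold. The L1 distance, the `threshold` value of 0.5, and the function names are assumptions chosen for the example, not the paper's method.

```python
import os
from collections import Counter


def uniform_deviation(sample: bytes) -> float:
    """L1 distance between the sample's byte histogram and the uniform one.

    Assumed stand-in for a byte-distribution deviation; ranges from 0.0
    (perfectly uniform) to just under 2.0 (a single repeated byte value).
    """
    if not sample:
        raise ValueError("sample area is empty")
    total = len(sample)
    counts = Counter(sample)
    uniform = 1 / 256
    return sum(abs(counts.get(b, 0) / total - uniform) for b in range(256))


def looks_encrypted(sample: bytes, threshold: float = 0.5) -> bool:
    """Threshold-based decision: a small deviation suggests random content.

    The threshold is a hypothetical value, not one reported in the paper.
    """
    return uniform_deviation(sample) < threshold


header_like = b"%PDF-1.7\n" * 400     # structured, few distinct byte values
random_like = os.urandom(4096)        # stands in for encrypted content

print(looks_encrypted(header_like))
print(looks_encrypted(random_like))
```

In practice the threshold would have to be fitted on labeled samples, or the deviation fed as a feature into a classifier such as the neural network or SVM models mentioned above.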
