Journal of Computer Science and Technology, 2022, Vol. 37, Issue (2): 423-442. doi: 10.1007/s11390-021-0263-x

Special topic: Software Systems Theory and Algorithms


A Ransomware Detection Method Based on Byte-Frequency Indicators


Byte Frequency Based Indicators for Crypto-Ransomware Detection from Empirical Analysis

Geun Yong Kim¹, Joon-Young Paik²,*, Yeongcheol Kim¹, and Eun-Sun Cho¹, Member, ACM, IEEE

  ¹ Department of Computer Science and Engineering, Chungnam National University, Daejeon 34134, South Korea
  ² School of Computer Science and Technology, Tiangong University, Tianjin 300387, China
  • Received:2020-01-02 Revised:2021-06-06 Accepted:2021-07-21 Online:2022-03-31 Published:2022-03-31
  • Contact: Joon-Young Paik E-mail:pjy2018@tiangong.edu.cn
  • About author:Joon-Young Paik received his B.S., M.S., and Ph.D. degrees in computer science and engineering from Chungnam National University, Daejeon, in 2008, 2010, and 2013, respectively. He is currently an associate professor at the School of Computer Science and Technology, Tiangong University, Tianjin. His current research interests are malware detection and storage security.
  • Supported by:
    This work was supported in part by the National Natural Science Foundation of China under Grant No. 61806142, the Natural Science Foundation of Tianjin under Grant No. 18JCYBJC44000, the Tianjin Science and Technology Program under Grant No. 19PTZWHZ00020, and the Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (Training Key Talents in Industrial Convergence Security) under Grant No. 2019-0-01343.

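1. Context: Encryption typically raises the entropy of file contents, so file entropy has become a primary feature for deciding whether a given file has been attacked by ransomware. However, conventional entropy-based ransomware detection leaves considerable room for improvement. For example, it cannot reliably distinguish encrypted files from benign files with inherently high entropy, and computing entropy over an entire file is costly, which limits its practical value.
2. Objective: To address the large entropy gap that arises between a file's original content and its multiple encrypted versions after encryption across multiple processors, the paper aims to design a ransomware-encryption detection method with a predictable time cost.
3. Method: Based on an empirical analysis of byte randomness in files before and after encryption, the paper designs a method for selecting specific file subareas, termed sample areas (SAs). By computing the byte frequency of these subareas and the deviation of their byte distribution, it proposes two ransomware-encryption indicators, EntropySA and DistSA, respectively. The proposed methods shrink the region over which file entropy is computed while remaining effective, which greatly reduces detection time.
4. Results & Findings: The effectiveness of the methods is validated on neural network, SVM, and threshold-based classification models. The dataset was collected from real-world scenarios. The experiments show an average F1-score of 0.994 and an average detection rate of 99.46%, with the same computation time across a range of file sizes.
5. Conclusions: The paper proposes two indicators, EntropySA and DistSA, which compute entropy over file subareas to detect ransomware in real time and avoid a time-consuming direct comparison between normal and encrypted files. The proposed methods have polynomial worst-case time complexity and achieved a detection rate of 99.6% against attacks by five common ransomware families. The paper also discusses how the proposed indicators can be combined with existing methods in practical deployments.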
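The sampled-entropy idea summarized above can be illustrated with a minimal sketch. It assumes the sample area is simply the first `sa_size` bytes of a file. That choice, the function names, and the 4096-byte size are assumptions made for illustration and are not the paper's actual SA selection rule or its EntropySA definition.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def sample_area_entropy(content: bytes, sa_size: int = 4096) -> float:
    """Entropy of a hypothetical sample area: the first sa_size bytes only."""
    return shannon_entropy(content[:sa_size])


# Stand-in for a benign high-entropy file: structured 4096-byte header,
# followed by 1 MiB of random payload (as in a compressed format).
plain_like = b"HEADER\x00\x00" * 512 + os.urandom(1 << 20)
# Stand-in for a ransomware-encrypted file: random bytes throughout.
encrypted_like = os.urandom(1 << 20)

print("whole file  (plain-like):    ", shannon_entropy(plain_like))
print("sample area (plain-like):    ", sample_area_entropy(plain_like))
print("sample area (encrypted-like):", sample_area_entropy(encrypted_like))
```

In this toy setup the whole-file entropy of the benign stand-in is close to 8 bits per byte, so it would be hard to tell apart from ciphertext, while its sample area scores exactly 2.5 bits per byte. Because only `sa_size` bytes are inspected, the cost of the check does not grow with file length.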


Keywords: computer security, encryption, machine learning, ransomware

Abstract:

File entropy is one of the major indicators of crypto-ransomware because encryption by ransomware increases the randomness of file contents. However, entropy-based ransomware detection has certain limitations; for example, ransomware-encrypted files are easily misclassified when they must be distinguished from normal files with inherently high entropy. In addition, the cost of evaluating entropy over an entire file renders entropy-based detection impractical for large files. In this paper, we propose two byte-frequency-based indicators for ransomware detection, termed EntropySA and DistSA; both exploit the characteristics of certain file subareas termed "sample areas" (SAs). For an encrypted file, both the sampled area and the whole file exhibit high randomness; for a plain file, however, the sampled area embeds informative structures such as a file header and thus exhibits relatively low randomness even when the entire file exhibits high randomness. EntropySA and DistSA use "byte frequency" and a variation of byte frequency, respectively, derived from sampled areas. Both indicators incur less overhead than other entropy-based detection methods, as shown experimentally using realistic ransomware samples. To evaluate the effectiveness and feasibility of our indicators, we also employ three expensive but elaborate classification models (neural network, support vector machine, and threshold-based approaches). Using these models, our indicators yielded an average F1-measure of 0.994 and an average detection rate of 99.46% for file encryption attacks by realistic ransomware samples.
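The abstract describes DistSA only as using "a variation of byte frequency" and names a threshold-based model among its classifiers. A rough sketch of that general idea follows. The deviation measure used here (total variation distance from the uniform byte distribution), the function names, and the 0.2 threshold are all assumptions for illustration, not the paper's definitions.

```python
import os


def byte_frequencies(data: bytes) -> list:
    """Count occurrences of each of the 256 possible byte values."""
    freq = [0] * 256
    for b in data:
        freq[b] += 1
    return freq


def uniform_deviation(data: bytes) -> float:
    """Total variation distance between the byte distribution and uniform.

    0.0 means perfectly uniform; values near 1.0 mean a few byte values
    dominate. Illustrative stand-in only, not the paper's DistSA formula.
    """
    if not data:
        return 0.0
    n = len(data)
    return 0.5 * sum(abs(c / n - 1 / 256) for c in byte_frequencies(data))


def looks_encrypted(sample_area: bytes, threshold: float = 0.2) -> bool:
    """Threshold rule; the threshold is a hypothetical value for this demo."""
    return uniform_deviation(sample_area) < threshold


header_like = b"HEADER\x00\x00" * 512   # structured sample area (4096 bytes)
cipher_like = os.urandom(4096)          # random sample area (4096 bytes)

print(uniform_deviation(header_like), looks_encrypted(header_like))
print(uniform_deviation(cipher_like), looks_encrypted(cipher_like))
```

A threshold rule of this kind is cheap to evaluate; in a real deployment the threshold would have to be fitted to data rather than fixed by hand as it is here.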


Key words: computer security, cryptography, machine learning, ransomware
