Journal of Computer Science and Technology, 2022, Vol. 37, Issue (2): 423-442. doi: 10.1007/s11390-021-0263-x

Special Issue: Software Systems; Theory and Algorithms

• Theory and Algorithms •

Byte Frequency Based Indicators for Crypto-Ransomware Detection from Empirical Analysis

Geun Yong Kim¹, Joon-Young Paik²,*, Yeongcheol Kim¹, and Eun-Sun Cho¹, Member, ACM, IEEE

  ¹ Department of Computer Science and Engineering, Chungnam National University, Daejeon 34134, South Korea
  ² School of Computer Science and Technology, Tiangong University, Tianjin 300387, China
  • Received: 2020-01-02 Revised: 2021-06-06 Accepted: 2021-07-21 Online: 2022-03-31 Published: 2022-03-31
  • Contact: Joon-Young Paik
  • About author: Joon-Young Paik received his B.S., M.S., and Ph.D. degrees in computer science and engineering from Chungnam National University, Daejeon, in 2008, 2010, and 2013, respectively. He is currently an associate professor at the School of Computer Science and Technology, Tiangong University, Tianjin. His current research interests are malware detection and storage security.
  • Supported by:
    This work was supported in part by the National Natural Science Foundation of China under Grant No. 61806142, the Natural Science Foundation of Tianjin under Grant No. 18JCYBJC44000, the Tianjin Science and Technology Program under Grant No. 19PTZWHZ00020, and the Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (Training Key Talents in Industrial Convergence Security) under Grant No. 2019-0-01343.

File entropy is one of the major indicators of crypto-ransomware because encryption by ransomware increases the randomness of file contents. However, entropy-based ransomware detection has limitations; for example, ransomware-encrypted files are easily misclassified when they must be distinguished from normal files whose entropy is inherently high. In addition, the cost of evaluating entropy over an entire file renders entropy-based detection impractical for large files. In this paper, we propose two byte-frequency-based indicators for ransomware detection, termed EntropySA and DistSA; both exploit the characteristics of certain file subareas termed "sample areas" (SAs). For an encrypted file, both the sample area and the whole file exhibit high randomness, but for a plain file, the sample area embeds informative structures such as a file header and thus exhibits relatively low randomness even when the entire file exhibits high randomness. EntropySA and DistSA use byte frequency and a variation of byte frequency, respectively, derived from sample areas. Both indicators incur less overhead than other entropy-based detection methods, as demonstrated experimentally using realistic ransomware samples. To evaluate the effectiveness and feasibility of our indicators, we also employ three expensive but elaborate classification models (neural network, support vector machine, and threshold-based approaches). Using these models, our indicators yielded an average F1-measure of 0.994 and an average detection rate of 99.46% for file encryption attacks by realistic ransomware samples.
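The contrast between a sample area and a whole file can be illustrated with a minimal sketch. It is not the authors' EntropySA or DistSA definition, which the abstract does not spell out; it only computes Shannon entropy from byte frequencies. The choice of the leading 4096 bytes as the sample area, the function names, and the synthetic test data are all assumptions made for illustration.

```python
import math
import os
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-frequency distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value 0..255
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def sample_area_entropy(data: bytes, sample_size: int = 4096) -> float:
    """Entropy of the leading bytes only.

    Assumption: the sample area is the first `sample_size` bytes of the file;
    the abstract does not state how sample areas are actually chosen.
    """
    return byte_entropy(data[:sample_size])


# Synthetic "plain" file: a repetitive, structured header followed by a
# high-entropy payload (as in compressed media formats).
plain = b"HDR\x00" * 1024 + os.urandom(1 << 16)

# Synthetic "encrypted" file: random bytes stand in for ciphertext.
encrypted = os.urandom(4096 + (1 << 16))

plain_sa = sample_area_entropy(plain)        # low: structured header
plain_whole = byte_entropy(plain)            # high: dominated by the payload
enc_sa = sample_area_entropy(encrypted)      # high: no structure anywhere
enc_whole = byte_entropy(encrypted)          # high

print(f"plain:     sample area {plain_sa:.3f}, whole file {plain_whole:.3f}")
print(f"encrypted: sample area {enc_sa:.3f}, whole file {enc_whole:.3f}")
```

In this toy setting, whole-file entropy alone cannot separate the two inputs, whereas the entropy of the sample area can, and it only requires reading 4096 bytes rather than the entire file.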

Key words: computer security; cryptography; machine learning; ransomware


ISSN 1000-9000(Print)

CN 11-2296/TP
