On Locating Malicious Code in Piggybacked Android Apps
Li Li1, Daoyuan Li1, Tegawendé F. Bissyandé1, Jacques Klein1, Haipeng Cai2, Member, ACM, IEEE, David Lo3, Yves Le Traon1
1 Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg, Luxembourg 2721, Luxembourg;
2 School of Electrical Engineering and Computer Science, Washington State University, Washington, WA 99163, U.S.A.;
3 School of Information Systems, Singapore Management University, Singapore 178902, Singapore
Abstract To devise efficient approaches and tools for detecting malicious packages in the Android ecosystem, researchers increasingly need a deep understanding of malware. There is thus a need for a framework that dissects malware and locates malicious program fragments within app code, so that a comprehensive dataset of malicious samples can be built. To address this need, we propose a tool-based approach called HookRanker, which provides ranked lists of potentially malicious packages based on the way malware behaviour code is triggered. In experiments on a ground truth of piggybacked apps, HookRanker automatically locates the malicious packages in piggybacked Android apps with an accuracy@5 of 83.6% for packages that are triggered through method invocations and an accuracy@5 of 82.2% for packages that are triggered independently.
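The accuracy@5 figures can be read with the help of a minimal sketch. It assumes the common definition of accuracy@k, namely the fraction of apps for which at least one ground-truth malicious package appears among the top k entries of the ranked list; the abstract itself does not spell out the definition. The function name, app identifiers, and package names below are hypothetical and are not taken from HookRanker.

```python
from typing import Dict, List, Set


def accuracy_at_k(ranked: Dict[str, List[str]],
                  truth: Dict[str, Set[str]],
                  k: int = 5) -> float:
    """Fraction of apps whose top-k ranked packages include a true malicious package.

    Assumed definition of accuracy@k; not confirmed by the abstract.
    """
    if not ranked:
        return 0.0
    hits = sum(
        1
        for app, packages in ranked.items()
        if set(packages[:k]) & truth.get(app, set())
    )
    return hits / len(ranked)


# Hypothetical ranked lists (most suspicious package first) for three apps.
ranked_lists = {
    "app_a": ["com.ads.inject", "com.example.ui", "com.example.net"],
    "app_b": ["com.example.core", "com.example.util", "com.evil.payload"],
    "app_c": ["org.sample.a", "org.sample.b", "org.sample.c",
              "org.sample.d", "org.sample.e", "org.hidden.mal"],
}

# Hypothetical ground truth: the package actually grafted onto each app.
ground_truth = {
    "app_a": {"com.ads.inject"},
    "app_b": {"com.evil.payload"},
    "app_c": {"org.hidden.mal"},
}

if __name__ == "__main__":
    print(f"accuracy@5 = {accuracy_at_k(ranked_lists, ground_truth, k=5):.3f}")
```

In this toy data the malicious package of app_c is ranked sixth, so it counts as a miss at k = 5 and as a hit only once k reaches 6.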
This work was supported by the Fonds National de la Recherche (FNR), Luxembourg under projects AndroMap C13/IS/5921289 and Recommend C15/IS/10449467.
DOI: 10.1007/s11390-017-1786-z
About author: Li Li is a research associate at the Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, Luxembourg, and an honorary research associate at the CREST group, University College London, London.
Cite this article:
Li Li, Daoyuan Li, Tegawendé F. Bissyandé, Jacques Klein, Haipeng Cai, David Lo, Yves Le Traon. On Locating Malicious Code in Piggybacked Android Apps[J]. Journal of Computer Science and Technology, 2017, 32(6): 1108-1124