计算机科学技术学报 ›› 2021,Vol. 36 ›› Issue (2): 397-418.doi: 10.1007/s11390-020-9754-4

所属专题: Software Systems

• • 上一篇    下一篇

SymPas:符号化程序切片

Ying-Zhou Zhang, Senior Member, CCF, Member, ACM, IEEE   

  1. College of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • 收稿日期:2019-06-05 修回日期:2020-01-22 出版日期:2021-03-05 发布日期:2021-04-01
  • 作者简介:Ying-Zhou Zhang received his Ph.D. degree in computer science and engineering from University of Southeast, Nanjing, in 2005, and his M.S. degree in mathematics from University of Hohai, Nanjing, in 2002. He is currently working as a professor in the School of Computer Science and Technology, Nanjing University of Posts and Communications, Nanjing. He worked with the Hong Kong Polytechnic University, Hong Kong, in 2011, and as a visiting scholar with the University of Cambridge between 2012 and 2013. He has published more than 80 research papers. His research interests include formal methods, software analysis and program slicing, software reliability and security, service computing and functional programming. He is a senior member of CCF, and a member of ACM and IEEE.
  • 基金资助:
    The work was supported by the National Natural Science Foundation of China under Grant Nos. 60973046 and 61300054, the Qing Lan Project of Jiangsu Province of China, and the 1311 Talent Program Funding of Nanjing University of Posts and Telecommunications.

SymPas: Symbolic Program Slicing

Ying-Zhou Zhang, Senior Member, CCF, Member, ACM, IEEE        

  1. College of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Received:2019-06-05 Revised:2020-01-22 Online:2021-03-05 Published:2021-04-01
  • About author:Ying-Zhou Zhang received his Ph.D. degree in computer science and engineering from University of Southeast, Nanjing, in 2005, and his M.S. degree in mathematics from University of Hohai, Nanjing, in 2002. He is currently working as a professor in the School of Computer Science and Technology, Nanjing University of Posts and Communications, Nanjing. He worked with the Hong Kong Polytechnic University, Hong Kong, in 2011, and as a visiting scholar with the University of Cambridge between 2012 and 2013. He has published more than 80 research papers. His research interests include formal methods, software analysis and program slicing, software reliability and security, service computing and functional programming. He is a senior member of CCF, and a member of ACM and IEEE.
  • Supported by:
    The work was supported by the National Natural Science Foundation of China under Grant Nos. 60973046 and 61300054, the Qing Lan Project of Jiangsu Province of China, and the 1311 Talent Program Funding of Nanjing University of Posts and Telecommunications.

程序切片是一种重要程序分解技术,可有助于人们更好地理解程序结构以及用途,提高软件开发效率和软件质量。因“上下文敏感计算”在美国商务部出口管制清单中,当前主流的C/C++程序分析/程序切片商业工具CodeSurfer(美国GrammaTech公司核心产品)我们无法使用,且这种经典的基于依赖图(PDG/SDG或超图)程序切片方法也有其固有缺陷,例如:依赖图的构造过程繁杂且易爆炸,而且不能直接回答/查询依赖图中两个节点是否有依赖关系,需要多次遍历整个依赖图(同时每次遍历需要小心选择不同类型的节点和边),所以我们提出了一种基于轻量级上下文敏感依赖分析的符号化程序切片方法SymPas(Symbolic Program Slicing),并基于LLVM实现了SymPas切片器原型工具llvm-slicing。为比较方便,我们利用当前主流依赖图分析方法(包括过程间有限分布子集IFDS方法)首次实现LLVM中间码IR过程间切片器,还扩展实现基于信息流方程的程序切片方法。经在相关基准测试程序集上对比实验,我们SymPas切片方法明显优于当前最快的SDG-IFDS切片方法,其中后向切片的时间成本平均降低了6倍,空间成本平均降低了4倍;同时具有很高的可扩展性;且无需构造易爆炸的SDG图或超图,还可直接回答/查询语句节点间是否存在依赖关系。

关键词: 数据流分析, 指令依赖表, LLVM, 过程符号切片, 程序切片

Abstract: Program slicing is a technique for simplifying programs by focusing on selected aspects of their behavior. Current mainstream static slicing methods operate on dependence graph PDG (program dependence graph) or SDG (system dependence graph), but these friendly graph representations may be a bit expensive for some users. In this paper we attempt to study a light-weight approach of static program slicing, called Symbolic Program Slicing (SymPas), which works as a dataflow analysis on LLVM (low-level virtual machine). In our SymPas approach, slices are stored in symbolic forms, not in procedures being re-analyzed (cf. procedure summaries). Instead of re-analyzing a procedure multiple times to find its slices for each callling context, we calculate a single symbolic slice which can be instantiated at call sites avoiding re-analysis; SymPas is implemented with LLVM to perform slicing on LLVM intermediate representation (IR). For comparison, we systematically adapt IFDS (interprocedural finite distributive subset) analysis and the SDG-based slicing method (SDGIFDS) to statically slice IR programs. Evaluated on open-source and benchmark programs, our backward SymPas shows a factor-of-6 reduction in time cost and a factor-of-4 reduction in space cost, compared with backward SDG-IFDS, thus being more efficient. In addition, the result shows that after studying slices from 66 programs, ranging up to 336 800 IR instructions in size, SymPas is highly size-scalable.

Key words: data flow analysis, instruction dependency table, low-level virtual machine (LLVM), procedure symbolic slice, program slicing

[1] Weiser M. Program slicing. IEEE Transaction on Software Engineering, 1984, 16(5):498-509. DOI:10.1109/TSE.1984.5010248.
[2] Tip F. A survey of program slicing techniques. Journal of Programming Languages, 1995, 3(3):121-189.
[3] Gallagher K B, Lyle J R. Using program slicing in software maintenance. IEEE Transactions on Software Engineering, 1991, 17(8):751-761. DOI:10.1109/32.83912.
[4] Harman M, Hierons R M. An overview of program slicing. Software Focus, 2001, 2(3):85-92. DOI:10.1002/swf.41.
[5] Binkley D, Gallagher K B. Program slicing. Advances in Computers, 1996, 43:1-50. DOI:10.1016/S0065-2458(08)60641-5.
[6] Josep S. A vocabulary of program slicing-based techniques. ACM Computing Surveys, 2012, 44(3):Article No. 12. DOI:10.1145/2187671.2187674.
[7] Binkley D, Harman M. A survey of empirical results on program slicing. Advances in Computers, 2004, 62:105-178. DOI:10.1016/S0065-2458(03)62003-6.
[8] Binkley D, Danicic S, Gyimothy T, Harman M, Kiss A, Ouarbya L. Formalizing executable dynamic and forward slicing. In Proc. the 4th IEEE International Workshop on Source Code Analysis and Manipulation, September 2004, pp.43-52. DOI:10.1109/SCAM.2004.13.
[9] Horwiz S, Reps T, Binkley D. Interprocedural slicing using dependence graphs. ACM Transactions on Programming Languages and Systems, 1990, 12(1):26-60. DOI:10.1145/77606.77608.
[10] Reps T, Horwitz S, Sagiv M. Precise interprocedural dataflow analysis via graph reachability. In Proc. the 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 1995, pp.49-61. DOI:10.1145/199448.199462.
[11] Ottenstein K J, Ottenstein L M. The program dependence graph in a software development environment. ACM SIGPLAN Notices, 1984, 19(5):177-184. DOI:10.1145/390011.808263.
[12] Reps T, Sagiv M, Horwitz S. Interprocedural dataflow analysis via graph reachability. Technical Report, University of Copenhagen, 1994. https://research.cs.wisc.edu/wpis/papers/diku-tr94-14.pdf, Nov. 2020.
[13] Naeem A, Lhotak O, Rodriguez J. Practical extensions to the IFDS algorithm. In Proc. the 19th International Conference on Compiler Construction, March 2010, pp.124-144. DOI:10.1007/978-3-642-11970-58.
[14] Cytron R, Ferrante J, Rosen B K, Wegman M N, Zadeck F K. Efficiently computing static single assignment form and the control dependence graph. ACM Trans. Prog. Langs. and Systs., 1991, 13(4):451-490. DOI:10.1145/115372.115320.
[15] Zhao J, Zdancewic S, Nagarakatte S, Martin M. Formalizing the LLVM intermediate representation for verified program transformation. In Proc. the 39th Annual ACM SIGPLANSIGACT Symp. Principles of Programming Languages, January 2012, pp.427-440. DOI:10.1145/2103656.2103709.
[16] Canfora G, Cimitile A, De Lucia A. Conditioned program slicing. Information and Software Technology, 1998, 40(11/12):595-607. DOI:10.1016/S0950-5849(98)00086-X.
[17] Harman M, Binkley D, Danicic S. Amorphous program slicing. Journal of Systems and Software, 2003, 68(1):45-64. DOI:10.1016/S0164-1212(02)00135-8.
[18] Morrison D R. PATRICIA-Practical algorithm to retrieve information coded in alphanumeric. Journal of the ACM, 1968, 15(4):514-534. DOI:10.1145/321479.321481.
[19] Okasaki C, Gill A. Fast mergeable integer maps. In Proc. the 1998 ACM SIGPLAN Workshop on Machine Learning, September 1998, pp.77-86.
[20] Adams S. Efficient sets:A balancing act. Journal of Functional Programming, 1993, 3(4):553-562. DOI:10.1017/S0956796800000885.
[21] Sharir M, Pnueli A. Two approaches to interprocedural data flow analysis. In Program Flow Analysis:Theory and Applications, Muchnick S S, Jones N D (eds.), Prentice-Hall Inc., 1981, pp.189-233.
[22] Gustafsson J, Betts A, Ermedahl A, Lisper B. The mälardalen WCET benchmarks-Past, present and future. In Proc. the 10th Workshop Worst-Case Execution Time Analysis, July 2010, pp.137-147. DOI:10.4230/OASIcs.WCET.2010.136.
[23] Binkley D, Gold N, Harman M. An empirical study of static program slice size. ACM Transactions on Software Engineering and Methodology, 2007, 16(2):Article No. 8. DOI:10.1145/1217295.1217297.
[24] Hwang J C, Du M W, Chou C R. Finding program slices for recursive procedures. In Proc. the 12th Annual International Computer Software & Applications Conference, Oct. 1988, pp.220-227. DOI:10.1109/CMPSAC.1988.17176.
[25] Reps T. On the sequential nature of interprocedural program-analysis problems. Acta Informatica, 1996, 33(8):739-757. DOI:10.1007/s002360050068.
[26] Lakhotia A. Improved interprocedural slicing algorithm. Technical Report, University of Southwestern Louisian, 1992. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.28.3466&rep=rep1&type=pdf, Nov. 2020.
[27] Livadas P, Johnson T. An optimal algorithm for the construction of the system dependence graph. Information Sciences, 2000, 125(1/2/3/4):99-131. DOI:10.1016/S0020-0255(99)00030-4.
[28] Alomari H, Collard M, Maletic J, Alhindawi N, Meqdadi O. srcSlice:Very efficient and scalable forward static slicing. Journal of Software:Evolution and Process, 2014, 26(11):931-961. DOI:10.1002/smr.1651.
[29] Mastroeni I, Zanardini D. Abstract program slicing:An abstract interpretation-based approach to program slicing. ACM Trans. Comput. Logic, 2017, 18(1):Article No. 7. DOI:10.1145/3029052.
[30] Cousot P, Cousot R. Abstract interpretation:A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Proc. the 4th ACM Symp. Principles of Programming Languages, January 1977, pp.238-252. DOI:10.1145/512950.512973.
[31] Zhang Y. A novel formal approach to program slicing. Science in China, Ser. F:Info. Sci., 2007, 50(5):657-670. DOI:10.1007/s11432-007-0061-2.
[32] Zhang Y. A precise monadic dynamic slicing method. Knowledge-Based Systems, 2017, 115:40-54. DOI:10.1016/j.knosys.2016.10.013.
[33] Liang S. Modular monadic semantics and compilation[Ph.D. Thesis]. University of Yale, 1998.
[34] Wadler P. Comprehending monads. In Proc. the 1990 ACM Conf. LISP and Functional Programming, May 1990, pp.61-78. DOI:10.1145/91556.91592.
[35] Liang S, Hudak P, Jones M. Monad transformers and modular interpreters. In Proc. the 22nd ACM SIG-PLANSIGACT Sym. Principles of Programming Languages, January 1995, pp.333-343. DOI:10.1145/199448.199528.
[36] Binkley D, Gold N, Harman M, Islam S, Krinke J, Yoo S. ORBS:Language-independent program slicing. In Proc. the 22nd Int. Symp. Foundations of Software Engineering, November 2014, pp.109-120. DOI:10.1145/2635868.2635893.
[37] Binkley D, Gold N, Islam S, Krinke J, Yoo S. A comparison of tree-and line-oriented observational slicing. Empir. Software Eng., 2019, 24(1):3077-3113. DOI:10.1007/s10664-018-9675-9.
[38] Beszédes A, Faragó C, Szabó Z, Csirik J, Gyimóthy T. Union slices for program maintenance. In Proc. the 2002 IEEE Int. Conference on Software Maintenance, Oct. 2002, pp.12-21. DOI:10.1109/ICSM.2002.1167743.
[39] Lisper B, Masud A, Khanfar H. Static backward demanddriven slicing. In Proc. the 2015 Workshop on Partial Evaluation and Program Manipulation, January 2015, pp.115-126. DOI:10.1145/2678015.2682538.
[40] Srinivasan V, Reps T. An improved algorithm for slicing machine code. In Proc. the 2016 ACM SIGPLAN Int. Conf. Object-Oriented Programming, Systems, Languages, and Applications, October 2016, pp.378-393. DOI:10.1145/2983990.2984003.
[41] Meyers T, Binkley D. An empirical study of slice-based cohesion and coupling metrics. ACM Transactions on Software Methodology, 2007, 17(1):Article No. 2. DOI:10.1145/1314493.1314495.
[42] Zhang X, Gupta R, Zhang Y. Precise dynamic slicing algorithms. In Proc. the 25th Int. Conf. Software Engineering, May 2003, pp.319-329.
[1] Bo Guo, Young-Woo Kwon, Myoungkyu Song. 分解复合更改在软件发展中的代码审查和回归测试选择中的使用[J]. 计算机科学技术学报, 2019, 34(2): 416-436.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 周笛;. A Recovery Technique for Distributed Communicating Process Systems[J]. , 1986, 1(2): 34 -43 .
[2] 陈世华;. On the Structure of Finite Automata of Which M Is an(Weak)Inverse with Delay τ[J]. , 1986, 1(2): 54 -59 .
[3] 李万学;. Almost Optimal Dynamic 2-3 Trees[J]. , 1986, 1(2): 60 -71 .
[4] 王选; 吕之敏; 汤玉海; 向阳;. A High Resolution Chinese Character Generator[J]. , 1986, 1(2): 1 -14 .
[5] C.Y.Chung; 华宣仁;. A Chinese Information Processing System[J]. , 1986, 1(2): 15 -24 .
[6] 章萃; 赵沁平; 徐家福;. Kernel Language KLND[J]. , 1986, 1(3): 65 -79 .
[7] 王建潮; 魏道政;. An Effective Test Generation Algorithm for Combinational Circuits[J]. , 1986, 1(4): 1 -16 .
[8] 陈肇雄; 高庆狮;. A Substitution Based Model for the Implementation of PROLOG——The Design and Implementation of LPROLOG[J]. , 1986, 1(4): 17 -26 .
[9] 黄河燕;. A Parallel Implementation Model of HPARLOG[J]. , 1986, 1(4): 27 -38 .
[10] 郑国梁; 李辉;. The Design and Implementation of the Syntax-Directed Editor Generator(SEG)[J]. , 1986, 1(4): 39 -48 .
版权所有 © 《计算机科学技术学报》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn
总访问量: