Debugging Concurrent Software: Advances and Challenges
Jeff Huang1 and Charles Zhang2, Member, ACM, IEEE
1 Department of Computer Science and Engineering, Texas A&M University, College Station, TX 77843, U.S.A.;
2 Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
Abstract Concurrency debugging is an extremely important yet challenging problem that has been hampering developer productivity and software reliability in the multicore era. We have worked on this problem in the past eight years and have developed several effective methods and automated tools for helping developers debugging shared memory concurrent programs. This article discusses challenges in concurrency debugging and summarizes our research contributions in four important directions: concurrency bug reproduction, detection, understanding, and fixing. It also discusses other recent advances in tackling these challenges.
This work is supported by the United States NSF CAREER Award under Grant No. CCF-1552935 and a Google Faculty Research Award to Jeff Huang, and the Hong Kong SAR RGC/GRF under Grant Nos. 622909 and 621912 to Charles Zhang.
About author: Jeff Huang received his Ph.D. degree in computer science from Hong Kong University of Science and Technology, Hong Kong, and is currently an assistant professor in the Department of Computer Science and Engineering at Texas A&M University, College Station. His research focuses on developing techniques and tools for improving software performance and reliability based on fundamental program analyses and programming language theory. He has published at premium conferences and journals such as PLDI, OOPSLA, ICSE, FSE, IEEE TSE and ACM TOSEM. His research has won awards including ACM SIGSOFT Outstanding Dissertation Award, SIGPLAN PLDI Distinguished Paper Award, SIGPLAN Research Highlights, Google Faculty Research Award, and NSF CAREER Award.
Cite this article:
Jeff Huang, Charles Zhang.Debugging Concurrent Software: Advances and Challenges[J] Journal of Computer Science and Technology, 2016,V31(5): 861-868
 Britton T, Jeng L, Carver G et al. Reversible debugging software. Technical Report, Judge Business School, University of Cambridge, 2013. Guo P, Zimmermann T, Nagappan N et al. Characterizing and predicting which bugs get fixed: An empirical study of Microsoft windows. In Proc. the 32nd ACM/IEEE International Conference on Software Engineering, May 2010, pp.495-504. Gray J. Why do computers stop and what can be done about it? In Proc. the 5th Symp. Reliability in Distributed Software and Database Systems, Jan. 1986, pp.3-12. Lamport L. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Comput., 1979, 28(9): 690-691. Weaver D, Germond T (eds.). The SPARC Architecture Manual, Version 9. SPARC International, Inc., 1994. Lu S, Park S, Seo E et al. Learning from mistakes: A comprehensive study on real world concurrency bug characteristics. In Proc. the 13th ASPLO, Mar. 2008, pp.329-339. Yin Z, Yuan D, Zhou Y et al. How do fixes become bugs? In Proc. the 19th ACM SIGSOFT Symp. the Foundations of Software Engineering and the 13th European Software Engineering Conference, Sept. 2011, pp.26-36. Huang J, Liu P, Zhang C. LEAP: Lightweight deterministic multi-processor replay of concurrent Java programs. In Proc. the 18th ACM SIGSOFT FSE, Nov. 2010, pp.207-216. Huang J, Liu P, Zhang C. LEAP: Lightweight deterministic multi-processor replay of concurrent Java programs. In Proc. the 18th ACM SIGSOFT FSE, Nov. 2010, pp.385-386. Huang J, Zhang C, Dolby J. CLAP: Recording local executions to reproduce concurrency failures. In Proc. ACM PLDI, June 2013, pp.141-152. Ball T, Larus J R. Efficient path profiling. In Proc. the 29th IEEE/ACM MICRO, Dec. 1996, pp.46-57. de Moura L M, Bjørner N. Z3: An efficient SMT solver. In Proc. the 14th TACAS, March 29-April 6, 2008, pp.337-340. Vaswani K, Thazhuthaveetil M J, Srikant Y N. A programmable hardware path profiler. In Proc. the 3rd IEEE/ACM CGO, Mar. 2005, pp.217-228. Huang J, Zhang C. PECAN: Persuasive prediction of concurrency access anomalies. In Proc. the 20th ISSTA, July 2011, pp.144-154. Huang J, Zhou J, Zhang C. Scaling predictive analysis of concurrent programs by removing trace redundancy. ACM Trans. Softw. Eng. Methodol., 2013, 22(1): Article No. 8. Huang J, Meredith P O, Rosu G. Maximal sound predictive race detection with control flow abstraction. In Proc. ACM PLDI, June 2014. Huang J, Luo Q, Rosu G. GPredict: Generic predictive concurrency analysis. In Proc. the 37th ICSE, May 2015, pp.847-857. Huang J, Zhang C. An efficient static trace simplification technique for debugging concurrent programs. In Proc. the 18th SAS, Sept. 2011, pp.163-179. Huang J, Zhang C. LEAN: Simplifying concurrency bug reproduction via replay-supported execution reduction. In Proc. the 27th OOPSLA, Oct. 2012, pp.451-466. Jin G, Song L, Zhang W et al. Automated atomicityviolation fixing. In Proc. PLDI, June 2011, pp.389-400. Jin G, Zhang W, Deng D. Automated concurrency-bug fixing. In Proc. the 10th OSDI, Oct. 2012, pp.221-236. Liu P, Zhang C. Axis: Automatically fixing atomicity violations through solving control constraints. In Proc. the 34th ICSE, June 2012, pp.299-309. Huang J, Zhang C. Execution privatization for scheduleroblivious concurrent programs. In Proc. OOPSLA, Oct. 2012, pp.737-752. Shi Q, Huang J, Chen Z et al. Verifying synchronization for atomicity violations. IEEE Trans. Software Eng., 2016, 42(3): 280-296. Godefroid P. Software model checking: The VeriSoft approach. Formal Methods in System Design, 2005, 26(2): 77-101. Musuvathi M, Qadeer S, Ball T et al. Finding and reproducing Heisenbugs in concurrent programs. In Proc. the 8th USENIX Symposium on Operating Systems Design and Implementation, Dec. 2008, pp.267-280. Flanagan C, Godefroid P. Dynamic partial-order reduction for model checking software. In Proc. the 32nd ACM POPL, Jan. 2005, pp.110-121. Devietti J, Lucia B, Ceze L et al. DMP: Deterministic shared memory multi-processing. In Proc. the 14th ASPLO, Mar. 2009, pp.85-96. Liu T, Curtsinger C, Berger E D. Dthreads: Efficient deterministic multithreading. In Proc. the 33rd ACM SOSP, Oct. 2011, pp.327-336. Cui H, Simsa J, Lin Y H et al. Parrot: A practical runtime for deterministic, stable, and reliable threads. In Proc. the 24th ACM SOSP, Nov. 2013, pp.388-405.