计算机科学技术学报 ›› 2020,Vol. 35 ›› Issue (2): 395-411.doi: 10.1007/s11390-020-9701-4

• • 上一篇    下一篇

MPI-RCDD:一种MPI运行时的通信死锁检测框架

Hong-Mei Wei1, Jian Gao2,*, Peng Qing2, Kang Yu2, Yan-Fei Fang2, Ming-Lu Li1, Senior Member, CCF, IEEE, Member, ACM   

  1. 1 Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China;
    2 Jiangnan Institute of Computing Technology, Wuxi 214083, China
  • 收稿日期:2019-05-10 修回日期:2020-01-07 出版日期:2020-03-05 发布日期:2020-03-18
  • 通讯作者: Jian Gao E-mail:gaojian_whu@163.com
  • 作者简介:Hong-Mei Wei received her Master's degree in computer science from Shanghai Jiao Tong University, Shanghai, in 2004. She is currently pursuing her Ph.D. degree in computer science in Shanghai Jiao Tong University, Shanghai. Her research interests include high-performance computing and parallel compilation.
  • 基金资助:
    This work was supported by the National Key Research and Development Program of China under Grant No. 2017YFB0202003.

MPI-RCDD: A Framework for MPI Runtime Communication Deadlock Detection

Hong-Mei Wei1, Jian Gao2,*, Peng Qing2, Kang Yu2, Yan-Fei Fang2, Ming-Lu Li1, Senior Member, CCF, IEEE, Member, ACM        

  1. 1 Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China;
    2 Jiangnan Institute of Computing Technology, Wuxi 214083, China
  • Received:2019-05-10 Revised:2020-01-07 Online:2020-03-05 Published:2020-03-18
  • Contact: Jian Gao E-mail:gaojian_whu@163.com
  • About author:Hong-Mei Wei received her Master's degree in computer science from Shanghai Jiao Tong University, Shanghai, in 2004. She is currently pursuing her Ph.D. degree in computer science in Shanghai Jiao Tong University, Shanghai. Her research interests include high-performance computing and parallel compilation.
  • Supported by:
    This work was supported by the National Key Research and Development Program of China under Grant No. 2017YFB0202003.

MPI已经成为高性能计算编程模型的事实标准,但其丰富灵活的接口语义使得程序易于出现通信死锁,严重影响系统的可用性。然而,现有的MPI通信死锁检测工具可扩展性不高,难以适应持续扩大的系统规模。为此,本文提出一种MPI运行时的通信死锁检测框架MPI-RCDD,该框架包含三种主要机制。首先,MPI-RCDD设计实现与死锁检测相适应的消息日志协议,确保死锁检测必需的通信消息不丢失;其次,使用MPI环境提供的消息异步处理线程实现进程之间依赖关系的传递,使得众多进程能够同时参与死锁检测工作,缓解集中式分析的性能瓶颈问题;此外,提出一种基于AND⊕OR模型的死锁分析算法AODA,该算法将基于消息超时和基于依赖关系两种死锁分析方式相结合,能够在超时等待进程之间的依赖关系传递过程中搜索死锁环或结,准确定位引发死锁的进程并且不会产生误报。在Umpire Test Suit等典型MPI通信死锁测试程序上的实验结果验证了MPI-RCDD框架的有效性,同时,在NPB基准测试程序上的实验获得了令人满意的性能开销,表明本文提出的MPI-RCDD框架具备较强的可扩展性。

关键词: 高性能计算, 消息传递接口(MPI), 通信死锁, 死锁检测, AND⊕OR模型

Abstract: The message passing interface (MPI) has become a de facto standard for programming models of highperformance computing, but its rich and flexible interface semantics makes the program easy to generate communication deadlock, which seriously affects the usability of the system. However, the existing detection tools for MPI communication deadlock are not scalable enough to adapt to the continuous expansion of system scale. In this context, we propose a framework for MPI runtime communication deadlock detection, namely MPI-RCDD, which contains three kinds of main mechanisms. Firstly, MPI-RCDD has a message logging protocol that is associated with deadlock detection to ensure that the communication messages required for deadlock analysis are not lost. Secondly, it uses the asynchronous processing thread provided by the MPI to implement the transfer of dependencies between processes, so that multiple processes can participate in deadlock detection simultaneously, thus alleviating the performance bottleneck problem of centralized analysis. In addition, it uses an AND⊕OR model based algorithm named AODA to perform deadlock analysis work. The AODA algorithm combines the advantages of both timeout-based and dependency-based deadlock analysis approaches, and allows the processes in the timeout state to search for a deadlock circle or knot in the process of dependency transfer. Further, the AODA algorithm cannot lead to false positives and can represent the source of the deadlock accurately. The experimental results on typical MPI communication deadlock benchmarks such as Umpire Test Suit demonstrate the capability of MPIRCDD. Additionally, the experiments on the NPB benchmarks obtain the satisfying performance cost, which show that the MPI-RCDD has strong scalability.

Key words: high-performance computing, message passing interface (MPI), communication deadlock, deadlock detection, AND⊕OR model

[1] Vakkalanka S. Efficient dynamic verification algorithms for MPI applications[Ph.D. Thesis]. School of Computing, The University of Utah, 2010.
[2] Luecke G R, Zou Y, Coyle J et al. Deadlock detection in MPI programs. Concurrency and Computation:Practice and Experience, 2002, 14(11):911-932.
[3] Krammer B, Bidmon K, Müller M S et al. MARMOT:An MPI analysis and checking tool. Advances in Parallel Computing, 2004, 13:493-500.
[4] Vetter J S, de Supinski B R. Dynamic software testing of MPI applications with Umpire. In Proc. the 2000 ACM/IEEE Conference on Supercomputing, November 2000, Article No. 51.
[5] Hilbrich T, Schulz M, de Supinski B R et al. MUST:A scalable approach to runtime error detection in MPI programs. In Proc. the 3rd International Workshop on Parallel Tools for High Performance Computing, September 2000, pp.53-66.
[6] Hilbrich T, Protze J, Schulz M et al. MPI runtime error detection with MUST:Advances in deadlock detection. Scientific Programming, 2013, 21(3/4):109-121.
[7] Do-Mai A T, Diep T D, Thoai N. Race condition and deadlock detection for large-scale applications. In Proc. the 15th International Symposium on Parallel and Distributed Computing, July 2016, pp.319-326.
[8] Forejt V, Joshi S, Kroening D et al. Precise predictive analysis for discovering communication deadlocks in MPI programs. ACM Transactions on Programming Languages and Systems, 2017, 39(4):Article No. 15.
[9] Alnemari R A, Fadel M A, Eassa F. Integrating static and dynamic analysis techniques for detecting dynamic errors in MPI programs. International Journal of Computer Science and Mobile Computing, 2018, 7(4):141-147.
[10] Alghamdi A M, Eassa F E. Software testing techniques for parallel systems:A survey. International Journal of Computer Science and Network Security, 2019, 19(4):176-186.
[11] Hilbrich T, de Supinski B R, Schulz M et al. A graph based approach for MPI deadlock detection. In Proc. the 23rd International Conference on Supercomputing, June 2009, pp.296-305.
[12] Siegel S F, Zirkel T K. FEVS:A functional equivalence verification suite for high-performance scientific computing. Mathematics in Computer Science, 2011, 5(4):427-435.
[13] Müller M, de Supinski B, Gopalakrishnan G et al. Dealing with MPI bugs at scale:Best practices, automatic detection, debugging, and formal verification. http://www.cs.utah.edu/fv/publications/sc11_with_handson.pptx,October 2019.
[14] Bailey D H, Barszcz E, Barton J T et al. The NAS parallel benchmarks. The International Journal of Supercomputing Applications, 1991, 5(3):63-73.
[1] André Brinkmann, Kathryn Mohror, Weikuan Yu, Philip Carns, Toni Cortes, Scott A. Klasky, Alberto Miranda, Franz-Josef Pfreundt, Robert B. Ross, Marc-André Vef. 高性能计算专用文件系统[J]. 计算机科学技术学报, 2020, 35(1): 4-26.
[2] Yu-Tong Lu, Peng Cheng, Zhi-Guang Chen. Tianhe-2数据存储与管理系统设计与实现[J]. 计算机科学技术学报, 2020, 35(1): 27-46.
[3] Qi Chen, Kang Chen, Zuo-Ning Chen, Wei Xue, Xu Ji, Bin Yang. 神威存储系统面向应用I/O性能提升的优化介绍[J]. 计算机科学技术学报, 2020, 35(1): 47-60.
[4] Marc-André Vef, Nafiseh Moti, Tim Süß, Markus Tacke, Tommaso Tocci, Ramon Nou, Alberto Miranda, Toni Cortes, André Brinkmann. GekkoFS—一种用于高性能计算应用的临时突发缓冲文件系统[J]. 计算机科学技术学报, 2020, 35(1): 72-91.
[5] Robert B. Ross, George Amvrosiadis, Philip Carns, Charles D. Cranor, Matthieu Dorier, Kevin Harms, Greg Ganger, Garth Gibson, Samuel K. Gutierrez, Robert Latham, Bob Robey, Dana Robinson, Bradley Settlemyer, Galen Shipman, Shane Snyder, Jerome Soumagne, Qing Zheng. Mochi:为高性能计算环境组合数据服务[J]. 计算机科学技术学报, 2020, 35(1): 121-144.
[6] Xu Tan, Xiao-Wei Shen, Xiao-Chun Ye, Da Wang, Dong-Rui Fan, Lunkai Zhang, Wen-Mi. 一种面向数据流架构的无停顿双缓冲机制[J]. , 2018, 33(1): 145-157.
[7] Xiao-Wei Shen, Xiao-Chun Ye, Xu Tan, Da Wang, Lunkai Zhang, Wen-Ming Li, Zhi-Min Zhang, Dong-Rui Fan, Ning-Hui Sun. 一种面向数据流架构的高效片上路由结构[J]. , 2017, 32(1): 11-25.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 周笛;. A Recovery Technique for Distributed Communicating Process Systems[J]. , 1986, 1(2): 34 -43 .
[2] 陈世华;. On the Structure of Finite Automata of Which M Is an(Weak)Inverse with Delay τ[J]. , 1986, 1(2): 54 -59 .
[3] C.Y.Chung; 华宣仁;. A Chinese Information Processing System[J]. , 1986, 1(2): 15 -24 .
[4] 高庆狮; 张祥; 杨树范; 陈树清;. Vector Computer 757[J]. , 1986, 1(3): 1 -14 .
[5] 王建潮; 魏道政;. An Effective Test Generation Algorithm for Combinational Circuits[J]. , 1986, 1(4): 1 -16 .
[6] 陈肇雄; 高庆狮;. A Substitution Based Model for the Implementation of PROLOG——The Design and Implementation of LPROLOG[J]. , 1986, 1(4): 17 -26 .
[7] 黄河燕;. A Parallel Implementation Model of HPARLOG[J]. , 1986, 1(4): 27 -38 .
[8] 郑国梁; 李辉;. The Design and Implementation of the Syntax-Directed Editor Generator(SEG)[J]. , 1986, 1(4): 39 -48 .
[9] 闵应骅; 韩智德;. A Built-in Test Pattern Generator[J]. , 1986, 1(4): 62 -74 .
[10] 黄学东; 蔡莲红; 方棣棠; 迟边进; 周立; 蒋力;. A Computer System for Chinese Character Speech Input[J]. , 1986, 1(4): 75 -83 .
版权所有 © 《计算机科学技术学报》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn
总访问量: