基于数据流跟踪的最小上下文切换数据竞争检测技术

郑龙; 李洋; 辛杰; 刘海峰; 郑然; 廖小飞; 金海

doi:10.1007/s11390-023-1569-7

摘要:

研究背景 数据竞争指多个线程同时访问了同一共享变量，且其中之一是写操作。多线程技术的引入，数据竞争已然成为了一个影响程序稳定性与健壮性的重要因素。数据竞争检测是一类识别、诊断、重现甚至修复多线程程序中数据竞争问题的程序分析技术。锁同步机制的引入能有效地保护共享访存行为，但同时也使得相当一部分的数据竞争具有更强的隐蔽性，给数据竞争的检测与分析带来了极大的挑战。

目的现有的基于约束求解技术大幅提高了动态程序分析数据检测的覆盖率，极大改善了数据检测的健壮性。但是，其同时带来了大量误报率。更重要是的，对于每个检测的数据竞争，其推荐的数据竞争调度包含大量上下文切换，使得数据竞争的分析和诊断变得异常困难。如何在高效检测数据竞争的同时有效需求最小上下文线程切换调度是提高数据竞争检测和分析效率的重要技术手段。

方法提出了一种基于数据流图跟踪的约束求解新方法（DFTracker），其可以保证不漏掉真实数据竞争的条件下对没有检测的数据竞争推荐最小化线程上下文切换调度。具体而言，受限通过分析和跟踪数据流图减少现有约束求解方法的误报率，进而大量减少因检测误报数据竞争的不必要分析。基于此，进一步提出了一个新型数据竞争调度推荐算法，以保证线程上下文的最小化性质。

结果在大量真实多线程基准程序上对DFTracker进行了测试。相比典型的约束求解方法，DFTracker没有减少真实的数据竞争数据量，大幅减少68%数据竞争误报率；与此同时，推荐的数据竞争调度平均只有4.7次线程间上下文切换，这一结果相比当前最好结果上下文切换次数下降了81.6%。

结论数据竞争的诊断和分析是提升多线程程序健壮性的重要技术。现有约束求解技术通过牺牲误报率的方式提升了数据竞争的覆盖率，同时其推荐的数据竞争调度上下文切换次数过高，提高了数据竞争诊断和分析的复杂度。通过建立多线程程序的数据流图，利用数据流图的内存访问的依赖关系，提出基于数据流跟踪的约束求解方法有效抑制了误报率，显著减少了线程上下文切换次数。提出的技术可以有助于程序员对数据竞争的快速理解，提升程序的调试效率，缩短应用的部署周期。

Abstract: Data race is one of the most important concurrent anomalies in multi-threaded programs. Emerging constraint-based techniques are leveraged into race detection, which is able to find all the races that can be found by any other sound race detector. However, this constraint-based approach has serious limitations on helping programmers analyze and understand data races. First, it may report a large number of false positives due to the unrecognized dataflow propagation of the program. Second, it recommends a wide range of thread context switches to schedule the reported race (including the false one) whenever this race is exposed during the constraint-solving process. This ad hoc recommendation imposes too many context switches, which complicates the data race analysis. To address these two limitations in the state-of-the-art constraint-based race detection, this paper proposes DFTracker, an improved constraint-based race detector to recommend each data race with minimal thread context switches. Specifically, we reduce the false positives by analyzing and tracking the dataflow in the program. By this means, DFTracker thus reduces the unnecessary analysis of false race schedules. We further propose a novel algorithm to recommend an effective race schedule with minimal thread context switches for each data race. Our experimental results on the real applications demonstrate that 1) without removing any true data race, DFTracker effectively prunes false positives by 68% in comparison with the state-of-the-art constraint-based race detector; 2) DFTracker recommends as low as 2.6–8.3 (4.7 on average) thread context switches per data race in the real world, which is 81.6% fewer context switches per data race than the state-of-the-art constraint based race detector. Therefore, DFTracker can be used as an effective tool to understand the data race for programmers.

基于数据流跟踪的最小上下文切换数据竞争检测技术

Minimal Context-Switching Data Race Detection with Dataflow Tracking