Coverage-Based Fault Localization in Haskell

李丰; 王国庆; 王萌; 郝丹

doi:10.1007/s11390-024-2967-1

Summary:

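Background: Fault localization is the task of identifying faulty program elements. In the extensive prior literature, coverage-based fault localization, especially spectrum-based fault localization (SBFL), has been widely studied because it is effective and lightweight. Despite this rich literature, almost all existing fault localization approaches and studies target imperative programming languages such as Java and C, which leaves a gap in fault localization across programming paradigms.

Objective: This paper studies fault localization approaches for the functional programming paradigm, using the Haskell programming language as a representative.

Methods: We build the first dataset of Haskell projects that includes both real and seeded faults; it is the first dataset for Haskell that contains a large amount of fault information. With this dataset, we explore fault localization techniques for Haskell. In particular, following SBFL approaches for imperative languages, we study methods for collecting program coverage and formulae for computing suspiciousness scores, and we adapt each component to the characteristics of Haskell, which yields a series of adapted approaches. In addition, we design a learning-based approach and a transfer-learning-based approach that make use of data from imperative languages, and we evaluate them on our dataset.

Results & Conclusions: This paper identifies the problem of fault localization for functional programming languages, builds the first fault dataset for Haskell, and evaluates a series of adapted approaches and a transfer-learning-based approach on it. The experimental results show that the adapted approaches based on existing SBFL techniques achieve limited performance on the Haskell dataset, whereas the transfer-learning-based approach can effectively improve fault localization. This research direction is promising.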

Abstract: Fault localization is the task of identifying faulty program elements. Among the many fault localization approaches in the literature, coverage-based fault localization, especially spectrum-based fault localization (SBFL), has been intensively studied because it is effective and lightweight. Despite the rich literature, almost all existing fault localization approaches and studies have been conducted on imperative programming languages such as Java and C, leaving a gap in other programming paradigms. In this paper, we study fault localization approaches for the functional programming paradigm, using the Haskell language as a representative. We build what is, to the best of our knowledge, the first dataset on real Haskell projects, including both real and seeded faults. The dataset enables research on fault localization for functional languages. With it, we explore fault localization techniques for Haskell. In particular, as is typical for SBFL approaches, we study methods for coverage collection and formulae for suspiciousness score computation, and carefully adapt these two components to Haskell in light of its language features and characteristics, resulting in a series of adapted approaches. Moreover, we design a learning-based approach and a transfer-learning-based approach to take advantage of data from imperative languages. Both approaches are evaluated on our dataset to demonstrate the promise of this direction.
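To make the two SBFL components named above concrete, the following is a minimal sketch of how a coverage spectrum can be turned into suspiciousness scores. It uses the Ochiai formula, a widely used SBFL formula, purely as an illustration; the abstract does not say which formulae the paper adapts. The element names, test names, and the helper `rank_elements` are hypothetical, and the sketch is written in Python rather than Haskell for brevity.

```python
import math


def ochiai(ef, ep, total_failed):
    """Ochiai score: ef / sqrt(total_failed * (ef + ep)).

    ef / ep = number of failing / passing tests that cover the element.
    """
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom > 0 else 0.0


def rank_elements(coverage, outcomes):
    """Rank program elements from most to least suspicious.

    coverage: test name -> set of program elements the test covers
    outcomes: test name -> True if the test passed, False if it failed
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    elements = set().union(*coverage.values()) if coverage else set()
    scores = {}
    for elem in elements:
        ef = sum(1 for t, cov in coverage.items() if elem in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items() if elem in cov and outcomes[t])
        scores[elem] = ochiai(ef, ep, total_failed)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy spectrum: three tests over three program elements.
coverage = {
    "t1": {"parse", "eval"},
    "t2": {"parse", "render"},
    "t3": {"parse", "eval", "render"},
}
outcomes = {"t1": False, "t2": True, "t3": False}

ranking = rank_elements(coverage, outcomes)
for element, score in ranking:
    print(f"{element}: {score:.3f}")
```

In this toy spectrum, `eval` is covered only by failing tests and therefore ranks first. Which program element granularity to use and how to collect coverage for a lazily evaluated functional language are the kinds of choices the abstract describes adapting; this sketch does not address them.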

Coverage-Based Fault Localization in Haskell