DP-Share: Privacy-Preserving Software Defect Prediction Model Sharing Through Differential Privacy

Xiang Chen; Dun Zhang; Zhan-Qi Cui; Qing Gu; Xiao-Lin Ju

doi:10.1007/s11390-019-1958-0

Xiang Chen, Dun Zhang, Zhan-Qi Cui, Qing Gu, Xiao-Lin Ju. DP-Share: Privacy-Preserving Software Defect Prediction Model Sharing Through Differential Privacy[J]. Journal of Computer Science and Technology, 2019, 34(5): 1020-1038. DOI: 10.1007/s11390-019-1958-0

Abstract

Most empirical studies in software defect prediction (SDP) use only datasets from the PROMISE repository, which threatens the external validity of their results. Sharing SDP models instead of SDP datasets is a potential way to alleviate this problem, and it can encourage researchers and industrial practitioners to share more models. However, sharing models directly may disclose private information, for example through a model inversion attack. Because differential privacy (DP) mechanisms can prevent this attack when the privacy budget is carefully selected, we propose a novel method, DP-Share; to the best of our knowledge, we are the first to apply DP to privacy-preserving SDP model sharing. DP-Share first preprocesses the dataset, for example by over-sampling minority instances (i.e., defective modules) and discretizing continuous features to optimize privacy budget allocation. It then uses a novel sampling strategy to create a set of training sets. Finally, it constructs decision trees from these training sets, and the trees form a random forest (i.e., the model). This last phase uses the Laplace and exponential mechanisms to satisfy the requirements of DP. In our empirical studies, we choose nine experimental subjects from real software projects, use AUC (area under the ROC curve) as the performance measure, and use holdout as the model validation technique. After privacy and utility analysis, we find that, under the same privacy budget, DP-Share achieves better performance than the baseline method DF-Enhance in most cases. We also provide guidelines for using the proposed method effectively.
Our work attempts to fill the research gap on differential privacy for SDP, which can encourage researchers and practitioners to share more SDP models and thereby advance the state of the art of SDP.
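
The two DP mechanisms named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say where each mechanism is applied, so the pairing below (Laplace noise on leaf class counts, exponential mechanism for choosing a split feature) is an assumption borrowed from common practice in differentially private decision trees. The function names, feature names, scores, and budget values are all hypothetical.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism: perturb a count with noise of scale sensitivity / epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def exponential_mechanism(scores, epsilon, rng, sensitivity=1.0):
    """Select a key with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    keys = list(scores)
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = [
        math.exp(epsilon * (scores[k] - top) / (2.0 * sensitivity)) for k in keys
    ]
    return rng.choices(keys, weights=weights, k=1)[0]


rng = random.Random(0)

# Assumed use 1: release noisy class counts at a leaf (hypothetical counts).
leaf_counts = {"defective": 12, "clean": 30}
noisy = {label: noisy_count(c, epsilon=0.5, rng=rng) for label, c in leaf_counts.items()}

# Assumed use 2: pick a split feature by a quality score (hypothetical scores).
split_scores = {"loc": 0.8, "cyclomatic": 0.5, "fan_out": 0.1}
chosen = exponential_mechanism(split_scores, epsilon=1.0, rng=rng)
```

In both functions a smaller `epsilon` means stronger privacy and noisier output: the Laplace scale grows as `sensitivity / epsilon`, and the exponential mechanism's selection probabilities flatten toward uniform.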
