
Predicted Robustness as QoS for Deep Neural Network Models

  • Abstract: Background:
    Deep neural networks currently deliver remarkable performance on a wide range of complex computational tasks, such as natural language processing, speech recognition, image processing, recommender systems, and sentiment analysis. At the same time, they are increasingly deployed in production software systems, including many safety-critical domains such as autonomous driving, medical image recognition, and malware detection. Unlike traditional software, however, deep neural networks lack interpretable internal logic and offer almost no guarantees of software quality. Recent studies have found that tiny perturbations can fool a deep neural network into erroneous behavior that contradicts human intuition, revealing serious security problems. The robustness of a deep neural network quantifies its resistance to adversarial attacks and is an important part of research on neural network security. Existing methods for computing robustness all require heavy computation and cannot meet the requirements of fast and efficient analysis.
    Objective:
    To propose a lightweight method for predicting the robustness of deep neural networks that enables online and timely robustness prediction, can identify adversarial examples, and thereby provides a security quality-of-service guarantee for deep neural networks. In this paper, we focus on convolutional neural networks.
    Methods:
    For each neural network model under test, we train a matching robustness predictor and adversarial example detector, which perform robustness prediction and adversarial example identification, respectively. Both the robustness predictor and the adversarial example detector are fully connected neural networks; both take the intermediate-layer outputs of the model under test as input, and they output, respectively, a predicted robustness value for each input instance and the likelihood that the instance is adversarial (a minimal code sketch of this architecture is given after the abstract).
    Results:
    We conducted experiments on several deep learning datasets. The Pearson correlation coefficient between the predictor's outputs and the robustness values computed by adversarial-search-based methods is 0.61-0.95, and the relative mean absolute error is 10%-34%, significantly outperforming regression-based prediction using confidence values or saliency (a sketch of how these metrics can be computed also follows the abstract). For adversarial examples generated by optimization-based attacks, the detector achieves higher AUC than detection methods based on kernel density and local intrinsic dimensionality.
    Conclusions:
    We propose a fast and reliable robustness prediction method whose lightweight nature allows it to run on platforms with limited computational resources, greatly improving the efficiency and applicability of robustness prediction services and advancing research on the quality and security of deep neural networks. On top of the robustness predictor framework, we further propose an adversarial example detector that can effectively identify adversarial examples generated by optimization-based attacks. In future work, the range of perturbations studied can be extended, for example to illumination changes and image rotations; additional information, such as the arrangement of neurons, can be exploited for robustness prediction; and studying the relationship between a network's internal outputs and its robustness may yield insights into interpreting neural networks. This paper focuses on convolutional neural networks; future work can extend the approach to other classical deep neural network models.
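    To make the Methods description concrete, the following is a minimal PyTorch-style sketch of the two auxiliary networks. The class names, layer widths, and the hook-based feature extraction are illustrative assumptions, not the paper's actual implementation; the sketch only shows the overall shape of the approach: fully connected networks over flattened intermediate-layer activations of the model under test.

```python
# Minimal sketch (PyTorch) of the two auxiliary models described in Methods.
# All class names, layer sizes, and hyper-parameters are illustrative
# assumptions, not the paper's actual implementation.
import torch
import torch.nn as nn


class RobustnessPredictor(nn.Module):
    """Fully connected regressor: intermediate activations -> robustness value."""

    def __init__(self, feature_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),          # predicted robustness (a scalar)
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features).squeeze(-1)


class AdversarialDetector(nn.Module):
    """Fully connected classifier: intermediate activations -> P(adversarial)."""

    def __init__(self, feature_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(features)).squeeze(-1)


def intermediate_features(cnn: nn.Module, layer: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Capture and flatten the output of one intermediate layer of the model under test."""
    captured = {}

    def hook(_module, _inputs, output):
        captured["out"] = output.detach().flatten(start_dim=1)

    handle = layer.register_forward_hook(hook)
    cnn(x)                      # a normal forward pass populates `captured`
    handle.remove()
    return captured["out"]
```

    At serving time, a single extra fully connected forward pass over these features yields a robustness estimate and an adversarial score alongside the model's usual prediction, which is what makes the approach lightweight enough for online use.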
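    The Results paragraph reports two metrics for the predictor. The sketch below shows one common way to compute them; in particular, the normalization used for the relative mean absolute error (MAE divided by the mean magnitude of the search-based robustness values) is an assumption and may differ from the paper's exact definition.

```python
# Illustrative evaluation of a robustness predictor against "ground-truth"
# robustness values obtained from an offline adversarial-search analysis.
# The normalization used for the relative MAE is an assumption.
import numpy as np
from scipy.stats import pearsonr


def evaluate_predictor(predicted: np.ndarray, searched: np.ndarray) -> dict:
    corr, _ = pearsonr(predicted, searched)  # Pearson correlation coefficient
    rel_mae = np.mean(np.abs(predicted - searched)) / np.mean(np.abs(searched))
    return {"pearson_r": corr, "relative_mae": rel_mae}
```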

     

    Abstract: The adoption of deep neural network (DNN) models as integral parts of real-world software systems necessitates explicit consideration of their quality of service (QoS). It is well known that DNN models are prone to adversarial attacks, and thus it is vitally important to be aware of how robust a model's prediction is for a given input instance. A fragile prediction, even one made with high confidence, is not trustworthy in light of the possibility of adversarial attacks. We propose that DNN models should produce a robustness value as an additional QoS indicator, along with the confidence value, for each prediction they make. Existing approaches to robustness computation are based on adversarial search and are usually too expensive to be exercised in real time. In this paper, we propose to predict, rather than compute, the robustness measure for each input instance. Specifically, our approach inspects the outputs of the neurons of the target model and trains another DNN model to predict the robustness. We focus on convolutional neural network (CNN) models in the current research. Experiments show that our approach is accurate, with only 10%–34% additional error compared with the offline heavy-weight robustness analysis, and that it significantly outperforms several alternative methods. We further validate the effectiveness of the approach when it is applied to detect adversarial attacks and out-of-distribution inputs, where it performs better than, or at least comparably to, state-of-the-art techniques.
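    For contrast with the offline baseline mentioned above, the sketch below shows one standard way to obtain a per-instance robustness value by adversarial search: a binary search over the perturbation budget, using an attack at each step. This is a generic illustration of search-based robustness estimation, not necessarily the exact procedure used in the paper; `pgd_attack` is a placeholder for any concrete attack implementation.

```python
# Generic search-based robustness estimation: find (approximately) the smallest
# L-inf perturbation budget at which an attack succeeds. `pgd_attack` is a
# placeholder for any concrete attack; the routine is an illustrative baseline,
# not the paper's exact algorithm. Assumes a single input with batch size 1
# and an integer class label.
import torch


def searched_robustness(model, x, label, pgd_attack, eps_hi=0.5, tol=1e-3):
    """Binary search over epsilon; returns the estimated minimal adversarial budget."""
    eps_lo = 0.0
    while eps_hi - eps_lo > tol:
        eps = (eps_lo + eps_hi) / 2
        x_adv = pgd_attack(model, x, label, eps)         # attack within budget eps
        if model(x_adv).argmax(dim=-1).item() != label:  # attack succeeded
            eps_hi = eps                                 # try a smaller budget
        else:
            eps_lo = eps                                 # need a larger budget
    return eps_hi
```

    Because this search repeatedly invokes the attack and the model, it is far too slow for online use, which is the cost the predictor above is meant to avoid.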

     

