Activity Diagram Synthesis Using Labelled Graphs and the Genetic Algorithm

doi:10.1007/s11390-020-0293-9

Abstract

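1. Context:
As software grows in scale and complexity, accurately discovering users' requirements for software products is becoming increasingly difficult. The difficulty has two main sources: first, software serves increasingly diverse user groups, which makes it hard to elicit comprehensive requirements; second, the number of requirements stakeholders is very large, which makes it hard to synthesize their requirements. Crowd-based requirements engineering proposes using media such as crowdsourcing platforms and social networks to bring stakeholders together to jointly address requirements elicitation, prioritization, validation, and management. To solve a specific requirements task, different participants submit task-related artifacts (such as requirements descriptions, requirements ratings, and experience feedback) in a distributed manner; these artifacts must then be aggregated and merged to produce the feedback result for that task.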
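2. Objective:
At present, artifacts are merged mainly by hand, and only a few studies have proposed text clustering to assist their aggregation and merging. How to merge the artifacts provided by a distributed crowd effectively and automatically therefore remains an open problem in crowd-based requirements engineering.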
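3. Method:
Diagrams are a common way of expressing software requirements, because they make requirements easy for stakeholders to communicate, understand, and discuss. The activity diagram is a widely used requirements model for expressing user stories, use cases, scenarios, and business processes. This paper proposes an automated approach to merging multiple activity diagrams. The approach uses a labelled graph as the merged representation of the input diagrams and recasts the merging problem as an optimization problem: searching the solution space for the optimal labelled graph. In the optimal labelled graph, nodes across the input diagrams that have both similar descriptions and similar structure are merged, while all other information in the input diagrams is preserved. The approach rests on three key techniques: the labelled graph as the representation of merged requirements, a fitness measure based on generalized entropy, and evolutionary operators defined on labelled graphs.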
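To make the idea of a labelled graph as a merged representation concrete, here is a minimal sketch under stated assumptions; it is not the paper's data structure. The class `LabelledGraph` and its methods are hypothetical: each node records the set of (input diagram, node name) pairs it stands for, and each edge records which input diagrams contain it. Merging two nodes unites their labels, so every merged node remains traceable to its sources and no input information is dropped.

```python
from dataclasses import dataclass, field

# Hypothetical encoding: a source is (input diagram id, node name in that diagram).
Source = tuple[str, str]


@dataclass
class LabelledGraph:
    # node id -> the input nodes it stands for (its traceability labels)
    labels: dict[int, set[Source]] = field(default_factory=dict)
    # (from node, to node) -> ids of the input diagrams that contain this edge
    edges: dict[tuple[int, int], set[str]] = field(default_factory=dict)

    @classmethod
    def from_inputs(cls, diagrams: dict[str, list[tuple[str, str]]]) -> "LabelledGraph":
        """Build the unmerged graph: one node per input node, each edge labelled by its diagram."""
        g = cls()
        index: dict[Source, int] = {}
        for d, edge_list in diagrams.items():
            for u, v in edge_list:
                for name in (u, v):
                    if (d, name) not in index:
                        index[(d, name)] = len(index)
                        g.labels[index[(d, name)]] = {(d, name)}
                g.edges.setdefault((index[(d, u)], index[(d, v)]), set()).add(d)
        return g

    def merge(self, a: int, b: int) -> None:
        """Merge node b into node a: unite their labels and redirect b's edges to a."""
        if a == b:
            return
        self.labels[a] |= self.labels.pop(b)
        redirected: dict[tuple[int, int], set[str]] = {}
        for (u, v), ds in self.edges.items():
            key = (a if u == b else u, a if v == b else v)
            redirected.setdefault(key, set()).update(ds)
        self.edges = redirected


g = LabelledGraph.from_inputs({
    "D1": [("start", "login"), ("login", "browse")],
    "D2": [("start", "sign in"), ("sign in", "search")],
})
g.merge(0, 3)  # the two "start" nodes
g.merge(1, 4)  # "login" and "sign in"
print(g.labels[1])      # contains ('D1', 'login') and ('D2', 'sign in')
print(g.edges[(0, 1)])  # contains 'D1' and 'D2': the merged edge keeps both sources
```

Keeping provenance as sets on nodes and edges, rather than discarding merged nodes, is what lets the merged diagram be traced back to each input.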
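The contributions of this paper are as follows. (1) The labelled graph is proposed as the representation of candidate merged solutions during evolution; it can represent the merging problem for any set of input diagrams. (2) Generalized entropy is proposed to measure the fitness of candidate solutions. Derived from information entropy, it measures how much the information from different sources diverges while taking the similarity between pieces of information into account: the more similar the information merged together, the smaller the generalized entropy. (3) A genetic algorithm is proposed for merging multiple input diagrams, with crossover and mutation operators designed for labelled graphs, which obtains high-quality merged activity diagrams within acceptable time.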
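The abstract defines generalized entropy only informally. As a stand-in, the sketch below uses the similarity-sensitive entropy of Leinster and Cobbold, H = -sum_i p_i log(sum_j Z_ij p_j), with uniform weights p over the descriptions merged into one node and a token-level Jaccard similarity as Z. Both the formula and the similarity function are assumptions, not the paper's definition; the helper names are hypothetical. The sketch does reproduce the property stated above: merging identical descriptions gives H = 0, merging unrelated descriptions gives the ordinary Shannon entropy, and partially similar descriptions fall in between.

```python
import math


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity; a hypothetical stand-in for the text similarity used."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def generalized_entropy(descriptions: list[str]) -> float:
    """Similarity-sensitive entropy of the descriptions merged into one node.

    Assumed form: H = -sum_i p_i * log(sum_j p_j * Z[i][j]) with uniform p
    and Z[i][j] = jaccard(description i, description j).
    """
    n = len(descriptions)
    if n <= 1:
        return 0.0  # a node with a single source has no divergence
    p = 1.0 / n
    h = 0.0
    for i in range(n):
        # Z[i][i] = 1, so this sum is always at least p > 0
        zp = sum(p * jaccard(descriptions[i], descriptions[j]) for j in range(n))
        h -= p * math.log(zp)
    return h


print(generalized_entropy(["log in", "log in"]))            # identical: 0.0
print(generalized_entropy(["log in", "log on"]))            # similar: about 0.405
print(generalized_entropy(["log in", "search catalogue"]))  # unrelated: log 2, about 0.693
```

Because a completely unmerged graph also has zero entropy under this measure, a fitness built on it would need a second term that rewards merging (for example, the number of resulting nodes, in line with the minimization property); how the paper balances such terms is not stated in this abstract.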
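4. Results and Findings:
We evaluated the approach's merging effectiveness and time performance on four cases of different scales (different numbers of input diagrams and nodes). The results show that the approach obtains merged results with high precision and recall (86.9% precision and 82.1% recall on average) within acceptable time, and that it scales to merging problems of different sizes.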
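The abstract does not say exactly how precision and recall are computed. One common way to score a merge, assumed here purely for illustration, is pairwise: a predicted positive is a pair of input nodes placed in the same merged node, and it counts as correct if a reference merge also places that pair together. The function names and node identifiers below are hypothetical.

```python
from itertools import combinations


def merged_pairs(groups: list[set[str]]) -> set[frozenset[str]]:
    """Every unordered pair of input nodes that a merge places in the same node."""
    return {frozenset(pair) for group in groups for pair in combinations(sorted(group), 2)}


def pairwise_precision_recall(predicted: list[set[str]],
                              reference: list[set[str]]) -> tuple[float, float]:
    """Precision and recall of a predicted merge against a reference merge."""
    pred, ref = merged_pairs(predicted), merged_pairs(reference)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 1.0
    recall = hits / len(ref) if ref else 1.0
    return precision, recall


reference = [{"D1.start", "D2.start", "D3.start"}, {"D1.login", "D2.signin"}]
predicted = [{"D1.start", "D2.start"}, {"D3.start"}, {"D1.login", "D2.signin"}]
print(pairwise_precision_recall(predicted, reference))  # (1.0, 0.5)
```

Here every merge the prediction makes is correct (precision 1.0), but it misses two of the four reference pairs involving `D3.start` (recall 0.5).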
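5. Conclusions:
The merged results produced by the approach have three properties: minimization, information preservation, and traceability. Minimization means that the approach finds a sufficiently good merging scheme; information preservation means that no information from the input diagrams is lost; traceability ensures that the merged result can be traced back to the information in the input diagrams. The merged activity diagram can support crowd-based requirements engineering in requirements negotiation, requirements prioritization, analysis of crowd participants' contributions, and motivating participants to provide high-quality requirements.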

Abstract: Many applications need to meet the diverse requirements of a large-scale, distributed user group, which challenges current requirements engineering techniques. Crowd-based requirements engineering was proposed as an umbrella term for requirements development in the context of large-scale user groups. However, many issues remain open. A key one is how to merge a set of requirements descriptions received from different participants into a synthesized requirements description; appropriate techniques are needed to support this requirements synthesis. Diagrams are widely used in industry to represent requirements. This paper focuses on activity diagrams and proposes a novel approach to activity diagram synthesis that adopts the genetic algorithm to repeatedly modify a population of individual solutions toward an optimal solution. As a result, it can automatically generate a resulting diagram that combines as many of the commonalities of a set of input diagrams as possible while preserving their variabilities. The approach features: 1) the labelled graph, proposed as the representation of candidate solutions during the iterative evolution; 2) the generalized entropy, proposed and defined as the fitness measure of the solutions; 3) a genetic algorithm designed to find a high-quality solution. Four cases of different scales are used to evaluate the effectiveness of the approach. The experimental results show that the approach achieves high precision and recall, and that the resulting diagram satisfies the properties of minimization and information preservation and supports requirements traceability.
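The evolutionary loop described above can be sketched as follows. This is a minimal sketch, not the paper's algorithm: its operators act on labelled graphs, whereas here, as a simplifying assumption, a candidate is a vector assigning each input node to a merged node. The sketch also assumes that two nodes from the same input diagram may never be merged, and it uses a naive uniform crossover, a move mutation, truncation selection, and a hypothetical `cost` function (one unit per merged node plus a penalty for merging different labels) in place of generalized entropy. All names are illustrative.

```python
import random
from typing import Callable

Assignment = list[int]  # assignment[i] = id of the merged node that input node i joins


def repair(a: Assignment, diagram_of: list[str]) -> Assignment:
    """Split off any node whose merged node already holds a node from the same diagram."""
    seen: set[tuple[int, str]] = set()
    next_id = max(a) + 1  # fresh ids never collide with existing ones
    fixed = []
    for g, d in zip(a, diagram_of):
        if (g, d) in seen:
            g, next_id = next_id, next_id + 1
        seen.add((g, d))
        fixed.append(g)
    return fixed


def crossover(x: Assignment, y: Assignment, rng: random.Random) -> Assignment:
    """Naive uniform crossover on the assignment vectors."""
    return [xi if rng.random() < 0.5 else yi for xi, yi in zip(x, y)]


def mutate(a: Assignment, rng: random.Random) -> Assignment:
    """Move one random input node into another existing merged node or a new one."""
    a = a[:]
    i = rng.randrange(len(a))
    a[i] = rng.choice(a + [max(a) + 1])
    return a


def evolve(diagram_of: list[str], cost: Callable[[Assignment], float],
           pop_size: int = 30, generations: int = 200, seed: int = 0) -> Assignment:
    rng = random.Random(seed)
    n = len(diagram_of)
    singletons = list(range(n))  # the unmerged solution: every input node on its own
    pop = [singletons] + [repair([rng.randrange(n) for _ in range(n)], diagram_of)
                          for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=cost)
        parents = pop[: pop_size // 2]  # truncation selection keeps the better half
        children = []
        while len(parents) + len(children) < pop_size:
            x, y = rng.sample(parents, 2)
            children.append(repair(mutate(crossover(x, y, rng), rng), diagram_of))
        pop = parents + children
    return min(pop, key=cost)


labels = ["start", "login", "browse", "start", "login", "search", "start", "login", "pay"]
diagrams = ["D1"] * 3 + ["D2"] * 3 + ["D3"] * 3


def cost(a: Assignment) -> float:
    """Hypothetical cost: one unit per merged node, plus 2 per merged pair with different labels."""
    groups: dict[int, list[str]] = {}
    for g, lab in zip(a, labels):
        groups.setdefault(g, []).append(lab)
    clash = sum(1 for members in groups.values()
                for i in range(len(members)) for j in range(i + 1, len(members))
                if members[i] != members[j])
    return len(groups) + 2 * clash


best = evolve(diagrams, cost)
print(best, cost(best))
```

Because the unmerged solution is in the initial population and selection always keeps the better half, the best cost found can never be worse than leaving every node unmerged; the repair step keeps every candidate valid after crossover and mutation.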