Enhancing N-Gram Based Metrics with Semantics for Better Evaluation of Abstractive Text Summarization
-
Abstract
Text summarization is an important task in natural language processing and it has been applied in many applications. Recently, abstractive summarization has attracted many attentions. However, the traditional evaluation metrics that consider little semantic information, are unsuitable for evaluating the quality of deep learning based abstractive summarization models, since these models may generate new words that do not exist in the original text. Moreover, the out-of-vocabulary (OOV) problem that affects the evaluation results, has not been well solved yet. To address these issues, we propose a novel model called ENMS, to enhance existing N-gram based evaluation metrics with semantics. To be specific, we present two types of methods: N-gram based Semantic Matching (NSM for short), and N-gram based Semantic Similarity (NSS for short), to improve several widely-used evaluation metrics including ROUGE (Recall-Oriented Understudy for Gisting Evaluation), BLEU (Bilingual Evaluation Understudy), etc. NSM and NSS work in different ways. The former calculates the matching degree directly, while the latter mainly improves the similarity measurement. Moreover we propose an N-gram representation mechanism to explore the vector representation of N-grams (including skip-grams). It serves as the basis of our ENMS model, in which we exploit some simple but effective integration methods to solve the OOV problem efficiently. Experimental results over the TAC AESOP dataset show that the metrics improved by our methods are well correlated with human judgements and can be used to better evaluate abstractive summarization methods.
-
-