Computation on Sentence Semantic Distance for Novelty Detection
Abstract
Novelty detection aims to retrieve new information and filter out redundancy from a given set of sentences that are relevant to a specific topic. In TREC 2003, the authors tried an approach to novelty detection based on semantic distance computation. The motivation is to expand a sentence by introducing semantic information. The semantic distance between sentences is computed by combining WordNet with statistical information. Novelty detection is treated as a binary classification problem: a sentence is either new or not. The feature vector used in the vector space model for classification consists of various factors, including the semantic distance from the sentence to the topic and the distance from the sentence to the relevant context that precedes it. New sentences are then detected with Winnow and support vector machine (SVM) classifiers, each used separately. Several experiments are conducted to examine the relationship between the different factors and performance. The results show that semantic computation is promising for novelty detection. The ratio of the number of new sentences to the number of relevant sentences is further studied for relevant document sets of different sizes. This ratio is found to decrease at a rate of about 0.86. A further group of experiments is then performed using the ratio as supervision. These experiments demonstrate that the ratio helps improve novelty detection performance.
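The classification setup described above can be illustrated with a minimal Python sketch. It is built on several assumptions and is not the authors' implementation. The WordNet-based semantic distance is replaced by a hypothetical Jaccard token distance. The distance to the previous relevant context is assumed to be the minimum distance to any earlier sentence. Only a real-valued variant of the Winnow classifier is shown, with multiplicative updates and a threshold equal to the number of features. The SVM and the novelty-ratio supervision are omitted. All function and class names are hypothetical.

```python
from typing import List, Sequence


def tokens(sentence: str) -> set:
    # Lowercased word set with surrounding punctuation stripped.
    words = (w.strip(".,;:!?\"'").lower() for w in sentence.split())
    return {w for w in words if w}


def distance(a: str, b: str) -> float:
    # HYPOTHETICAL stand-in for the WordNet-based semantic distance:
    # Jaccard distance between token sets (0 = identical, 1 = disjoint).
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    if not union:
        return 0.0
    return 1.0 - len(ta & tb) / len(union)


def features(sentence: str, topic: str, previous: Sequence[str]) -> List[float]:
    # Feature vector: [distance to topic, distance to previous relevant context].
    # ASSUMPTION: context distance = minimum distance to any earlier sentence,
    # or 1.0 when no earlier sentence exists.
    d_topic = distance(sentence, topic)
    d_context = min((distance(sentence, p) for p in previous), default=1.0)
    return [d_topic, d_context]


class Winnow:
    # Real-valued Winnow: positive weights, multiplicative updates on mistakes.
    def __init__(self, n_features: int, alpha: float = 2.0):
        self.w = [1.0] * n_features
        self.theta = float(n_features)  # threshold = number of features
        self.alpha = alpha

    def predict(self, x: Sequence[float]) -> int:
        score = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 if score >= self.theta else 0  # 1 = new, 0 = redundant

    def fit(self, X, Y, epochs: int = 10) -> "Winnow":
        for _ in range(epochs):
            for x, y in zip(X, Y):
                if self.predict(x) != y:
                    # Promote on a missed new sentence, demote on a false alarm.
                    factor = self.alpha if y == 1 else 1.0 / self.alpha
                    self.w = [wi * factor ** xi for wi, xi in zip(self.w, x)]
        return self


def detect_new(sentences: Sequence[str], topic: str, clf: Winnow) -> List[int]:
    # Classify sentences in order, each against the sentences that precede it.
    return [clf.predict(features(s, topic, sentences[:i]))
            for i, s in enumerate(sentences)]


# Toy usage: training vectors are made up for illustration only.
clf = Winnow(2).fit([[0.5, 1.0], [0.5, 0.0], [0.5, 0.9], [0.5, 0.1]], [1, 0, 1, 0])
docs = ["The new drug cures malaria.",
        "The new drug cures malaria.",
        "Trials start in Kenya next year."]
print(detect_new(docs, "malaria drug", clf))  # -> [1, 0, 1]
```

In this toy run, the repeated second sentence has zero distance to its context and is classified as redundant. The other two sentences are classified as new. Winnow is shown because it is simple and online. Its weights stay positive, so each feature should be oriented such that larger values indicate novelty.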