Computation on Sentence Semantic Distance for Novelty Detection

Hua-Ping Zhang1,2, Jian Sun1, Bing Wang1, and Shuo Bai1   

  1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, P.R. China
  2. Graduate School of the Chinese Academy of Sciences, Beijing 100039, P.R. China
  • Received: 2003-12-30; Revised: 2004-06-23; Online: 2005-05-10; Published: 2005-05-10

Novelty detection aims to retrieve new information and filter out redundancy from given sentences that are relevant to a specific topic. In TREC 2003, the authors tried an approach to novelty detection based on semantic distance computation. The motivation is to expand a sentence by introducing semantic information. The computation of semantic distance between sentences combines WordNet with statistical information. Novelty detection is treated as a binary classification problem: a sentence is either new or not. The feature vector, used in the vector space model for classification, consists of various factors, including the semantic distance from the sentence to the topic and the distance from the sentence to the previous relevant context occurring before it. New sentences are then detected with Winnow and support vector machine classifiers, respectively. Several experiments are conducted to survey the relationship between different factors and performance. The results indicate that semantic computation is promising for novelty detection. The ratio of new sentence size to relevant size is further studied given different relevant document sizes. It is found that the ratio decreases at a certain rate (about 0.86). Another group of experiments is then performed under the supervision of this ratio. These experiments demonstrate that the ratio helps improve novelty detection performance.
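
The two distance features named in the abstract (sentence to topic, and sentence to the previous relevant context) can be illustrated with a minimal sketch. This is not the authors' implementation: the Jaccard distance over word sets is only a placeholder for their WordNet-based semantic distance, and the function names `tokens`, `distance`, and `novelty_features` are hypothetical. Taking the distance to the context as the distance to the nearest earlier sentence is also an assumption.

```python
import re


def tokens(sentence):
    """Lowercased word set (placeholder for the paper's semantic expansion)."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def distance(a, b):
    """Jaccard distance between two sentences.

    Assumption: stands in for the WordNet-based semantic distance,
    whose details the abstract does not give.
    """
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def novelty_features(topic, relevant_sentences):
    """Return (sentence, [dist_to_topic, dist_to_previous_context]) pairs."""
    features = []
    for i, sentence in enumerate(relevant_sentences):
        previous = relevant_sentences[:i]
        d_topic = distance(sentence, topic)
        # Assumption: context distance = distance to the nearest earlier
        # sentence; 1.0 (maximally distant) when nothing precedes it.
        d_context = min((distance(sentence, p) for p in previous), default=1.0)
        features.append((sentence, [d_topic, d_context]))
    return features


topic = "solar power plants"
sentences = [
    "Solar power plants are expanding.",
    "Solar power plants are expanding.",
    "Wind farms face delays.",
]
feats = novelty_features(topic, sentences)
```

In this sketch a repeated sentence gets a context distance of 0, while a sentence sharing no words with earlier ones gets 1. Feature vectors of this kind would then be passed to a binary classifier such as Winnow or a support vector machine.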

Key words: Automated knowledge acquisition; public knowledge; private knowledge; problem model; problem solving model; control model




ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved