Chong Long, Min-Lie Huang, Xiao-Yan Zhu, Ming Li. A New Approach for Multi-Document Update Summarization[J]. Journal of Computer Science and Technology, 2010, 25(4): 739-749. DOI: 10.1007/s11390-010-1057-8

A New Approach for Multi-Document Update Summarization

Funds: The work was supported by the National Natural Science Foundation of China under Grant No. 60973104, the National Basic Research 973 Program of China under Grant No. 2007CB311003, and the IRCI Project from IDRC, Canada.
  • Author Bio:

    Chong Long received his B.E. degree from Tsinghua University, China in 2005. He is a Ph.D. candidate in the Department of Computer Science and Technology, Tsinghua University, China. His research interests include Kolmogorov complexity and its applications, text mining, and natural language processing.

    Min-Lie Huang is now a faculty member of the Department of Computer Science and Technology, Tsinghua University. He received his Ph.D. degree from Tsinghua University in 2006. His research interests include machine learning, natural language processing, graph-based text mining, opinion and review mining, and complex question answering.

    Xiao-Yan Zhu is a professor and the Deputy Head of the State Key Lab of Intelligent Technology and Systems, Tsinghua University. She obtained her Bachelor's degree from the University of Science and Technology Beijing in 1982, her Master's degree from Kobe University in 1987, and her Ph.D. degree from Nagoya Institute of Technology, Japan in 1990. She has been teaching at Tsinghua University since 1993. Her research interests include pattern recognition, neural networks, machine learning, natural language processing, and bioinformatics. She is a member of CCF.

    Ming Li is a Canada Research Chair in Bioinformatics and a University Professor at the University of Waterloo. He is a fellow of the Royal Society of Canada, ACM, and IEEE. He received the E.W.R. Steacie Fellowship Award in 1996 and the Killam Fellowship in 2001. Together with Paul Vitányi, he pioneered the applications of Kolmogorov complexity and co-authored the book "An Introduction to Kolmogorov Complexity and Its Applications". His recent research interests include protein structure determination and next-generation Internet search engines.

  • Received Date: October 21, 2009
  • Revised Date: April 07, 2010
  • Published Date: July 08, 2010
  • Fast-changing knowledge on the Internet can be acquired more efficiently with the help of automatic document summarization and updating techniques. This paper describes a novel approach to multi-document update summarization. The best summary is defined as the one with the minimum information distance to the entire document set. The best update summary is the one with the minimum conditional information distance to a document cluster, given that a prior document cluster has already been read. Experiments on the DUC/TAC 2007 to 2009 datasets (http://duc.nist.gov/, http://www.nist.gov/tac/) show that our method correlates closely with the human summaries and outperforms other programs such as LexRank in many categories under the ROUGE evaluation criterion.
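
The selection criterion described in the abstract can be illustrated with a rough sketch. Kolmogorov complexity is not computable, and the abstract does not state how the paper approximates it, so the code below swaps in a common stand-in: the compressed length from zlib. The function names (c, cond, info_distance, greedy_summary), the greedy sentence selection, and the way the prior cluster is passed as conditioning text are all assumptions made for illustration, not the authors' method.

```python
import zlib


def c(text: str) -> int:
    """Compressed length in bytes, used as a crude stand-in for K(text)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def cond(x: str, y: str, given: str = "") -> int:
    """Approximate K(x | y, given) as C(given + y + x) - C(given + y)."""
    base = given + " " + y
    return max(c(base + " " + x) - c(base), 0)


def info_distance(x: str, y: str, given: str = "") -> int:
    """Approximate max{K(x | y, given), K(y | x, given)}."""
    return max(cond(x, y, given), cond(y, x, given))


def greedy_summary(sentences, max_sentences, prior=""):
    """Greedily pick sentences whose concatenation is closest to the whole set.

    With a non-empty `prior` (text the reader has already seen), the distance
    is conditioned on it, which mimics the update-summary setting.
    """
    docs = " ".join(sentences)
    chosen = []
    remaining = list(sentences)
    while remaining and len(chosen) < max_sentences:
        # Choose the sentence that minimises the distance of the
        # extended summary to the full document set.
        best = min(
            remaining,
            key=lambda s: info_distance(" ".join(chosen + [s]), docs, prior),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Calling greedy_summary(sentences, 2) returns two sentences chosen from the input list; passing prior="..." with the text of an earlier cluster gives the conditional variant. A general-purpose compressor is a coarse proxy on short texts, so the result should be read as a demonstration of the criterion rather than a reproduction of the reported results.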