Journal of Computer Science and Technology 2008, 23(4): 602-611  ISSN: 1000-9000  CN: 11-2296/TP


Predicting Chinese Abbreviations from Definitions: An Empirical Learning Approach Using Support Vector Regression

Xu Sun1, 2, Hou-Feng Wang1, and Bo Wang1

1 Institute of Computational Linguistics, School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
2 Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo 113-0033, Japan

Abstract
In Chinese, phrases and named entities play a central role in information retrieval. Abbreviations, however, make keyword-based approaches less effective. This paper presents an empirical learning approach to Chinese abbreviation prediction. Each abbreviation is treated as a reduced form of its corresponding definition (expanded form), and abbreviation prediction is formalized as a scoring and ranking problem over abbreviation candidates, which are generated automatically from the definition. Employing Support Vector Regression (SVR) for scoring yields multiple abbreviation candidates together with their SVR values, which are used to rank the candidates. Experimental results show that the SVR method performs better than the popular heuristic rule for abbreviation prediction and also outperforms the hidden Markov model (HMM).
Keywords: statistical natural language processing; abbreviation prediction; support vector regression; word clustering
Received: 2007-05-08 Accepted: 2008-04-02 Online: 2008-07-10 
Email: sunxu@is.s.u-tokyo.ac.jp; wanghf@pku.edu.cn; bowang@pku.edu.cn
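
The scoring-and-ranking formulation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that candidates are the order-preserving character subsequences of the definition (the abstract says only that candidates are generated automatically), and `toy_score` is a hypothetical hand-written stand-in for a trained SVR model. The function names and the example definition are illustrative choices as well.

```python
from itertools import combinations
from typing import Callable, List, Tuple


def generate_candidates(definition: str, min_len: int = 2) -> List[str]:
    """Enumerate candidate abbreviations for a definition.

    Assumption (not stated in the abstract): a candidate is any
    order-preserving subsequence of the definition's characters that is
    strictly shorter than the definition itself.
    """
    n = len(definition)
    seen = set()
    candidates = []
    for k in range(min_len, n):
        for idx in combinations(range(n), k):
            cand = "".join(definition[i] for i in idx)
            if cand not in seen:  # skip duplicates from repeated characters
                seen.add(cand)
                candidates.append(cand)
    return candidates


def rank_candidates(
    definition: str,
    score: Callable[[str, str], float],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Score every candidate and return the top_k, highest score first."""
    scored = [(c, score(definition, c)) for c in generate_candidates(definition)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def toy_score(definition: str, candidate: str) -> float:
    """Hypothetical placeholder for the regression model's output.

    A real system would return the value predicted by a trained SVR;
    this heuristic only rewards keeping the first character and
    prefers two-character candidates.
    """
    value = 0.0
    if candidate[0] == definition[0]:
        value += 1.0
    value -= 0.5 * abs(len(candidate) - 2)
    return value


if __name__ == "__main__":
    # "北京大学" (Peking University) is commonly abbreviated as "北大".
    for cand, value in rank_candidates("北京大学", toy_score):
        print(cand, value)
```

Passing the scorer as a function keeps candidate generation and ranking independent of the model, so a trained regressor could replace `toy_score` without changing the rest of the sketch.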

Copyright 2008 by Journal of Computer Science and Technology