Predicting Chinese Abbreviations from Definitions: An Empirical Learning Approach Using Support Vector Regression
-
Abstract
In Chinese, phrases and named entities play a central role in information retrieval. Abbreviations, however, make keyword-based approaches less effective. This paper presents an empirical learning approach to Chinese abbreviation prediction. In this study, each abbreviation is treated as a reduced form of its corresponding definition (expanded form), and abbreviation prediction is formalized as a scoring and ranking problem over abbreviation candidates, which are generated automatically from the definition. By employing Support Vector Regression (SVR) for scoring, we obtain multiple abbreviation candidates together with their SVR values, which are then used to rank the candidates. Experimental results show that the SVR method performs better than the popular heuristic rule for abbreviation prediction. The SVR method also outperforms the hidden Markov model (HMM) on the same task.
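The scoring-and-ranking formulation can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: candidates are taken to be order-preserving character subsequences of the definition, and `toy_score` is a hypothetical hand-written stand-in for the value a trained SVR model would assign to each candidate. The names `generate_candidates`, `rank_candidates`, and `toy_score` are illustrative only.

```python
from itertools import combinations
from typing import Callable, List, Tuple


def generate_candidates(definition: str, min_len: int = 2) -> List[str]:
    """Enumerate order-preserving character subsequences of the definition.

    Assumption: a candidate keeps the original character order and is
    strictly shorter than the definition.
    """
    seen = set()
    candidates = []
    for k in range(min_len, len(definition)):
        for idx in combinations(range(len(definition)), k):
            cand = "".join(definition[i] for i in idx)
            if cand not in seen:  # skip duplicates from repeated characters
                seen.add(cand)
                candidates.append(cand)
    return candidates


def rank_candidates(
    definition: str, score: Callable[[str, str], float]
) -> List[Tuple[str, float]]:
    """Score every candidate and sort by descending score."""
    scored = [(c, score(definition, c)) for c in generate_candidates(definition)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(definition: str, candidate: str) -> float:
    """Hypothetical placeholder for a trained regressor's output.

    Rewards keeping the first character and penalizes lengths other than two.
    """
    score = 0.0
    if candidate[0] == definition[0]:
        score += 1.0
    score -= 0.5 * abs(len(candidate) - 2)
    return score


if __name__ == "__main__":
    for cand, value in rank_candidates("北京大学", toy_score)[:5]:
        print(cand, value)
```

In an actual system the placeholder scorer would be replaced by a regression model trained on feature vectors of (definition, candidate) pairs; the ranking step itself would stay the same.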
-
-