A Supervised Learning Approach to Search of Definitions

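  • Summary: This paper studies the ranking of definitions: given a term, we retrieve relevant definition passages from documents and rank them by how good they are as definitions. This differs substantially from traditional approaches, which either generate a single answer or output every definition found. In this work, definitions present in the text are first extracted with rule-based methods and are then ranked by quality with a ranking model based on supervised learning. Ranking definitions is the core of the problem studied. The paper proposes a specification for judging the quality of a definition, dividing definitions into three levels: good definitions, indifferent definitions, and bad definitions. It also proposes new methods for ranking definitions, which formalize the problem as either classification or ordinal regression. Support Vector Machines (SVM) and Ranking SVM are used as the classification and ordinal regression models respectively, and the paper explores the features used for definition ranking and their effect on the ranking results. Experimental results show that ranking definitions with SVM or Ranking SVM performs significantly better than the baselines, which are heuristic rules, conventional information retrieval, and SVM regression. The results also show that the methods work well for both paragraph-level and sentence-level definitions, and that an SVM or Ranking SVM model trained in one domain can be applied to other domains, which indicates that a generic ranking model can be trained.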

     

    Abstract: This paper addresses the issue of search of definitions. Specifically, for a given term, we are to find its definition candidates and rank the candidates according to their likelihood of being good definitions. This is in contrast to the traditional methods of either generating a single combined definition or outputting all retrieved definitions. Definition ranking is essential for this task. A specification for judging the goodness of a definition is given. In the specification, a definition is categorized into one of three levels: good definition, indifferent definition, or bad definition. Methods of performing definition ranking are also proposed in this paper, which formalize the problem as either classification or ordinal regression. We employ SVM (Support Vector Machines) as the classification model and Ranking SVM as the ordinal regression model, and use them to rank definition candidates according to their likelihood of being good definitions. Features for constructing the SVM and Ranking SVM models are defined, which represent the characteristics of the term, the definition candidate, and their relationship. Experimental results indicate that SVM and Ranking SVM significantly outperform the baseline methods: heuristic rules, conventional information retrieval (Okapi), and SVM regression. This holds both when the answers are paragraphs and when they are sentences. Experimental results also show that SVM or Ranking SVM models trained in one domain can be adapted to another domain, indicating that generic models for definition ranking can be constructed.
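    To make the ordinal regression formulation concrete, the following is a minimal sketch of the pairwise idea behind Ranking SVM, not the authors' implementation. It assumes a linear model trained by plain subgradient descent on a regularized pairwise hinge loss. The two-dimensional feature vectors, the function names, and the hyperparameters are all hypothetical; the abstract does not list the actual features. Grades follow the three levels of the specification (2 = good, 1 = indifferent, 0 = bad).

```python
import random

# Hypothetical toy data: one feature vector per definition candidate.
# The two features are invented for illustration only.
FEATURES = [
    [1.0, 0.9],
    [1.0, 0.2],
    [0.0, 0.5],
    [0.0, 0.1],
]
# Grades per candidate: 2 = good, 1 = indifferent, 0 = bad.
GRADES = [2, 1, 1, 0]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pairwise_differences(features, grades):
    """Return x_i - x_j for every pair where candidate i has a higher grade than j."""
    pairs = []
    for xi, gi in zip(features, grades):
        for xj, gj in zip(features, grades):
            if gi > gj:
                pairs.append([a - b for a, b in zip(xi, xj)])
    return pairs


def train_ranking_svm(features, grades, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Fit a linear scoring vector w with subgradient descent on the
    L2-regularized pairwise hinge loss (an assumed, simplified optimizer)."""
    rng = random.Random(seed)
    pairs = pairwise_differences(features, grades)
    w = [0.0] * len(features[0])
    for _ in range(epochs):
        rng.shuffle(pairs)
        for d in pairs:
            margin = dot(w, d)
            # Regularization step: shrink the weights slightly.
            w = [wk * (1.0 - lr * reg) for wk in w]
            # Hinge step: push w toward d if the pair is not separated by margin 1.
            if margin < 1.0:
                w = [wk + lr * dk for wk, dk in zip(w, d)]
    return w


def rank_candidates(w, features):
    """Return candidate indices sorted from highest to lowest score."""
    scores = [dot(w, x) for x in features]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    weights = train_ranking_svm(FEATURES, GRADES)
    print(rank_candidates(weights, FEATURES))
```

    The classification variant described in the abstract would instead train an ordinary SVM on good versus bad candidates and sort candidates by the classifier's output score. The pairwise version shown here uses the ordering between the three levels directly.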

     
