Semantic Role Labeling of Chinese Nominal Predicates with Dependency-Driven Constituent Parse Tree Structure
Abstract: This paper explores a tree kernel-based method for semantic role labeling (SRL) of Chinese nominal predicates using a convolution tree kernel. In particular, we propose a new parse tree representation, the dependency-driven constituent parse tree (D-CPT), which combines the advantages of constituent and dependency parse trees. D-CPT represents dependency relations directly in a constituent parse tree (CPT) style structure by using dependency relation types in place of the phrase labels of the CPT. In this way, D-CPT keeps the dependency relation information of the dependency parse tree (DPT) while retaining the basic hierarchical structure of the CPT. Several schemes are then designed to extract the necessary information from D-CPT, such as the shortest path between the nominal predicate and the argument candidate, the support verb of the nominal predicate, and the head argument modified by the argument candidate, which largely reduces the redundant and noisy information in the tree. Finally, a convolution tree kernel is employed to compute the similarity between two parse trees. For a fair comparison, we also implement a feature-based SRL method on D-CPT. Evaluation on the Chinese NomBank corpus shows that our tree kernel method on D-CPT performs significantly better than other tree kernel-based methods and achieves performance comparable to state-of-the-art feature-based methods. This indicates that D-CPT effectively represents dependency relations in a CPT-style structure and that the tree kernel method can effectively exploit it. It also suggests that kernel-based methods are competitive for SRL and complementary to feature-based methods.
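To make the similarity computation concrete, the following is a minimal sketch of a convolution tree kernel. The abstract does not say which kernel variant is used, so this sketch assumes the standard subset-tree formulation: it counts the tree fragments shared by two trees and down-weights larger fragments with a decay factor. The `Node` class, the function names, the `decay` default, and the node labels `ROOT`, `SBJ`, and `OBJ` are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    """A parse tree node; a node without children is a word (leaf)."""
    label: str
    children: Tuple["Node", ...] = ()


def production(node: Node) -> Tuple[str, Tuple[str, ...]]:
    """The rule at a node: its label plus the ordered labels of its children."""
    return (node.label, tuple(child.label for child in node.children))


def all_nodes(tree: Node) -> List[Node]:
    """Collect every node of the tree in pre-order."""
    found = [tree]
    for child in tree.children:
        found.extend(all_nodes(child))
    return found


def common_fragments(n1: Node, n2: Node, decay: float) -> float:
    """Decay-weighted count of the fragments rooted at both n1 and n2."""
    # Leaves (words) do not root any fragment.
    if not n1.children or not n2.children:
        return 0.0
    # Fragments can only be shared when the two rules are identical.
    if production(n1) != production(n2):
        return 0.0
    score = decay
    for c1, c2 in zip(n1.children, n2.children):
        # Each child is either cut off (the 1.0) or expanded further.
        score *= 1.0 + common_fragments(c1, c2, decay)
    return score


def tree_kernel(t1: Node, t2: Node, decay: float = 0.4) -> float:
    """Sum the shared-fragment scores over every pair of nodes."""
    return sum(
        common_fragments(a, b, decay)
        for a in all_nodes(t1)
        for b in all_nodes(t2)
    )


# Toy trees whose internal labels stand in for dependency relation types.
tree_a = Node("ROOT", (Node("SBJ", (Node("a"),)), Node("OBJ", (Node("b"),))))
tree_b = Node("ROOT", (Node("SBJ", (Node("a"),)), Node("OBJ", (Node("c"),))))

print(tree_kernel(tree_a, tree_a, decay=1.0))  # 6.0
print(tree_kernel(tree_a, tree_b, decay=1.0))  # 3.0
```

With `decay=1.0` the kernel is a plain count of shared fragments, which makes the toy values easy to check by hand. A decay below 1 keeps large shared fragments from dominating the score. This implementation compares every pair of nodes without caching, which is adequate for a small example but would normally be memoized for real parse trees.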