Learning Better Word Embedding by Asymmetric Low-Rank Projection of Knowledge Graph
Abstract: Word embedding, which maps natural-language words to low-dimensional dense vectors, has demonstrated its power in many natural language processing tasks. However, existing embedding techniques suffer from the inaccurate and incomplete information contained in the free-text corpora used as training data. To tackle this challenge, quite a few studies have leveraged knowledge graphs as an additional information source to improve the quality of word embedding. Although these studies have achieved some success, they neglect two important properties of knowledge graphs: 1) many relationships in knowledge graphs are many-to-one, one-to-many, or even many-to-many, rather than simply one-to-one; 2) most head entities and tail entities in knowledge graphs come from very different semantic spaces. To address these issues, we propose a new algorithm named ProjectNet. ProjectNet first transforms head and tail entities with different low-rank projection matrices and then models the relationships between them. The low-rank projection allows non-one-to-one relationships between entities, while using different projection matrices for head and tail entities allows them to originate in different semantic spaces. Experimental results demonstrate that ProjectNet yields more accurate word embeddings than previous studies and thus leads to clear improvements in various natural language processing tasks.
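The following is a minimal sketch of the idea of asymmetric low-rank projection, not the ProjectNet objective itself. It assumes a translation-style distance between the projected head plus the relation vector and the projected tail; this scoring form, the function names `low_rank_matrix` and `score`, and the dimensions are all hypothetical choices made for illustration. The sketch shows why a low-rank projection can accommodate non-one-to-one relationships: two distinct tail embeddings that differ only along the null space of the tail projection receive the same score.

```python
import numpy as np


def low_rank_matrix(rng, dim, rank):
    """Build a dim x dim matrix of rank at most `rank` as a product of two thin factors."""
    a = rng.standard_normal((dim, rank))
    b = rng.standard_normal((rank, dim))
    return a @ b


def score(head, relation, tail, proj_head, proj_tail):
    """Assumed translation-style distance after projecting head and tail separately.

    Lower values mean the triple (head, relation, tail) fits better.
    """
    return float(np.linalg.norm(proj_head @ head + relation - proj_tail @ tail))


rng = np.random.default_rng(0)
dim, rank = 6, 2  # illustrative sizes only

# Separate (asymmetric) projections for head and tail entities.
proj_head = low_rank_matrix(rng, dim, rank)
proj_tail = low_rank_matrix(rng, dim, rank)

head = rng.standard_normal(dim)
relation = rng.standard_normal(dim)
tail = rng.standard_normal(dim)

# The last right singular vector of a rank-deficient matrix lies in its null space,
# so moving the tail along it changes the embedding but not its projection.
_, _, vt = np.linalg.svd(proj_tail)
null_dir = vt[-1]
other_tail = tail + 3.0 * null_dir

s1 = score(head, relation, tail, proj_head, proj_tail)
s2 = score(head, relation, other_tail, proj_head, proj_tail)
print(abs(s1 - s2))  # numerically negligible: both tails fit the triple equally well
```

With a full-rank projection the null space would be trivial, so distinct tails would always receive distinct projections; the rank constraint is what lets one head relate equally well to several tails in this sketch.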