Zhang ZB, Zhong ZM, Yuan PP et al. Improving entity linking in Chinese domain by sense embedding based on graph clustering. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 38(1): 196−210 Jan. 2023. DOI: 10.1007/s11390-023-2835-4.

Improving Entity Linking in Chinese Domain by Sense Embedding Based on Graph Clustering

Funds: The work was supported by the National Natural Science Foundation of China under Grant Nos. 61932004 and 62072205.
More Information
  • Author Bio:

    Zhao-Bo Zhang is currently pursuing his Ph.D. degree in computer science at Huazhong University of Science and Technology, Wuhan. His research interests include Chinese natural language processing and knowledge representation.

    Zhi-Man Zhong received her M.S. degree in computer science from Huazhong University of Science and Technology, Wuhan. Her research focuses on question answering and word embedding.

    Ping-Peng Yuan is a professor in the School of Computer Science and Technology at Huazhong University of Science and Technology, Wuhan. He received his Ph.D. degree in computer science from Zhejiang University, Hangzhou. His research interests include databases, knowledge representation and reasoning, and natural language processing, with a focus on high-performance computing. In the course of his research, he builds systems and innovative applications in addition to investigating theoretical solutions and algorithm design. He is thus the principal developer of multiple system prototypes, including TripleBit, PathGraph, and SemreX.

    Hai Jin is a chair professor of computer science and engineering at Huazhong University of Science and Technology, Wuhan. Jin received his Ph.D. degree in computer engineering from Huazhong University of Science and Technology, Wuhan, in 1994. In 1996, he was awarded a German Academic Exchange Service Fellowship to visit the Technical University of Chemnitz, Straße der Nationen. Jin worked at The University of Hong Kong, Hong Kong, between 1998 and 2000, and as a visiting scholar at the University of Southern California, Los Angeles, between 1999 and 2000. He was awarded the Excellent Youth Award from the National Natural Science Foundation of China in 2001. Jin is a CCF Fellow, an IEEE Fellow, and a life member of ACM. He has co-authored 22 books and published over 900 research papers. His research interests include computer architecture, virtualization technology, distributed computing, big data processing, network storage, and network security.

  • Corresponding author:

    hjin@hust.edu.cn

  • Received Date: September 15, 2022
  • Accepted Date: January 9, 2023
  • Entity linking refers to linking a string in a text to the corresponding entity in a knowledge base through candidate entity generation and candidate entity ranking. It is of great significance to NLP (natural language processing) tasks such as question answering. Unlike English entity linking, Chinese entity linking requires more consideration due to the lack of spacing and capitalization in text sequences and the ambiguity of characters and words, which is especially evident in certain scenarios. In Chinese domains such as industry, the generated candidate entities are usually long and heavily nested strings. In addition, the meanings of the words that make up industrial entities are sometimes ambiguous: their semantic space is a subspace of the general word embedding space, so each entity word needs its exact sense. Therefore, we propose two schemes for better Chinese entity linking. First, we implement an n-gram based candidate entity generation method to increase the recall rate and reduce nesting noise. Second, we enhance the candidate entity ranking mechanism by introducing sense embedding. To reconcile the ambiguity of word vectors with the largely single-sense vocabulary of the industrial domain, we design a sense embedding model based on graph clustering, which adopts an unsupervised approach to word sense induction and learns sense representations jointly with context. We evaluate the embedding quality of our approach on classical datasets and demonstrate its disambiguation ability in general scenarios. Experiments confirm that our method better captures the underlying regularities of candidate entities in the industrial domain and achieves better entity linking performance.
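
    The n-gram candidate generation step can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' implementation: it models the knowledge base as a plain set of entity-name strings, enumerates character n-grams of a mention (Chinese text offers no word spacing to exploit), and keeps only maximal matches so that nested substrings of a longer candidate do not inflate the candidate set.

    ```python
    def ngrams(text, n_min=2, n_max=6):
        """Enumerate all character n-grams of text, shortest first."""
        grams = []
        for n in range(n_min, min(n_max, len(text)) + 1):
            for i in range(len(text) - n + 1):
                grams.append(text[i:i + n])
        return grams

    def generate_candidates(mention, kb_names):
        """Match every n-gram of the mention against KB entity names,
        keeping only maximal matches to reduce nesting noise."""
        hits = {g for g in ngrams(mention) if g in kb_names}
        # Drop any hit that is a substring of a longer hit (a nested candidate).
        return sorted(h for h in hits
                      if not any(h != other and h in other for other in hits))
    ```

    With a toy KB containing the names "ab", "abc", and "bcd", the mention "abcd" yields the candidates ["abc", "bcd"]: "ab" is suppressed because it is nested inside the longer match "abc".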

  • [1]
    Sun C C, Shen D R. Mixed hierarchical networks for deep entity matching. Journal of Computer Science and Technology, 2021, 36(4): 822–838. DOI: 10.1007/s11390-021-1321-0.
    [2]
    Li B Z, Min S, Iyer S, Mehdad Y, Yin W T. Efficient one-pass end-to-end entity linking for questions. In Proc. the 2020 Conference on Empirical Methods in Natural Language Processing, Nov. 2020, pp.6433–6441. DOI: 10.18653/v1/2020.emnlp-main.522.
    [3]
    Chen K, Shen G H, Huang Z Q, Wang H J. Improved entity linking for simple question answering over knowledge graph. International Journal of Software Engineering and Knowledge Engineering, 2021, 31(1): 55–80. DOI: 10.1142/S0218194021400039.
    [4]
    Amplayo R K, Lim S, Hwang S W. Entity commonsense representation for neural abstractive summarization. In Proc. the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jun. 2018, pp.697–707. DOI: 10.18653/v1/N18-1064.
    [5]
    Shen W, Wang J Y, Han J W. Entity linking with a knowledge base: Issues, techniques, and solutions. IEEE Trans. Knowledge and Data Engineering, 2015, 27(2): 443–460. DOI: 10.1109/TKDE.2014.2327028.
    [6]
    Li M Y, Xing Y Q, Kong F, Zhou G D. Towards better entity linking. Frontiers of Computer Science, 2022, 16(2): 162308. DOI: 10.1007/s11704-020-0192-9.
    [7]
    Fu J L, Qiu J, Guo Y L, Li L. Entity linking and name disambiguation using SVM in Chinese micro-blogs. In Proc. the 11th International Conference on Natural Computation, Aug. 2015, pp.468–472. DOI: 10.1109/ICNC.2015.7378034.
    [8]
    Huang D C, Wang J L. An approach on Chinese microblog entity linking combining Baidu encyclopaedia and word2vec. Procedia Computer Science, 2017, 111: 37–45. DOI: 10.1016/j.procs.2017.06.007.
    [9]
    Zeng W X, Tang J Y, Zhao X. Entity linking on Chinese microblogs via deep neural network. IEEE Access, 2018, 6: 25908–25920. DOI: 10.1109/ACCESS.2018.2833153.
    [10]
    Ma C F, Sha Y, Tan J L, Guo L, Peng H L. Chinese social media entity linking based on effective context with topic semantics. In Proc. the 43rd Annual Computer Software and Applications Conference, Jul. 2019, pp.386–395. DOI: 10.1109/COMPSAC.2019.00063.
    [11]
    Chen T Q, Guestrin C. XGBoost: A scalable tree boosting system. In Proc. the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2016, pp.785–794. DOI: 10.1145/2939672.2939785.
    [12]
    Moro A, Raganato A, Navigli R. Entity linking meets word sense disambiguation: A unified approach. Trans. Association for Computational Linguistics, 2014, 2: 231–244. DOI: 10.1162/tacl_a_00179.
    [13]
    Khosrovian K, Pfahl D, Garousi V. GENSIM 2.0: A customizable process simulation model for software process evaluation. In Proc. the 2008 International Conference on Software Process, May 2008, pp.294–306. DOI: 10.1007/978-3-540-79588-9_26.
    [14]
    Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation, 1997, 9(8): 1735–1780. DOI: 10.1162/neco.1997.9.8.1735.
    [15]
    Phan M C, Sun A X, Tay Y, Han J L, Li C L. NeuPL: Attention-based semantic matching and pair-linking for entity disambiguation. In Proc. the 2017 ACM Conference on Information and Knowledge Management, Nov. 2017, pp.1667–1676. DOI: 10.1145/3132847.3132963.
    [16]
    Zeng W X, Zhao X, Tang J Y, Tan Z, Huang X Q. CLEEK: A Chinese long-text corpus for entity linking. In Proc. the 12th Language Resources and Evaluation Conference, May 2020, pp.2026–2035. DOI: 10.1145/3132847.3132963.
    [17]
    Lei K, Zhang B, Liu Y, Deng Y, Zhang D Y, Shen Y. A knowledge graph based solution for entity discovery and linking in open-domain questions. In Proc. the 2nd International Conference on Smart Computing and Communication, Dec. 2017, pp.181–190. DOI: 10.1007/978-3-319-73830-7_19.
    [18]
    Inan E, Dikenelli O. A sequence learning method for domain-specific entity linking. In Proc. the 7th Named Entities Workshop, Jul. 2018, pp.14–21. DOI: 10.18653/v1/W18-2403.
    [19]
    Logeswaran L, Chang M W, Lee K, Toutanova K, Devlin J, Lee H. Zero-shot entity linking by reading entity descriptions. In Proc. the 57th Annual Meeting of the Association for Computational Linguistics, Jul. 2019, pp.3449–3460. DOI: 10.18653/v1/P19-1335.
    [20]
    Chen L H, Varoquaux G, Suchanek F M. A lightweight neural model for biomedical entity linking. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(14): 12657–12665. DOI: 10.1609/aaai.v35i14.17499.
    [21]
    Dong Z D, Dong Q, Hao C L. HowNet and its computation of meaning. In Proc. the 23rd International Conference on Computational Linguistics: Demonstrations, Aug. 2010, pp.53–56. DOI: 10.5555/1944284.1944298.
    [22]
    Miller G A. WordNet: A lexical database for English. Communications of the ACM, 1995, 38(11): 39–41. DOI: 10.1145/219717.219748.
    [23]
    Pilehvar M T, Collier N. De-conflated semantic representations. In Proc. the 2016 Conference on Empirical Methods in Natural Language Processing, Nov. 2016, pp.1680–1690. DOI: 10.18653/v1/D16-1174.
    [24]
    Lee Y Y, Yen T Y, Huang H H, Shiue Y T, Chen H H. GenSense: A generalized sense retrofitting model. In Proc. the 27th International Conference on Computational Linguistics, Aug. 2018, pp.1662-1671.
    [25]
    Ramprasad S, Maddox J. CoKE: Word sense induction using contextualized knowledge embeddings. In Proc. the 2019 Spring Symposium on Combining Machine Learning with Knowledge Engineering, Mar. 2019.
    [26]
    Scarlini B, Pasini T, Navigli R. SensEmBERT: Context-enhanced sense embeddings for multilingual word sense disambiguation. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(5): 8758–8765. DOI: 10.1609/aaai.v34i05.6402.
    [27]
    Eyal M, Sadde S, Taub-Tabib H, Goldberg Y. Large scale substitution-based word sense induction. In Proc. the 60th Annual Meeting of the Association for Computational Linguistics, May 2022, pp.4738–4752. DOI: 10.18653/v1/2022.acl-long.325.
    [28]
    Neelakantan A, Shankar J, Passos A, McCallum A. Efficient non-parametric estimation of multiple embeddings per word in vector space. In Proc. the 2014 Conference on Empirical Methods in Natural Language Processing, Oct. 2014, pp.1059–1069. DOI: 10.3115/v1/D14-1113.
    [29]
    Pelevina M, Arefiev N, Biemann C, Panchenko A. Making sense of word embeddings. In Proc. the 1st Workshop on Representation Learning for NLP, Aug. 2016, pp.174–183. DOI: 10.18653/v1/W16-1620.
    [30]
    Chang H S, Agrawal A, Ganesh A, Desai A, Mathur V, Hough A, McCallum A. Efficient graph-based word sense induction by distributional inclusion vector embeddings. In Proc. the 12th Workshop on Graph-Based Methods for Natural Language Processing, Jun. 2018, pp.38–48. DOI: 10.18653/v1/W18-1706.
    [31]
    Han S Z, Shirai K. Unsupervised word sense disambiguation based on word embedding and collocation. In Proc. the 13th International Conference on Agents and Artificial Intelligence, Feb. 2021, pp.1218–1225. DOI: 10.5220/0010380112181225.
    [32]
    Chen H H, Jin H. Finding and evaluating the community structure in semantic peer-to-peer overlay networks. Science China Information Sciences, 2011, 54(7): 1340–1351. DOI: 10.1007/s11432-011-4296-6.
    [33]
    Gao W, Wong K F, Xia Y Q, Xu R F. Clique percolation method for finding naturally cohesive and overlapping document clusters. In Proc. the 21st International Conference on Computer Processing of Oriental Languages, Dec. 2006, pp.97–108. DOI: 10.1007/11940098_10.
    [34]
    Gibbons T R, Mount S M, Cooper E D, Delwiche C F. Evaluation of BLAST-based edge-weighting metrics used for homology inference with the Markov Clustering algorithm. BMC Bioinformatics, 2015, 16: 218. DOI: 10.1186/s12859-015-0625-x.
    [35]
    Brin S, Page L. Reprint of: The anatomy of a large-scale hypertextual web search engine. Computer Networks, 2012, 56(18): 3825–3833. DOI: 10.1016/j.comnet.2012.10.007.
    [36]
    Yoshua B, Olivier D, Nicolas Le R. Label propagation and quadratic criterion. Semi-Supervised Learning, 2006: 192–216. DOI: 10.7551/mitpress/9780262033589.003.0011.
    [37]
    Serban O, Castellano G, Pauchet A, Rogozan A, Pecuchet J P. Fusion of smile, valence and NGram features for automatic affect detection. In Proc. the 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, Sept. 2013, pp.264–269. DOI: 10.1109/ACⅡ.2013.50.
    [38]
    Jin H, Zhang Z B, Yuan P P. Improving Chinese word representation using four corners features. IEEE Trans. Big Data, 2022, 8(4): 982–993. DOI: 10.1109/TBDATA.2021.3106582.
    [39]
    Huang E H, Socher R, Manning C D, Ng A Y. Improving word representations via global context and multiple word prototypes. In Proc. the 50th Annual Meeting of the Association for Computational Linguistics, Jul. 2012, pp.873-882.
    [40]
    Biemann C. Turk bootstrap word sense inventory 2.0: A large-scale resource for lexical substitution. In Proc. the 8th International Conference on Language Resources and Evaluation, May 2012, pp.4038-4042.
    [41]
    Pennington J, Socher R, Manning C. GloVe: Global vectors for word representation. In Proc. the 2014 Conference on Empirical Methods in Natural Language Processing, Oct. 2014, pp.1532–1543. DOI: 10.3115/v1/D14-1162.
    [42]
    Ilić S, Marrese-Taylor E, Balazs J A, Matsuo Y. Deep contextualized word representations for detecting sarcasm and irony. In Proc. the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Oct. 2018, pp.2–7. DOI: 10.18653/v1/w18-6202.
    [43]
    Liu Y J, Che W X, Wang Y X, Zheng B, Qin B, Liu T. Deep contextualized word embeddings for universal dependency parsing. ACM Trans. Asian and Low-Resource Language Information Processing, 2020, 19(1): 9. DOI: 10.1145/3326497.
