Journal of Computer Science and Technology, 2021, Vol. 36, Issue (5): 1200-1211. doi: 10.1007/s11390-020-0328-2

Special Issue: Artificial Intelligence and Pattern Recognition

• Regular Paper •

Constructing an Educational Knowledge Graph with Concepts Linked to Wikipedia

Fu-Rong Dang1,+, Jin-Tao Tang1,+, Senior Member, CCF, Kun-Yuan Pang1, Ting Wang1,*, Senior Member, CCF, Sha-Sha Li1, and Xiao Li2

  1 College of Computer Science and Technology, National University of Defense Technology, Hunan 410073, China;
  2 Information Center, National University of Defense Technology, Hunan 410073, China
  • Received: 2020-01-11 Revised: 2020-08-18 Online: 2021-09-30 Published: 2021-09-30
  • About author: Fu-Rong Dang received her M.S. degree in software engineering from National University of Defense Technology, Changsha, in 2019. Her research interests include natural language processing and knowledge graphs.
  • Supported by:
    This work was supported by the National Key Research and Development Program of China under Grant No. 2018YFB1004502, and the National Natural Science Foundation of China under Grant Nos. 61532001, 61702532 and 61303190.

To use educational resources efficiently and uncover the relations among MOOCs (massive open online courses), a knowledge graph was built for MOOCs on four major platforms: Coursera, EDX, XuetangX, and ICourse. This paper demonstrates the whole process of educational knowledge graph construction for reference. The resulting knowledge graph, currently the largest knowledge graph of MOOC resources, stores and represents five classes, 11 kinds of relations, and 52 779 entities with their corresponding properties, amounting to more than 300 000 triples. Notably, 24 188 concepts are extracted from the text attributes of MOOCs and linked directly to the corresponding Wikipedia entries or to the closest entries calculated semantically, which provides a normalized representation of knowledge and a more precise description of MOOCs, going well beyond enriching words with explanatory links. In addition, prerequisites discovered by direct extraction are treated as an essential supplement that augments the connectivity of the knowledge graph. This knowledge graph can be regarded as a collection of unified MOOC resources for learners and as abundant data for researchers working on MOOC-related applications, such as prerequisite mining.

Key words: concept extraction; educational resource; knowledge graph; massive open online course (MOOC); prerequisite
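
The concept-linking step summarized in the abstract (link an extracted concept to its Wikipedia entry, or else to the semantically closest entry) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vectors, the relation names `hasConcept` and `linkedTo`, and the helper functions are all hypothetical, and cosine similarity over embeddings is only one plausible way to compute "closest".

```python
import math

# Hypothetical toy embeddings for Wikipedia entry titles; a real system
# would use vectors learned from text, not hand-written numbers.
WIKI_VECTORS = {
    "Machine learning": [0.9, 0.1, 0.0],
    "Linear algebra": [0.1, 0.9, 0.1],
    "Graph theory": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_concept(concept, vector, wiki_vectors=WIKI_VECTORS):
    """Return the matching entry title, or the nearest one by cosine similarity."""
    # Direct link: a case-insensitive title match takes priority.
    for title in wiki_vectors:
        if title.lower() == concept.lower():
            return title
    # Fallback: the semantically closest entry.
    return max(wiki_vectors, key=lambda t: cosine(vector, wiki_vectors[t]))


def build_triples(course, concepts):
    """Emit (subject, relation, object) triples for one course.

    `concepts` maps each extracted concept string to its embedding.
    The relation names are assumptions made for this illustration.
    """
    triples = []
    for concept, vector in concepts.items():
        triples.append((course, "hasConcept", concept))
        triples.append((concept, "linkedTo", link_concept(concept, vector)))
    return triples
```

For example, `build_triples("Intro to ML", {"matrix": [0.2, 0.8, 0.0]})` links the concept "matrix" to the "Linear algebra" entry, because no entry title matches it exactly and that entry has the highest cosine similarity among the toy vectors.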


ISSN 1000-9000(Print)

CN 11-2296/TP
