Journal of Computer Science and Technology, 2022, Vol. 37, Issue (2): 320-329. DOI: 10.1007/s11390-021-1174-6

Special Issue: Artificial Intelligence and Pattern Recognition


Imputing DNA Methylation by Transferred Learning Based Neural Network

Xin-Feng Wang1 (王新峰), Xiang Zhou1 (周翔), Jia-Hua Rao1 (饶家华), Zhu-Jin Zhang1 (张柱金), and Yue-Dong Yang1,2,* (杨跃东), Member, CCF        

  1 School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
  2 Key Laboratory of Machine Intelligence and Advanced Computing of Ministry of Education (Sun Yat-sen University), Guangzhou 510000, China
  • Received: 2020-11-23; Revised: 2021-09-06; Accepted: 2022-02-18; Online: 2022-03-31; Published: 2022-03-31
  • Contact: Yue-Dong Yang
  • About author: Yue-Dong Yang is a professor in the School of Computer Science and the National Super Computer Center at Guangzhou, Sun Yat-sen University, Guangzhou. He received his Ph.D. degree in computational biology from the University of Science and Technology of China (USTC), Hefei, in 2006. Dr. Yang has published more than 100 articles that have been cited more than 4,000 times, including five ESI highly cited articles. His research group currently focuses on developing HPC and AI algorithms for the multi-scale integration of omics data and for intelligent drug design. He is also responsible for building the HPC platform for biomedical applications based on the Tianhe-2 supercomputer.
  • Supported by:
    This study was supported by the National Key Research and Development Program of China under Grant No. 2020YFB0204803, the National Natural Science Foundation of China under Grant No. 61772566, the Guangdong Key Field Research and Development Plan under Grant Nos. 2019B020228001 and 2018B010109006, the Introducing Innovative and Entrepreneurial Teams of Guangdong under Grant No. 2016ZT06D211, and the Guangzhou Science and Technology Research Plan under Grant No. 202007030010.

DNA methylation is an important epigenetic modification that plays a vital role in many diseases, including cancers. With the development of high-throughput sequencing technology, much progress has been made in revealing the relations between DNA methylation and diseases. However, analyzing DNA methylation data remains challenging because of the missing values caused by the limitations of current techniques. While many methods have been developed to impute the missing values, most of them rely on correlations between individual samples and are therefore limited for the abnormal samples found in cancers. In this study, we present TDimpute-DNAmeth, a novel transfer learning based neural network for imputing missing DNA methylation data. The method learns common relations in DNA methylation data from pan-cancer samples and then fine-tunes the learned relations on each specific cancer type to impute the missing data. Tested on 16 cancer datasets, our method outperformed other commonly used methods. Further analyses indicated that DNA methylation is related to cancer survival and thus can be used as a biomarker for cancer prognosis.
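The pretrain-then-fine-tune idea described above can be illustrated with a deliberately tiny sketch. Everything in it is an assumption for illustration and not the TDimpute-DNAmeth architecture: the single-layer sigmoid network (`OneLayerImputer`), the helper names `make_samples` and `impute`, the synthetic "pan-cancer" and "target cancer" data, and the hyperparameters are all hypothetical. The model predicts missing methylation sites (beta values in [0, 1]) from observed sites; it is first trained on pooled samples, and a copy is then fine-tuned on a small cancer-specific set before filling in missing entries.

```python
import copy
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class OneLayerImputer:
    """Hypothetical single-layer network mapping observed sites to missing sites.

    Outputs pass through a sigmoid so predictions stay in [0, 1], like beta values.
    """

    def __init__(self, n_in: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def predict(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W, self.b)]

    def fit(self, X, Y, epochs: int, lr: float):
        """Per-sample gradient descent on 0.5 * squared error; returns self."""
        for _ in range(epochs):
            for x, y in zip(X, Y):
                p = self.predict(x)
                for j, (pj, yj) in enumerate(zip(p, y)):
                    g = (pj - yj) * pj * (1.0 - pj)  # d(loss)/d(logit)
                    for i, xi in enumerate(x):
                        self.W[j][i] -= lr * g * xi
                    self.b[j] -= lr * g
        return self

    def mse(self, X, Y) -> float:
        errs = [(p - t) ** 2 for x, y in zip(X, Y) for p, t in zip(self.predict(x), y)]
        return sum(errs) / len(errs)


def make_samples(n, shift, seed):
    """Synthetic data (hypothetical): 3 observed sites predict 1 missing site.

    `shift` mimics a cancer-specific change in the relation between sites.
    """
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(3)] for _ in range(n)]
    Y = [[sigmoid(2.0 * x[0] - 2.0 * x[1] + shift)] for x in X]
    return X, Y


def impute(model, sample, missing_idx):
    """Return a copy of `sample` with the positions in `missing_idx` filled by the model."""
    observed = [v for k, v in enumerate(sample) if k not in missing_idx]
    filled = list(sample)
    for k, value in zip(missing_idx, model.predict(observed)):
        filled[k] = value
    return filled


# 1) Pretrain on a large pooled "pan-cancer" set.
X_pan, Y_pan = make_samples(200, shift=0.0, seed=1)
pretrained = OneLayerImputer(n_in=3, n_out=1).fit(X_pan, Y_pan, epochs=30, lr=0.5)

# 2) Fine-tune a copy on a small cancer-specific set whose relation is shifted.
X_tgt, Y_tgt = make_samples(30, shift=0.5, seed=2)
finetuned = copy.deepcopy(pretrained).fit(X_tgt, Y_tgt, epochs=50, lr=0.5)

# 3) Impute a sample whose last site is missing.
sample = [0.8, 0.2, 0.5, None]
completed = impute(finetuned, sample, missing_idx=[3])
print(pretrained.mse(X_tgt, Y_tgt), finetuned.mse(X_tgt, Y_tgt), completed)
```

Copying the pretrained weights with `copy.deepcopy` leaves the pan-cancer model untouched, so the same starting point could be fine-tuned separately for each cancer type; a practical implementation would more likely use a deep learning framework and a multi-layer network.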

Key words: neural network; transfer learning; DNA methylation; data imputation; survival analysis


ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
  Copyright ©2015 JCST, All Rights Reserved