Imputing DNA Methylation by Transferred Learning Based Neural Network

Xin-Feng Wang1 (王新峰), Xiang Zhou1 (周翔), Jia-Hua Rao1 (饶家华), Zhu-Jin Zhang1 (张柱金), and Yue-Dong Yang1,2,* (杨跃东), Member, CCF

1 School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
2 Key Laboratory of Machine Intelligence and Advanced Computing of Ministry of Education (Sun Yat-sen University), Guangzhou 510000, China

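The results show that using transfer learning to exploit the correlations in DNA methylation across pan-cancer samples effectively addresses the problems of small sample size and high dimensionality. In tests on DNA methylation data with simulated missing values, our model consistently outperformed existing methods on both RMSE and R2. We further applied the model to impute real missing data and performed survival analysis on the imputed data; the results confirm that the data imputed by our model better reflect patient status. More importantly, the model framework is not limited to imputing cancer DNA methylation data: in the future it could be applied to other omics types, other disease types, and downstream tasks based on the imputed results, such as age prediction and cell classification.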


DNA methylation is an important epigenetic modification that plays a vital role in many diseases, including cancers. With the development of high-throughput sequencing technology, much progress has been made in uncovering the relationships between DNA methylation and disease. However, analyzing DNA methylation data remains challenging because of missing values caused by the limitations of current techniques. Although many methods have been developed to impute these missing values, most rely on correlations between individual samples and are therefore limited when applied to the abnormal samples found in cancers. In this study, we present a novel transfer-learning-based neural network for imputing missing DNA methylation data, named TDimpute-DNAmeth. The method learns common relations in DNA methylation from pan-cancer samples and then fine-tunes the learned relations on each specific cancer type to impute the missing data. Tested on 16 cancer datasets, our method outperformed other commonly used methods. Further analyses indicated that DNA methylation is related to cancer survival and can thus be used as a biomarker of cancer prognosis.

Key words: neural network, transfer learning, DNA methylation, data imputation, survival analysis
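
The evaluation on simulated missing values mentioned above (RMSE and R2) can be illustrated with a minimal sketch. It is not the TDimpute-DNAmeth model: the imputer here is a plain column-mean baseline used only as a stand-in, the toy matrix is made up, and all function names are hypothetical. R2 is assumed to mean the coefficient of determination, which this section does not define.

```python
import math
import random


def mask_entries(matrix, frac, seed=0):
    """Hide a random fraction of entries (set to None) to simulate missing values.

    Returns the masked copy and the list of (row, col) positions that were hidden.
    """
    rng = random.Random(seed)
    positions = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    hidden = rng.sample(positions, int(round(frac * len(positions))))
    masked = [list(row) for row in matrix]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def impute_column_mean(masked):
    """Baseline imputer (stand-in, not the paper's model): fill each missing
    entry with the mean of the observed values in the same column."""
    filled = [list(row) for row in masked]
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        for row in filled:
            if row[j] is None:
                row[j] = mean
    return filled


def rmse(truth, pred):
    """Root-mean-square error over the held-out entries."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))


def r2(truth, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_t = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, pred))
    ss_tot = sum((t - mean_t) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot


# Toy methylation matrix (samples x CpG sites); values are illustrative only.
beta = [
    [0.10, 0.80, 0.45],
    [0.15, 0.75, 0.50],
    [0.20, 0.85, 0.40],
    [0.12, 0.78, 0.55],
]
masked, hidden = mask_entries(beta, frac=0.25, seed=1)
filled = impute_column_mean(masked)

# Score only the entries that were deliberately hidden.
truth = [beta[i][j] for i, j in hidden]
pred = [filled[i][j] for i, j in hidden]
print("RMSE:", rmse(truth, pred))
print("R2:", r2(truth, pred))
```

Scoring only the hidden positions mirrors the idea of testing on simulated missing data: the true values are known, so any imputer can be swapped in for `impute_column_mean` and compared on the same mask.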

