Predicting CircRNA-Disease Associations Based on Improved Weighted Biased Meta-Structure
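Summary: 1. Background: Circular RNA (circRNA) is a special class of endogenous non-coding RNA with distinctive properties and diverse functions. In recent years, with the rapid development of high-throughput sequencing, circRNAs have been detected in archaea, plants, and animals, which has attracted wide attention from researchers. As circRNA data have accumulated, some of their biological functions have become clearer, such as acting as miRNA sponges, participating in transcriptional regulation, binding RNA-binding proteins, and being translated. Dysregulation of circRNAs can therefore lead to cellular dysfunction, abnormal expression, and growth defects. Studies have confirmed that many circRNAs are closely linked to the onset and progression of complex diseases such as gastric cancer, colorectal cancer, liver cancer, and glioma.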
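2. Objective: Identifying circRNA-disease associations through biological experiments costs a great deal of money and time, which has severely limited progress in this area. Building on the many databases now available and on advances in research on other RNAs, computational methods can address the high cost of experimental approaches. Because relatively few circRNA-disease associations have been verified so far, we propose a new computational model that uses a small number of circRNA-disease associations together with richer circRNA biological information to discover unknown circRNA-disease associations.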
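3. Methods: We propose a model that predicts circRNA-disease associations with an improved weighted biased meta-structure algorithm. Step 1: to enlarge the set of circRNAs, we obtained expression profiles for 1511 circRNAs from exoRBase, and then obtained known associations between 1511 circRNAs and diseases from four databases: CircR2Disease, CircAtlas 2.0, Circ2Disease, and CircRNADisease. Step 2: we computed the expression-profile similarity, sequence similarity, and Gaussian kernel similarity of circRNAs, as well as the semantic similarity and Gaussian kernel similarity of diseases. Step 3: we combined the circRNA-disease association network, the integrated circRNA similarity network, and the integrated disease similarity network into a heterogeneous network. Step 4: we applied the improved weighted biased meta-structure algorithm to the heterogeneous network to predict circRNA-disease associations.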
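4. Results: Under leave-one-out cross-validation, 10-fold cross-validation, and 5-fold cross-validation, our model, CDWBMS, achieved areas under the ROC curve (AUC) of 0.9216, 0.9172, and 0.9005, respectively. CDWBMS also performed well in accuracy (0.86), F1-score (0.88), and Matthews correlation coefficient (0.727). Case studies on gastric cancer, colorectal cancer, and breast cancer show that CDWBMS can predict unknown circRNA-disease associations.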
5. Conclusion: Given the small number of known circRNA-disease associations, our model expands the set of circRNAs considered, leaving more room to explore new, unknown circRNA-disease relationships. In addition, the improvements to the meta-structure algorithm further raise its predictive performance. However, most disease-related circRNAs lack expression-profile data, so we will further integrate circRNA-disease association data with circRNA biological data to improve the effectiveness of the model.
Abstract: Circular RNAs (circRNAs) are RNAs with a special closed-loop structure that play important roles in tumors and other diseases. Because biological experiments are time-consuming, computational methods for predicting associations between circRNAs and diseases have become a better choice. Taking into account the limited number of verified circRNA-disease associations, we propose a method named CDWBMS, which integrates a small number of verified circRNA-disease associations with abundant circRNA information to discover novel circRNA-disease associations. CDWBMS adopts an improved weighted biased meta-structure search algorithm on a heterogeneous network to predict associations between circRNAs and diseases. Under leave-one-out cross-validation (LOOCV), 10-fold cross-validation, and 5-fold cross-validation, CDWBMS yields area under the receiver operating characteristic curve (AUC) values of 0.9216, 0.9172, and 0.9005, respectively. Furthermore, case studies show that CDWBMS can predict unknown circRNA-disease associations. In conclusion, CDWBMS is an effective method for exploring disease-related circRNAs.
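The similarity and network-construction steps in the methods can be illustrated with a small sketch. The abstract does not give the kernel formula, so the code below assumes the Gaussian interaction profile form that is common in association-prediction work, with the bandwidth normalised by the mean squared norm of the profiles. The function names, the `gamma_prime` parameter, and the toy association matrix are all hypothetical, and the integration with expression, sequence, and semantic similarities is omitted.

```python
import math

def gip_kernel_similarity(profiles, gamma_prime=1.0):
    """Gaussian kernel similarity between the rows of a 0/1 association matrix.

    Assumed form: exp(-gamma * ||p_i - p_j||^2), gamma = gamma_prime / mean(||p||^2).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * sq_dist)
    return sim

def build_heterogeneous_network(circ_sim, dis_sim, assoc):
    """Stack the three networks into one block adjacency matrix:
    [[circ_sim, assoc], [assoc transposed, dis_sim]]."""
    n_circ, n_dis = len(assoc), len(assoc[0])
    assoc_t = [[assoc[i][j] for i in range(n_circ)] for j in range(n_dis)]
    top = [circ_sim[i] + assoc[i] for i in range(n_circ)]
    bottom = [assoc_t[j] + dis_sim[j] for j in range(n_dis)]
    return top + bottom

# Toy data (made up for illustration): 3 circRNAs x 2 diseases.
assoc = [[1, 0], [1, 1], [0, 1]]
circ_sim = gip_kernel_similarity(assoc)
dis_sim = gip_kernel_similarity([list(col) for col in zip(*assoc)])
network = build_heterogeneous_network(circ_sim, dis_sim, assoc)
```

The resulting `network` is a 5 x 5 matrix whose first three rows and columns index circRNAs and whose last two index diseases. A meta-structure search would then run over this combined graph.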
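The reported evaluation measures (AUC, accuracy, F1-score, and Matthews correlation coefficient) can be computed from predicted scores as in the following minimal sketch. It uses the standard definitions of these measures rather than the authors' evaluation code. The 0.5 decision threshold, the function names, and the toy labels and scores are assumptions made for illustration only.

```python
import math

def auc_from_scores(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def threshold_metrics(labels, scores, threshold=0.5):
    """Accuracy, F1-score, and MCC after thresholding the scores (threshold is assumed)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    accuracy = (tp + tn) / len(labels)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}

# Toy example (made-up labels and scores, not data from the paper).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
auc = auc_from_scores(labels, scores)
metrics = threshold_metrics(labels, scores)
```

In a k-fold or leave-one-out setting, the same functions would be applied to the scores of the held-out associations in each fold.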