Journal of Computer Science and Technology ›› 2021, Vol. 36 ›› Issue (2): 310-322. doi: 10.1007/s11390-021-0844-8

Special Topic: Emerging Areas


Collaborative Matrix Factorization with Soft Regularization for Drug-Target Interaction Prediction

Li-Gang Gao1,2, Meng-Yun Yang1,2,3, and Jian-Xin Wang1,2,*, Senior Member, CCF, IEEE, Member, ACM        

  • 1 School of Computer Science and Engineering, Central South University, Changsha 410083, China;
    2 Hunan Provincial Key Laboratory of Bioinformatics, Central South University, Changsha 410083, China;
    3 School of Science, Shaoyang University, Shaoyang 422000, China
  • Received: 2020-07-29 Revised: 2021-03-09 Online: 2021-03-05 Published: 2021-04-01
  • Contact: Jian-Xin Wang, E-mail: jxwang@mail.csu.edu.cn
  • About author: Li-Gang Gao received his B.S. degree in information and computational science from the University of South China, Hengyang, in 2018. He is a graduate student at the School of Computer Science and Engineering, Central South University, Changsha. His main research interests include drug-target interaction prediction and recommendation systems.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China under Grant No. 61972423, and Hunan Provincial Science and Technology Program under Grant No. 2018wk4001.

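Identifying potential drug-target interactions (DTIs) is key to drug discovery. Matrix factorization methods based on collaborative filtering are widely used in drug repositioning and DTI prediction because they naturally reduce dimensionality and uncover latent features. However, models based on collaborative matrix factorization simply force the features of the similarity data to equal those of the DTI data, which does not accurately represent the relationship between the two sets of features. To represent the correlation between data features more reasonably, we propose a new matrix factorization method, collaborative matrix factorization with soft regularization (SRCMF). SRCMF improves prediction performance by incorporating drug and target similarity information into the matrix factorization model. Its basic idea is to introduce soft regularization terms that constrain the DTI latent features and the similarity features to be as close as possible rather than exactly equal. Specifically, SRCMF uses the soft regularization terms to relax the equality between the drug (target) similarity features and the drug (target) latent features of the DTI data, and thereby represents the relationship between the features more reasonably. To evaluate SRCMF comprehensively, we run ten-fold cross-validation experiments under three prediction task settings and report the corresponding AUPR and F1 values. Compared with six state-of-the-art DTI prediction methods, SRCMF shows better prediction performance in all three tasks. To verify the robustness of SRCMF on noisy data, we add Gaussian noise to the similarity data and repeat the DTI prediction experiments. The results show that, at different noise levels, SRCMF outperforms the compared methods in both prediction performance and robustness. In addition, to validate SRCMF in practical drug discovery, we carry out a case study that checks the predicted potential DTIs. Six of the ten highest-scoring DTIs predicted by SRCMF on the GPCR dataset have been confirmed in practice, which further indicates that SRCMF is effective for DTI prediction.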
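
The soft-regularization idea can be illustrated with a small NumPy sketch. This is not the authors' implementation. The loss, the symmetric factorizations Sd ≈ P Pᵀ and St ≈ Q Qᵀ, the plain gradient-descent updates, and every name used here (`fit_soft_cmf`, `toy_similarity`, `alpha`, `beta`, `lam`, `lr`) are assumptions made for illustration. The point it shows is the relaxation: the DTI factors A and B are only pulled toward the similarity factors P and Q by penalty terms, instead of being forced to equal them.

```python
import numpy as np


def toy_similarity(n, rng):
    """Random symmetric similarity matrix in [0, 1] with unit diagonal (toy data)."""
    M = rng.random((n, n))
    S = (M + M.T) / 2
    np.fill_diagonal(S, 1.0)
    return S


def objective(Y, Sd, St, A, B, P, Q, alpha, beta, lam):
    """Assumed soft-regularized loss; the paper's exact objective may differ."""
    fit = np.sum((Y - A @ B.T) ** 2)                       # DTI reconstruction
    sim = np.sum((Sd - P @ P.T) ** 2) + np.sum((St - Q @ Q.T) ** 2)
    soft = alpha * np.sum((A - P) ** 2) + beta * np.sum((B - Q) ** 2)
    ridge = lam * sum(np.sum(M ** 2) for M in (A, B, P, Q))
    return fit + sim + soft + ridge


def fit_soft_cmf(Y, Sd, St, rank=2, alpha=1.0, beta=1.0, lam=0.1,
                 lr=0.005, steps=300, seed=0):
    """Minimize the assumed loss with fixed-step gradient descent."""
    rng = np.random.default_rng(seed)
    n_d, n_t = Y.shape
    A = 0.1 * rng.standard_normal((n_d, rank))  # drug latent features (DTI)
    B = 0.1 * rng.standard_normal((n_t, rank))  # target latent features (DTI)
    P = 0.1 * rng.standard_normal((n_d, rank))  # drug similarity features
    Q = 0.1 * rng.standard_normal((n_t, rank))  # target similarity features
    history = []
    for _ in range(steps):
        history.append(objective(Y, Sd, St, A, B, P, Q, alpha, beta, lam))
        R = A @ B.T - Y
        Ed = P @ P.T - Sd
        Et = Q @ Q.T - St
        gA = 2 * R @ B + 2 * alpha * (A - P) + 2 * lam * A
        gB = 2 * R.T @ A + 2 * beta * (B - Q) + 2 * lam * B
        gP = 2 * (Ed + Ed.T) @ P - 2 * alpha * (A - P) + 2 * lam * P
        gQ = 2 * (Et + Et.T) @ Q - 2 * beta * (B - Q) + 2 * lam * Q
        A, B, P, Q = A - lr * gA, B - lr * gB, P - lr * gP, Q - lr * gQ
    history.append(objective(Y, Sd, St, A, B, P, Q, alpha, beta, lam))
    return A @ B.T, history  # predicted interaction scores, loss trace


# Toy usage: 6 drugs, 5 targets, random binary interaction matrix.
rng = np.random.default_rng(1)
Y_toy = (rng.random((6, 5)) < 0.3).astype(float)
scores, history = fit_soft_cmf(Y_toy, toy_similarity(6, rng), toy_similarity(5, rng))
```

In this sketch, larger `alpha` and `beta` pull the two sets of features closer together, which approaches the hard-equality behavior of standard collaborative matrix factorization, while values near zero decouple the DTI factors from the similarity data.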

Abstract: Identifying potential drug-target interactions (DTIs) is critical in drug discovery. DTI prediction methods based on collaborative filtering have demonstrated attractive prediction performance. However, many of these models cannot accurately express the relationship between similarity features and DTI features. To represent this correlation more rationally, we propose a novel matrix factorization method called collaborative matrix factorization with soft regularization (SRCMF). SRCMF improves prediction performance by combining drug and target similarity information with matrix factorization. In contrast to general collaborative matrix factorization, the fundamental idea of SRCMF is to make the similarity features and the latent features of DTI approximately equal rather than identical. Specifically, SRCMF obtains low-rank feature representations of drug similarity and target similarity, and then uses a soft regularization term to constrain the drug (target) similarity features to approximate the drug (target) latent features of DTI. To comprehensively evaluate the prediction performance of SRCMF, we conduct cross-validation experiments under three different settings. In terms of the area under the precision-recall curve (AUPR), SRCMF achieves better prediction results than six state-of-the-art methods. In addition, under different noise levels in the similarity data, the prediction performance of SRCMF is much better than that of collaborative matrix factorization. In conclusion, SRCMF is robust, which leads to improved performance in drug-target interaction prediction.
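
The evaluation mentioned above (AUPR, and robustness under noisy similarity data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's protocol: the helper names `add_gaussian_noise` and `average_precision` are hypothetical, the symmetrization and clipping of the noisy matrix to [0, 1] are choices made here, and average precision is used as one common estimator of the area under the precision-recall curve.

```python
import numpy as np


def add_gaussian_noise(S, sigma, rng):
    """Perturb a symmetric similarity matrix with zero-mean Gaussian noise.

    Assumption: the noise is symmetrized and the result clipped to [0, 1];
    the unit diagonal is not restored.
    """
    noise = rng.normal(0.0, sigma, size=S.shape)
    noise = (noise + noise.T) / 2          # keep the matrix symmetric
    return np.clip(S + noise, 0.0, 1.0)


def average_precision(y_true, y_score):
    """Average precision: mean of precision@k over the ranks k of the positives."""
    y_true = np.asarray(y_true, dtype=float)
    y_score = np.asarray(y_score, dtype=float)
    order = np.argsort(-y_score, kind="stable")   # rank by descending score
    hits = y_true[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(np.sum(precision_at_k * hits) / np.sum(hits))


# Toy usage: noise level sigma = 0.1 on a 4 x 4 similarity matrix.
rng = np.random.default_rng(0)
S = np.full((4, 4), 0.5)
np.fill_diagonal(S, 1.0)
S_noisy = add_gaussian_noise(S, sigma=0.1, rng=rng)
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

A robustness experiment in this spirit would refit the model on `S_noisy` for several values of `sigma` and compare the resulting AUPR on held-out interactions.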

Key words: drug-target interaction, collaborative matrix factorization, soft regularization, noisy data
