Journal of Computer Science and Technology ›› 2021, Vol. 36 ›› Issue (2): 234-247. doi: 10.1007/s11390-021-0851-9

Special Issue: Emerging Areas

• Special Section on AI and Big Data Analytics in Biology and Medicine •

DeepHBSP: A Deep Learning Framework for Predicting Human Blood-Secretory Proteins Using Transfer Learning

Wei Du1, Member, CCF, IEEE, Yu Sun1, Hui-Min Bao1, Liang Chen2, Member, CCF, Ying Li1,*, Senior Member, CCF, and Yan-Chun Liang1,3,*, Senior Member, CCF        

    1 Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China;
    2 Department of Computer Science, College of Engineering, Shantou University, Shantou 515063, China;
    3 Zhuhai Laboratory of Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Zhuhai College of Jilin University, Zhuhai 519041, China
  • Received: 2020-07-30 Revised: 2021-02-28 Online: 2021-03-05 Published: 2021-04-01
  • Contact: Ying Li, Yan-Chun Liang
  • About author: Wei Du received his Ph.D. degree in computer science and technology from Jilin University, Changchun, in 2011. He was a visiting scholar with the University of Georgia, Athens, from 2015 to 2016. He is currently an associate professor in the College of Computer Science and Technology, Jilin University, Changchun. He has published more than 40 journal and conference papers. His major research interests include bioinformatics, computational biology, and computational intelligence.
  • Supported by:
    The work was supported by the National Natural Science Foundation of China under Grant Nos. 61872418, 61972174, and 62002212, the Natural Science Foundation of Jilin Province of China under Grant Nos. 20180101050JC and 20180101331JC, the Science and Technology Planning Project of Guangdong Province of China under Grant No. 2020A0505100018, and the Guangdong Key-Project for Applied Fundamental Research under Grant No. 2018KZDXM076.

Identifying blood-secretory proteins and detecting protein biomarkers in the blood have important clinical value. Existing methods for predicting blood-secretory proteins are mainly based on traditional machine learning algorithms and rely heavily on annotated protein features. Unlike these algorithms, deep learning algorithms can automatically learn better feature representations from raw data and are therefore promising for predicting blood-secretory proteins. We present DeepHBSP, a novel deep learning model combined with transfer learning, which integrates a binary classification network and a ranking network to identify blood-secretory proteins from amino acid sequence information alone. During training, the loss function of DeepHBSP applies a descriptive loss to the binary classification network and a compactness loss to the ranking network. The feature extraction subnetwork of DeepHBSP is a multi-lane capsule network. Additionally, transfer learning is used to train a highly accurate, generalized model from a small sample of blood-secretory proteins. The main contributions of this study are as follows: 1) we propose a novel deep learning architecture that integrates a binary classification network and a ranking network and is superior to existing traditional machine learning algorithms and other state-of-the-art deep learning architectures for biological sequence analysis; 2) the proposed model for blood-secretory protein prediction uses only amino acid sequences, overcoming the heavy dependence of existing methods on annotated protein features; 3) the blood-secretory proteins predicted by our model are statistically significant when compared with existing blood-based biomarkers of cancer.

Key words: blood-secretory protein; deep learning; capsule network; transfer learning
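
The abstract names two training losses, a descriptive loss and a compactness loss, without giving their formulas. The following is a minimal sketch of how such a pair might be combined, not the authors' implementation. It assumes that the descriptive loss is a binary cross-entropy on the classifier's predicted probabilities and that the compactness loss is the mean squared distance of each feature vector from the mean of the other vectors in the batch, a formulation common in one-class feature learning. The function names and the weight `lam` are hypothetical.

```python
import math


def descriptive_loss(probs, labels):
    """Mean binary cross-entropy (assumed form of the descriptive loss)."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def compactness_loss(features):
    """Mean squared distance of each feature vector from the mean of the
    other vectors in the batch (assumed form of the compactness loss)."""
    n = len(features)
    if n < 2:
        raise ValueError("compactness loss needs at least two feature vectors")
    k = len(features[0])
    # Per-dimension sums over the whole batch, reused for every sample.
    sums = [sum(f[j] for f in features) for j in range(k)]
    total = 0.0
    for f in features:
        for j in range(k):
            mean_others = (sums[j] - f[j]) / (n - 1)
            total += (f[j] - mean_others) ** 2
    return total / (n * k)


def combined_loss(probs, labels, features, lam=0.1):
    """Weighted sum of the two losses; lam is a hypothetical trade-off weight."""
    return descriptive_loss(probs, labels) + lam * compactness_loss(features)
```

In this sketch, `probs` and `labels` would come from the binary classification network and `features` from the ranking network. A batch of identical feature vectors gives a compactness loss of zero, so minimizing this term pulls the learned features of one class closer together.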

ISSN 1000-9000(Print)

CN 11-2296/TP
