Journal of Computer Science and Technology, 2021, Vol. 36, Issue (2): 234-247. doi: 10.1007/s11390-021-0851-9

Special Issue: Emerging Areas

• Special Section on AI and Big Data Analytics in Biology and Medicine

DeepHBSP: A Deep Learning Framework for Predicting Human Blood-Secretory Proteins Using Transfer Learning

Wei Du¹, Member, CCF, IEEE, Yu Sun¹, Hui-Min Bao¹, Liang Chen², Member, CCF, Ying Li¹,*, Senior Member, CCF, and Yan-Chun Liang¹,³,*, Senior Member, CCF

  1 Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China;
    2 Department of Computer Science, College of Engineering, Shantou University, Shantou 515063, China;
    3 Zhuhai Laboratory of Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Zhuhai College of Jilin University, Zhuhai 519041, China
  • Received: 2020-07-30; Revised: 2021-02-28; Online: 2021-03-05; Published: 2021-04-01
  • Contact: Ying Li, Yan-Chun Liang
  • About author: Wei Du received his Ph.D. degree in computer science and technology from Jilin University, Changchun, in 2011. He was a visiting scholar with the University of Georgia, Athens, from 2015 to 2016. He is currently an associate professor in the College of Computer Science and Technology, Jilin University, Changchun. He has published more than 40 journal and conference papers. His major research interests include bioinformatics, computational biology, and computational intelligence.
  • Supported by:
    The work was supported by the National Natural Science Foundation of China under Grant Nos. 61872418, 61972174, and 62002212, the Natural Science Foundation of Jilin Province of China under Grant Nos. 20180101050JC and 20180101331JC, the Science and Technology Planning Project of Guangdong Province of China under Grant No. 2020A0505100018, and the Guangdong Key-Project for Applied Fundamental Research under Grant No. 2018KZDXM076.

The identification of blood-secretory proteins and the detection of protein biomarkers in the blood have important clinical value. Existing methods for predicting blood-secretory proteins are mainly based on traditional machine learning algorithms and rely heavily on annotated protein features. In contrast, deep learning algorithms can automatically learn better feature representations from raw data and are therefore expected to be more promising for predicting blood-secretory proteins. We present DeepHBSP, a novel deep learning model combined with transfer learning that integrates a binary classification network and a ranking network to identify blood-secretory proteins from amino acid sequence information alone. During training, the loss function of DeepHBSP applies a descriptive loss to the binary classification network and a compactness loss to the ranking network. The feature extraction subnetwork of DeepHBSP is a multi-lane capsule network. Additionally, transfer learning is used to train a highly accurate, generalized model from a small number of blood-secretory protein samples. The main contributions of this study are as follows: 1) a novel deep learning architecture integrating a binary classification network and a ranking network is proposed, which outperforms existing traditional machine learning algorithms and other state-of-the-art deep learning architectures for biological sequence analysis; 2) the proposed model predicts blood-secretory proteins from amino acid sequences alone, overcoming the heavy dependence of existing methods on annotated protein features; 3) the blood-secretory proteins predicted by our model are statistically significant when compared with existing blood-based biomarkers of cancer.
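The abstract names DeepHBSP's building blocks without defining them. The NumPy sketch below shows, under stated assumptions, common forms of three of them: a one-hot encoding of the raw amino acid sequence, the squash nonlinearity used in capsule networks, and a combined objective with a binary cross-entropy (descriptive) term for the classification branch plus a compactness term for the ranking branch. The compactness term is assumed here to be a leave-one-out variance formulation used in deep one-class learning. DeepHBSP's actual encoding, capsule routing, loss definitions, and the weight `lam` are not given in the abstract, and every function name below is hypothetical.

```python
import numpy as np

# Assumption: the 20 standard residues; non-standard letters are left as zero rows.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(seq, max_len):
    """Hypothetical encoder: (max_len, 20) one-hot matrix, truncated or zero-padded."""
    mat = np.zeros((max_len, len(AMINO_ACIDS)), dtype=np.float32)
    for pos, aa in enumerate(seq.upper()[:max_len]):
        idx = AA_INDEX.get(aa)
        if idx is not None:  # skip 'X', 'U', etc.
            mat[pos, idx] = 1.0
    return mat


def squash(s, axis=-1, eps=1e-8):
    """Capsule squash: keeps each vector's direction, maps its length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / np.sqrt(sq_norm + eps)


def descriptive_loss(y_true, p_pred, eps=1e-7):
    """Assumed descriptive loss: binary cross-entropy for the classification branch."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def compactness_loss(features):
    """Assumed compactness loss: mean squared distance of each feature vector
    to the mean of the *other* vectors in the batch (small when tightly clustered)."""
    features = np.asarray(features, dtype=float)
    n, k = features.shape
    if n < 2:
        raise ValueError("compactness loss needs at least two samples")
    total = features.sum(axis=0, keepdims=True)
    loo_mean = (total - features) / (n - 1)  # leave-one-out batch mean
    z = features - loo_mean
    return float(np.sum(z ** 2) / (n * k))


def total_loss(y_true, p_pred, rank_features, lam=0.1):
    """Hypothetical weighted sum of the two terms; lam is an assumed trade-off weight."""
    return descriptive_loss(y_true, p_pred) + lam * compactness_loss(rank_features)
```

For example, `one_hot_encode("MKTAYIAK", max_len=10)` yields a (10, 20) matrix with eight nonzero rows, and `compactness_loss` returns 0 for a batch of identical feature vectors. The transfer-learning step and the multi-lane capsule layout are not sketched, because the abstract does not describe them in enough detail.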

Key words: blood-secretory protein; deep learning; capsule network; transfer learning

