Kushal Badal, Letu Qingge, Xiaowen Liu, Binhai Zhu. Novel Probabilistic and Machine Learning Approaches for the Protein Scaffold Gap Filling Problem[J]. Journal of Computer Science and Technology. DOI: 10.1007/s11390-025-4973-3

Novel Probabilistic and Machine Learning Approaches for the Protein Scaffold Gap Filling Problem

  • In de novo protein sequencing, we often can obtain only an incomplete protein sequence, called a scaffold, from top-down and bottom-up tandem mass spectrometry. While most regions of a protein can be inferred from its homologous sequences, some specific regions are always missing, and it is hard to predict the missing amino acids in the gaps of the scaffold. Thus, in this paper we focus only on predicting the gaps, using a probabilistic algorithm and machine learning models, instead of predicting the complete protein sequence with generative AI models. We study two versions of the protein scaffold filling problem: gaps of known size and gaps of known mass. For the known-size version, we develop several machine learning models based on random forests, k-nearest neighbors, decision trees, and fully connected neural networks. For the known-mass version, we design a probabilistic algorithm to predict the missing amino acids in the gaps. Experimental results on both real and simulated data are promising, with accuracies of 100% and close to 100%.
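To make the known-mass setting concrete, here is a minimal sketch that is not the authors' probabilistic algorithm. It enumerates amino acid compositions whose standard monoisotopic residue masses sum to the gap mass within a tolerance, then ranks them by multinomial probability under an assumed background distribution (uniform by default). The function names, the 0.02 Da tolerance, the maximum gap length, and the uniform prior are hypothetical choices for illustration only.

```python
from math import factorial, prod

# Standard monoisotopic residue masses in daltons.
# Leucine and isoleucine are isobaric, so "L" stands for L/I here.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def compositions_for_mass(gap_mass, tol=0.02, max_len=6):
    """Enumerate residue multisets whose masses sum to gap_mass +/- tol (hypothetical helper)."""
    residues = sorted(RESIDUE_MASS, key=RESIDUE_MASS.get)
    results = []

    def extend(start, current, total):
        if current and abs(total - gap_mass) <= tol:
            results.append(tuple(current))
        if len(current) == max_len:
            return
        for i in range(start, len(residues)):
            aa = residues[i]
            new_total = total + RESIDUE_MASS[aa]
            if new_total > gap_mass + tol:
                break  # residues are sorted by mass, so later ones overshoot too
            current.append(aa)
            extend(i, current, new_total)  # reuse index i: residues may repeat
            current.pop()

    extend(0, [], 0.0)
    return results


def composition_probability(comp, freqs=None):
    """Multinomial probability of an unordered composition, assuming i.i.d. residues.

    freqs defaults to a uniform background; any real prior is an assumption.
    """
    if freqs is None:
        freqs = {aa: 1 / len(RESIDUE_MASS) for aa in RESIDUE_MASS}
    counts = {aa: comp.count(aa) for aa in set(comp)}
    coeff = factorial(len(comp)) // prod(factorial(c) for c in counts.values())
    return coeff * prod(freqs[aa] ** c for aa, c in counts.items())


def rank_gap_fillings(gap_mass, tol=0.02, max_len=6, freqs=None):
    """Return candidate compositions, most probable first (hypothetical helper)."""
    comps = compositions_for_mass(gap_mass, tol, max_len)
    return sorted(comps, key=lambda c: composition_probability(c, freqs), reverse=True)


print(rank_gap_fillings(128.05858))
```

For a gap of 128.05858 Da, the sketch returns two candidates, Q and the composition {G, A}, which share the same elemental formula; K (128.09496 Da) falls outside the 0.02 Da tolerance. Under the uniform prior the single-residue filling ranks first. Even this small case shows that a mass alone often does not determine a gap uniquely, which is one reason a probabilistic prior is useful.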
