Badal K, Qingge L, Liu X et al. Novel probabilistic and machine learning approaches for the protein scaffold gap filling problem. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY. DOI: 10.1007/s11390-025-4973-3

Novel Probabilistic and Machine Learning Approaches for the Protein Scaffold Gap Filling Problem

  • In de novo protein sequencing, top-down and bottom-up tandem mass spectrometry often yield only an incomplete protein sequence, called a scaffold. While most sections of a protein can be inferred from homologous sequences, some sections are always missing, and the amino acids in these scaffold gaps are hard to predict. In this paper, we therefore focus on predicting only the gaps, using a probabilistic algorithm and machine learning models, instead of predicting the complete protein sequence with generative AI models. We study two versions of the protein scaffold filling problem: one with known gap size and one with known gap mass. For the known gap size version, we develop several machine learning models based on random forest, k-nearest neighbors, decision tree, and fully connected neural network. For the known gap mass version, we design a probabilistic algorithm to predict the missing amino acids in the gaps. Experimental results on both real and simulated data show that the proposed approaches reach 100% and close to 100% accuracy, respectively.
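
To make the known gap mass version concrete, the sketch below enumerates every amino acid composition whose total residue mass matches a given gap mass within a tolerance. It is an illustration only, not the probabilistic algorithm of the paper: the function name `candidate_compositions`, the tolerance `tol`, and the use of standard monoisotopic residue masses are assumptions introduced here, and the abstract does not state which mass table or tolerance the authors use.

```python
# Standard monoisotopic residue masses in daltons. Isoleucine and leucine
# have identical mass, so "L" stands for both in this sketch (an assumption
# made here for simplicity).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def candidate_compositions(gap_mass, tol=0.01):
    """Return all residue multisets whose summed mass is within tol of gap_mass.

    Each composition is reported once, as a string of residue letters in
    order of increasing mass. The order of residues inside the gap is not
    determined by the mass alone.
    """
    residues = sorted(RESIDUE_MASS.items(), key=lambda item: item[1])
    results = []

    def extend(start, remaining, chosen):
        if abs(remaining) <= tol:
            results.append("".join(chosen))
        for i in range(start, len(residues)):
            letter, mass = residues[i]
            if mass > remaining + tol:
                break  # residues are sorted by mass, so heavier ones cannot fit
            chosen.append(letter)
            extend(i, remaining - mass, chosen)  # reuse index i: repeats allowed
            chosen.pop()

    extend(0, gap_mass, [])
    return results
```

For example, `candidate_compositions(114.04293)` returns both `GG` and `N`, because two glycines and one asparagine have nearly the same mass. Ambiguities of this kind are one reason a gap mass alone does not fix the missing residues, so candidates would still need to be ranked, for instance by a probabilistic model.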
