Wessam Elhefnawy, Min Li, Jian-Xin Wang, Yaohang Li. Decoding the Structural Keywords in Protein Structure Universe[J]. Journal of Computer Science and Technology, 2019, 34(1): 3-15. DOI: 10.1007/s11390-019-1895-y

Decoding the Structural Keywords in Protein Structure Universe

  • Although the protein sequence-structure gap continues to widen with the development of high-throughput sequencing tools, the protein structure universe appears to be approaching completeness: no proteins with novel structural folds have been deposited in the Protein Data Bank (PDB) recently. In this work, we identify a protein structural dictionary (Frag-K) composed of backbone fragments ranging from 4 to 20 residues, which serve as structural "keywords" that can effectively distinguish between major protein folds. We first apply randomized spectral clustering and random forest algorithms to construct representative and sensitive protein fragment libraries from a large set of high-quality, non-homologous protein structures available in the PDB. We analyze the impact of clustering cut-offs on the performance of the fragment libraries. The Frag-K fragments are then employed as structural features to classify protein structures into the major protein folds defined by SCOP (Structural Classification of Proteins). Our results show that a structural dictionary of ~400 Frag-K fragments, each 4 to 20 residues long, is capable of classifying major SCOP folds with high accuracy.
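To make the idea of fragments as structural features concrete, the following is a minimal sketch of how a backbone trace might be turned into a count vector over a fragment dictionary. It is not the authors' implementation. The abstract does not state how fragments are compared, so the sketch assumes a superposition-free distance (the RMS difference of internal pairwise distances, often called dRMSD). The function names, the two-entry toy dictionary, and the 1.0 Å cutoff are all hypothetical.

```python
import math
from itertools import combinations

def windows(coords, k):
    """Yield every contiguous k-residue fragment of a backbone trace."""
    for i in range(len(coords) - k + 1):
        yield coords[i:i + k]

def signature(fragment):
    """Internal pairwise distances; unchanged by rotation or translation."""
    return [math.dist(a, b) for a, b in combinations(fragment, 2)]

def drmsd(sig_a, sig_b):
    """RMS difference between two equal-length distance signatures (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sig_a, sig_b)) / len(sig_a))

def keyword_counts(coords, dictionary, cutoff=1.0):
    """Count, per dictionary fragment, the windows it matches best within cutoff."""
    counts = [0] * len(dictionary)
    sigs = [signature(frag) for frag in dictionary]
    for k in sorted({len(frag) for frag in dictionary}):
        # Only compare a window against dictionary fragments of the same length.
        same_len = [j for j, frag in enumerate(dictionary) if len(frag) == k]
        for win in windows(coords, k):
            w = signature(win)
            best = min(same_len, key=lambda j: drmsd(w, sigs[j]))
            if drmsd(w, sigs[best]) <= cutoff:
                counts[best] += 1
    return counts

# Toy data (hypothetical): an extended 4-residue fragment and a square-shaped one,
# with 3.8 Angstrom spacing between consecutive positions.
straight4 = [(3.8 * i, 0.0, 0.0) for i in range(4)]
bent4 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
toy_dictionary = [straight4, bent4]

extended_chain = [(3.8 * i, 0.0, 0.0) for i in range(6)]
features = keyword_counts(extended_chain, toy_dictionary)
```

A count vector of this kind, with one entry per dictionary fragment, is the sort of feature that could be passed to a classifier such as a random forest. The clustering step that builds the dictionary is not shown here.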
