Decoding the Structural Keywords in Protein Structure Universe

Wessam Elhefnawy; Min Li; Jian-Xin Wang; Yaohang Li

doi:10.1007/s11390-019-1895-y

Wessam Elhefnawy, Min Li, Jian-Xin Wang, Yaohang Li. Decoding the Structural Keywords in Protein Structure Universe[J]. Journal of Computer Science and Technology, 2019, 34(1): 3-15. DOI: 10.1007/s11390-019-1895-y

Abstract

Although the protein sequence-structure gap continues to widen with the development of high-throughput sequencing tools, the protein structure universe appears to be nearing completeness: no proteins with novel structural folds have been deposited in the Protein Data Bank (PDB) recently. In this work, we identify a protein structural dictionary (Frag-K) composed of a set of backbone fragments ranging from 4 to 20 residues, which serve as structural "keywords" that can effectively distinguish between major protein folds. We first apply randomized spectral clustering and random forest algorithms to construct representative and sensitive protein fragment libraries from a large set of high-quality, non-homologous protein structures available in the PDB. We analyze the impact of clustering cut-offs on the performance of the fragment libraries. Then, the Frag-K fragments are employed as structural features to classify protein structures into the major protein folds defined by SCOP (Structural Classification of Proteins). Our results show that a structural dictionary of ~400 Frag-K fragments, each 4 to 20 residues long, is capable of classifying major SCOP folds with high accuracy.
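To make the idea of fragments as structural features concrete, the following is a minimal sketch, not the authors' implementation. It slides a fixed-length window along a chain of residue coordinates, matches each window to the closest fragment in a small library, and returns a normalized histogram that could serve as a feature vector for a fold classifier. The function names, the toy coordinates, and the matching criterion (Euclidean distance between pairwise-distance descriptors) are all assumptions made for illustration; the abstract does not specify how fragments are compared, and the clustering and random forest steps are not reproduced here.

```python
import math

def windows(coords, k):
    """Yield every run of k consecutive residues (sliding window, step 1)."""
    for i in range(len(coords) - k + 1):
        yield coords[i:i + k]

def descriptor(fragment):
    """Pairwise residue distances: unchanged by rotation and translation."""
    return [math.dist(a, b)
            for i, a in enumerate(fragment)
            for b in fragment[i + 1:]]

def nearest_keyword(fragment, library):
    """Index of the library fragment with the closest descriptor.

    Assumption: similarity is Euclidean distance between descriptors.
    """
    d = descriptor(fragment)
    return min(range(len(library)),
               key=lambda j: math.dist(d, descriptor(library[j])))

def keyword_histogram(coords, library, k):
    """Normalized counts of how often each library fragment is the best match."""
    counts = [0] * len(library)
    for frag in windows(coords, k):
        counts[nearest_keyword(frag, library)] += 1
    total = sum(counts) or 1  # avoid division by zero for very short chains
    return [c / total for c in counts]

# Hypothetical toy library: one straight and one bent 4-residue fragment.
straight = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
library = [straight, bent]

# Hypothetical 6-residue chain laid out along the y axis.
chain = [(0.0, 3.8 * i, 0.0) for i in range(6)]
features = keyword_histogram(chain, library, k=4)
```

In this toy case every window of `chain` is a rotated copy of `straight`, so all matches fall on the first library entry. In a fuller pipeline, one histogram per fragment length would be concatenated and passed to a classifier.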
