›› 2013, Vol. 28 ›› Issue (4): 647-656.doi: 10.1007/s11390-013-1365-x

Special Issue: Artificial Intelligence and Pattern Recognition

• Special Section of EDB2012 • Previous Articles     Next Articles

CORE: Common Region Extension Based Multiple Protein Structure Alignment for Producing Multiple Solution

Woo-Cheol Kim1, Sanghyun Park2, and Jung-Im Won3   

  1. 1. College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802, U.S.A.;
    2. Department of Computer Science, Yonsei University, Seoul 120-749, Korea;
    3. Research Center of Information and Electronic Engineering, Hallym University, Chuncheon, Gangwon 200-702, Korea
  • Received:2012-09-08 Revised:2013-05-10 Online:2013-07-05 Published:2013-07-05
  • Supported by:

    This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology of Korea under Grant No.2012R1A1A3013084.

Over the past several decades, biologists have conducted numerous studies examining both general and specific functions of proteins. Generally, if similarities in either the structure or sequence of amino acids exist for two proteins, then a common biological function is expected. Protein function is determined primarily based on the structure rather than the sequence of amino acids. The algorithm for protein structure alignment is an essential tool for the research. The quality of the algorithm depends on the quality of the similarity measure that is used, and the similarity measure is an objective function used to determine the best alignment. However, none of existing similarity measures became golden standard because of their individual strength and weakness. They require excessive filtering to find a single alignment. In this paper, we introduce a new strategy that finds not a single alignment, but multiple alignments with different lengths. This method has obvious benefits of high quality alignment. However, this novel method leads to a new problem that the running time for this method is considerably longer than that for methods that find only a single alignment. To address this problem, we propose algorithms that can locate a common region (CORE) of multiple alignment candidates, and can then extend the CORE into multiple alignments. Because the CORE can be defined from a final alignment, we introduce CORE* that is similar to CORE and propose an algorithm to identify the CORE*. By adopting CORE* and dynamic programming, our proposed method produces multiple alignments of various lengths with higher accuracy than previous methods. In the experiments, the alignments identified by our algorithm are longer than those obtained by TM-align by 17% and 15.48%, on average, when the comparison is conducted at the level of super-family and fold, respectively.

[1] Ginalski K, Grishin N V, Godzik A, Rychlewski L. Practi-cal lessons from protein structure prediction. Nucleic AcidsResearch, 2005, 33(6): 1874-1891.

[2] Roytberg M, Gambin A, Noe L et al. On subset seeds for pro-tein alignment. IEEE/ACM Trans. Computational Biologyand Bioinformatics, 2009, 6(3): 483-494.

[3] Mayr G, Domingues F, Lackner P. Comparative analysis ofprotein structure alignments. BMC Structural Biology, 2007,7: Article No.50.

[4] Zhang Y. Protein structure prediction: When is it usefulCurrent Opinion in Structural Biology, 2009, 19(2): 145-155.

[5] Holm L, Sander C. Protein structure comparison by align-ment of distance matrices. Journal of Molecular Biology,1993, 233(1): 123-138.

[6] Dahiyat B I, Mayo S L. De novo protein design: Fully auto-mated sequence selection. Science, 1997, 278(5335): 82-87.

[7] Yakunin A F, Yee A A, Savchenko A, Edwards A M, Arrow-smith C H. Structural proteomics: A tool for genome anno-tation. Current Opinion on Chemical Biology, 2004, 8(1):42-48.

[8] Menke M, Berger B, Cowen L. Matt: Local flexibility aidsprotein multiple structure alignment. PLoS ComputationalBiology, 2008, 4(1): e10.

[9] Gu J, Bourne P. Structural Bioinformatics (2nd edition).John Wiley, 2009.

[10] Arun K S, Huang T S, Blostein S D. Least-squares fittingof two 3-D point sets. IEEE Trans. Pattern Analysis andMachine Intelligence, 1987, 9(5): 698-700.

[11] Sippl M J, Wiederstein M. A note on difficult structure align-ment problems. Bioinformatics, 2008, 24(3): 426-427.

[12] Chen L, Zhou T, Tang Y. Protein structure alignment by de-terministic annealing. Bioinformatics, 2005, 21: 51-62.

[13] Glasgow J, Kuo T, Davies J. Protein structure from contactmaps: A case-based reasoning approach. Information Sys-tems Frontiers, 2006, 8(1): 29-36.

[14] Bhattacharya S, Bhattacharyya C, Chandra N R. Comparisonof protein structures by growing neighborhood alignments.BMC Bioinformatics, 2007, 8: Article No.77.

[15] Kolbeck B, May P, Schmidt-Goenner T, Steinke T, Knapp EW. Connectivity independent protein-structure alignment: Ahierarchical approach. BMC Bioinformatics, 2006, 7: ArticleNo.510.

[16] Eidhammer I, Jonassen I, Taypor W. Structure comparisonand structure patterns. Journal of Computational Biology,2000, 7(5): 685-716.

[17] Shindyalov I N, Bourne P E. Protein structure alignmentby incremental combinatorial extension (CE) of the optimalpath. Protein Engineering, 1998, 11(9): 739-747.

[18] Taylor W R, Orengo C A. Protein structure alignment. Jour-nal of Molecular Biology, 1989, 208(1): 1-22.

[19] TaylorWR. Protein structure comparison using iterated dou-ble dynamic programming. Protein Science, 1999, 8(3): 654-665.

[20] Jewett A I, Huang C C, Ferrin T E. MINRMS: An efficientalgorithm for determining protein structure similarity us-ing root-mean-squared-distance. Bioinformatics, 2003, 19(5):625-634.

[21] Lotan I, Schwarzer F. Approximation of protein structure forfast similarity measures. Journal of Computational Biology,2004, 11(2/3): 299-317.

[22] Gibrat J F, Madej T, Bryant S H. Surprising similarities instructure comparison. Current Opinion in Structural Biology,1996, 6(3): 377-385.

[23] Kabsch W, Sander C. Dictionary of protein secondary struc-ture: Pattern recognition of hydrogen-bonded and geometri-cal features. Biopolymers, 1983, 22(12): 2577-2637.

[24] Frishman D, Argos P. Knowledge-based protein secondarystructure assignment. Proteins-Structure Function and Ge-netics, 1995, 23(4): 566-579.

[25] Holm L, Sander C. 3-D lookup: Fast protein structuredatabase searches at 90% reliability. In Proc. the 3rd Int.Conference on Intelligent Systems for Molecular Biology,July 1995, Vol.3, pp.179-187.

[26] Nussinov R, Wolfson H J. Efficient detection of three-dimensional structural motifs in biological macromolecules bycomputer vision techniques. Proc. National Academy of Sci-ences of USA, 1991, 88(23): 10495-10499.

[27] Le Q, Pollastri G, Koehl P. Structural alphabets for pro-tein structure classification: A comparison study. Journalof Molecular Biology, 2009, 387(2): 431-450.

[28] Erdmann M A. Protein similarity from knot theory: Geomet-ric convolution and line weavings. Journal of ComputationalBiology, 2005, 12(6): 609-637.

[29] Zhang Y, Skolnick J. TM-align: A protein structure alignmentalgorithm based on the TM-score. Nucleic Acids Research,2005, 33(7): 2302-2309.

[30] Zhang Y, Skolnick J. Scoring function for automated assess-ment of protein structure template quality. Proteins, 2004,57(4): 702-710.

[31] Godzik A. The structural alignment between two proteins: Isthere a unique answer Protein Science, 1996, 5(7): 1325-1338.

[32] Murzin A G, Brenner S E, Hubbard T, Chothia C. SCOP:A structural classification of proteins database for the inves-tigation of sequences and structures. Journal of MolecularBiology, 1995, 247(4): 536-540.

[33] Berman H M, Westbrook J, Feng Z et al. The protein databank. Nucleic Acids Research, 2000, 28(1): 235-242.
No related articles found!
Full text



No Suggested Reading articles found!

ISSN 1000-9000(Print)

CN 11-2296/TP

Editorial Board
Author Guidelines
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: jcst@ict.ac.cn
  Copyright ©2015 JCST, All Rights Reserved