CORE: Common Region Extension Based Multiple Protein Structure Alignment for Producing Multiple Solution

  • Abstract: Over the past several decades, biologists have conducted numerous studies examining both the general and the specific functions of proteins. Generally, if two proteins are similar in either structure or amino acid sequence, they are expected to share a biological function. Protein function is determined primarily by structure rather than by amino acid sequence, so protein structure alignment algorithms are an essential research tool. The quality of such an algorithm depends on the quality of its similarity measure, which is the objective function used to determine the best alignment. However, none of the existing similarity measures has become a gold standard, because each has its own strengths and weaknesses. Moreover, finding a single alignment requires excessive filtering. In this paper, we introduce a new strategy that finds not a single alignment but multiple alignments of different lengths. This strategy yields high-quality alignments, but it introduces a new problem: its running time is considerably longer than that of methods that find only a single alignment. To address this problem, we propose algorithms that locate a common region (CORE) of multiple alignment candidates and then extend the CORE into multiple alignments. Because the CORE is defined from the final alignments, we introduce CORE*, a region similar to the CORE, and propose an algorithm to identify it. By adopting CORE* and dynamic programming, our proposed method produces multiple alignments of various lengths with higher accuracy than previous methods. In the experiments, the alignments identified by our algorithm are, on average, 17% longer than those obtained by TM-align at the super-family level and 15.48% longer at the fold level.
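The idea of a region common to several alignment candidates can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details are not given in the abstract. It assumes, purely for illustration, that an alignment is a set of matched residue-index pairs and that the common region is the set of pairs shared by every candidate. The names `Alignment` and `common_region` are hypothetical.

```python
from typing import List, Set, Tuple

# Assumption for illustration: an alignment is a set of (i, j) pairs, meaning
# residue i of protein A is matched with residue j of protein B.
Alignment = Set[Tuple[int, int]]


def common_region(candidates: List[Alignment]) -> Alignment:
    """Return the residue pairs that appear in every candidate alignment."""
    if not candidates:
        return set()
    # Plain set intersection; the paper's CORE may be defined differently.
    return set.intersection(*candidates)


# Three hypothetical candidate alignments of different lengths.
candidates = [
    {(1, 2), (2, 3), (3, 4), (4, 5)},
    {(2, 3), (3, 4), (4, 5), (5, 6), (6, 7)},
    {(0, 1), (2, 3), (3, 4), (4, 5), (5, 6)},
]

core = common_region(candidates)
print(sorted(core))  # [(2, 3), (3, 4), (4, 5)]
```

Under this assumption, each candidate can be viewed as the shared region plus its own extension, which is the intuition behind locating the region once and extending it into several alignments rather than computing each alignment from scratch.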

     
