Predicting RNA Secondary Structure Using Profile Stochastic Context-Free Grammars and Phylogenic Analysis

Xiao-Yong Fang, Zhi-Gang Luo, and Zheng-Hua Wang   

  1. School of Computer Science, National University of Defense Technology, Changsha 410073, China
  • Received: 2007-07-02 Revised: 2008-05-03 Online: 2008-07-10 Published: 2008-07-10

Stochastic context-free grammars (SCFGs) have been applied to predicting RNA secondary structure. Prediction can be improved by incorporating comparative sequence analysis. However, most existing SCFG-based methods lack explicit phylogenic analysis of homologous RNA sequences, which is probably why they fall short in practical applications. We therefore present a new SCFG-based method that integrates phylogenic analysis with a newly defined profile SCFG. The method can be summarized as follows: 1) we define a new profile SCFG, $M$, to describe the consensus secondary structure of a multiple RNA sequence alignment; 2) we introduce two distinct hidden Markov models, $\la$ and $\la'$, to perform phylogenic analysis of homologous RNA sequences, where $\la$ models the non-structural regions of the sequence and $\la'$ models the structural regions; 3) we merge $\la$ and $\la'$ into $M$ to obtain a combined model for predicting RNA secondary structure. We tested our method on data sets constructed from the Rfam database. The {\it sensitivity} and {\it specificity} of our predictions are higher than those of the predictions by Pfold.
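
The abstract does not say how sensitivity and specificity are computed. As a rough illustration only, the sketch below assumes a convention often used in RNA structure benchmarks: sensitivity is the fraction of reference base pairs that are predicted, and specificity is the fraction of predicted base pairs that appear in the reference. The dot-bracket input format and the function names `parse_dot_bracket` and `pair_metrics` are hypothetical choices made for this example, not taken from the paper.

```python
def parse_dot_bracket(structure):
    """Return the set of base pairs (i, j) encoded by a dot-bracket string.

    Assumption: structures are pseudoknot-free and use only '(', ')' and '.'.
    """
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def pair_metrics(reference, predicted):
    """Compute (sensitivity, specificity) over base pairs.

    Assumed definitions (not stated in the abstract):
      sensitivity = TP / number of reference pairs
      specificity = TP / number of predicted pairs
    An empty denominator yields 0.0.
    """
    ref = parse_dot_bracket(reference)
    pred = parse_dot_bracket(predicted)
    true_positives = len(ref & pred)
    sensitivity = true_positives / len(ref) if ref else 0.0
    specificity = true_positives / len(pred) if pred else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy structures for illustration; these are not data from the paper.
    sens, spec = pair_metrics("((..))", "(....)")
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Under these assumed definitions, a prediction that reports few but correct pairs scores high on specificity and low on sensitivity, which is why the two values are reported together.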

