• Machine Learning and Data Mining

Predicting RNA Secondary Structure Using Profile Stochastic Context-Free Grammars and Phylogenic Analysis

Xiao-Yong Fang, Zhi-Gang Luo, and Zheng-Hua Wang   

  1. School of Computer Science, National University of Defense Technology, Changsha 410073, China
  • Received: 2007-07-02 Revised: 2008-05-03 Online: 2008-07-10 Published: 2008-07-10

Stochastic context-free grammars (SCFGs) have been applied to predicting RNA secondary structure, and such prediction can be improved by incorporating comparative sequence analysis. However, most existing SCFG-based methods lack an explicit phylogenic analysis of homologous RNA sequences, which is probably why they fall short in practical applications. We therefore present a new SCFG-based method that integrates phylogenic analysis with a newly defined profile SCFG. The method has three parts: 1) we define a new profile SCFG, $M$, to describe the consensus secondary structure of a multiple RNA sequence alignment; 2) we introduce two distinct hidden Markov models, $\lambda$ and $\lambda'$, to perform phylogenic analysis of homologous RNA sequences, where $\lambda$ handles the non-structural regions of the sequence and $\lambda'$ handles the structural regions; 3) we merge $\lambda$ and $\lambda'$ into $M$ to obtain a combined model for predicting RNA secondary structure. We tested our method on data sets constructed from the Rfam database. Its sensitivity and specificity are higher than those of the predictions made by Pfold.
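The abstract does not define how sensitivity and specificity are computed. As a minimal sketch, the Python below assumes the convention common in RNA structure benchmarks: sensitivity is the fraction of reference base pairs that are predicted, and specificity is the fraction of predicted base pairs that occur in the reference (often called positive predictive value). The dot-bracket input format, the function names, and the example structures are illustrative assumptions and are not taken from the paper.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string.

    Assumption: structures are pseudoknot-free and use only '(', ')' and '.'.
    """
    stack, pairs = [], set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {j}")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return pairs


def sensitivity_specificity(reference, predicted):
    """Compare a predicted structure with a reference structure.

    sensitivity = TP / (TP + FN): share of reference pairs that were predicted.
    specificity = TP / (TP + FP): share of predicted pairs that are correct
    (assumed definition; the source does not state one).
    """
    ref, pred = base_pairs(reference), base_pairs(predicted)
    true_positives = len(ref & pred)
    sensitivity = true_positives / len(ref) if ref else 0.0
    specificity = true_positives / len(pred) if pred else 0.0
    return sensitivity, specificity


# Hypothetical example: the prediction misses the innermost reference pair.
reference = "((((...))))"
predicted = "(((.....)))"
print(sensitivity_specificity(reference, predicted))  # (0.75, 1.0)
```

Under this convention a method can raise specificity by predicting fewer pairs, which is why the two values are normally reported together.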



ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: jcst@ict.ac.cn
  Copyright ©2015 JCST, All Rights Reserved