An Algorithm for Finding Conserved Secondary Structure Motifs in Unaligned RNA Sequences
-
Abstract
Experiments and observations have shown that small, distinct local structural features in RNA molecules are correlated with their biological function, for example, in post-transcriptional regulation of gene expression. Thus, finding similar structural features in a set of RNA sequences known to perform the same biological function could provide substantial information about which parts of the sequences are responsible for the function itself. Unfortunately, finding common structural elements in RNA molecules is a very challenging task, even if limited to secondary structure. The main difficulty is that in nearly all cases the structure of the molecules is unknown and has to be predicted, and that sequences with little or no similarity can fold into similar structures. Although they differ in some details, the approaches proposed so far are usually based on a preliminary alignment of the sequences and attempt to predict common structures (either local or global, or for some selected regions) for the aligned sequences. These methods give good results when sequence and structure similarity are very high, but perform less well when similarity is limited to small, local elements, such as single stem-loop motifs. Instead of aligning the sequences, the algorithm we present directly searches for regions of the sequences that can fold into similar structures, where the degree of similarity can be defined by the user. Any information concerning sequence similarity in the motifs can be used either as a search constraint or a posteriori, by post-processing the output. The search for regions sharing structural similarity is implemented with the affix tree, a novel text-indexing structure that significantly accelerates the search for patterns having a symmetric layout, such as those forming stem-loop structures.
Tests based on experimentally known structures have shown that the algorithm is able to identify functional motifs in the secondary structure of non-coding RNA, such as Iron Responsive Elements (IRE) in the untranslated regions of ferritin mRNA, and the domain IV stem-loop structure in SRP RNA.
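To make the idea of searching unaligned sequences for regions that can fold into a similar stem-loop more concrete, the following is a minimal sketch of a brute-force scan. It is not the affix-tree search described above and is not taken from the authors' implementation. The function names (`find_stem_loops`, `shared_stem_loops`), the parameters (`stem_len`, `min_loop`, `max_loop`), and the choice to allow G-U wobble pairs are all assumptions made for illustration only.

```python
# Illustrative brute-force scan; NOT the affix-tree method of the paper.
# Assumption: Watson-Crick pairs plus the G-U wobble pair may form a stem.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stem_loops(seq, stem_len=4, min_loop=3, max_loop=8):
    """Return (start, stem_len, loop_len) for each perfect hairpin candidate.

    A candidate is a window whose first `stem_len` bases pair, in antiparallel
    order, with its last `stem_len` bases, leaving `loop_len` unpaired bases
    in between.
    """
    hits = []
    n = len(seq)
    for start in range(n):
        for loop_len in range(min_loop, max_loop + 1):
            end = start + 2 * stem_len + loop_len - 1  # last base of the 3' arm
            if end >= n:
                break  # longer loops cannot fit either
            if all((seq[start + k], seq[end - k]) in PAIRS for k in range(stem_len)):
                hits.append((start, stem_len, loop_len))
    return hits


def shared_stem_loops(seqs, **kwargs):
    """Return the (stem_len, loop_len) shapes found in every input sequence."""
    shapes = [
        {(stem, loop) for _, stem, loop in find_stem_loops(s, **kwargs)}
        for s in seqs
    ]
    return set.intersection(*shapes) if shapes else set()
```

Calling `shared_stem_loops` on a list of RNA strings returns the hairpin shapes common to all of them, without aligning the sequences first. The scan checks every start position and loop length directly, so it only illustrates the search problem; an index such as the affix tree is what the abstract credits with making this kind of symmetric pattern search fast.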
-
-