An efficient algorithm for aligning sequences to RNA secondary structure
MetadataShow full item record
Stochastic Context-Free grammars (SCFGs) have been used to model RNA secondary structure. Finding an optimal alignment of an RNA sequence to an SCFG structure model can be done via dynamic programming by the CYK algorithm. This is computationally time-consuming since it needs to fill all the cells in a large 3-dimensional matrix. We observe that some of the cells in the matrix are not needed and the calculation of these useless cells should be removed. We develop an improved algorithm through identifying the sizes, outside-offsets, and valid cells for each non-terminal in an SCFG. Since recursion rules may generate substrings of any length, we further introduce two techniques to restrict the number of times to use a recursion rule. Test results show a significant improvement in running time for the new algorithm, which can effectively discriminate RNA sequences that can fold into the modeled secondary structure from those that cannot.