Evolving Stochastic Context-Free Grammars for RNA Secondary Structure

James W. J. Anderson, Joe Staines, Paula Tataru, Jotun Hein, Rune Lyngsø


Abstract:

RNA secondary structure prediction benefited greatly from the introduction of Stochastic Context-Free Grammars (SCFGs) in the early 1990s and from their combination with comparative methods in the late 1990s. The set of SCFGs useful for RNA secondary structure prediction is very large, yet a few intuitively designed grammars have remained completely dominant. Here we investigate two automatic search techniques: exhaustive search for very compact grammars, and an evolutionary algorithm for larger grammars. We also examine whether grammar ambiguity is as problematic for structure prediction as has previously been suggested. Applied to RNA secondary structure prediction on a maximal data set, these search techniques revealed new and interesting grammars, but the best of the classic grammars proved hard to beat by a significant margin. In general, the results showed that many grammars with quite different structures can have very similar predictive ability. Many ambiguous grammars were found that were at least as effective as the best current unambiguous grammars.