With a growing number of whole-genome sequences becoming available, together with high-throughput data on the molecular biology of the cell, it is becoming possible to gain extensive insight into the basic biology and biochemistry of a wide range of organisms through the identification of protein-coding genes. However, it is rapidly becoming clear that many genes code not for proteins but for RNAs. Indeed, given recent biochemical work describing large numbers of completely novel RNAs, including two families of snoRNAs, tmRNA, microRNAs, small interfering RNAs (siRNAs), and RNA-dependent editing mechanisms, it is likely that many more RNAs carry out a broad range of functions in the cell than was previously thought. Thus a comprehensive understanding of the biology of a cell will ultimately require knowledge of the identity of all encoded RNAs, the molecules with which they interact, and the molecular structures of these complexes.
For these reasons, the computational biology of RNA is playing an increasingly important role within functional genomics. Here at UEA we develop mathematical and computational tools for the identification of RNA genes and structural elements within genomes, the prediction of RNA structure using evolutionary and physical principles, and the analysis of RNA structure and its application to topical problems in molecular and cell biology.
An important feature of RNA evolution is that structure tends to be conserved rather than primary sequence, and indeed, we have shown in a search for H/ACA snoRNAs that secondary structure can contain phylogenetic information. However, a major difficulty is that, compared with proteins, there are fewer signals for RNAs in genome sequences: RNA genes are not specified by open reading frames because they are not translated, and, moreover, sequence conservation between members of the same family is often too low for standard genome search strategies such as BLAST to detect. The search for RNA genes and structures therefore presents special computational problems, since it is essential to take secondary (and ultimately tertiary) structure into account. Together with Dr. Tamas Dalmay, we develop new techniques for identifying new short RNAs, such as microRNAs, in sequence data.
The identification of RNA genes and structures requires a thorough understanding of known RNA families. Hence it is necessary to develop new tools for extracting the structural features common to a given family of RNAs. The appearance of several databases containing RNA structures has increased the need for such tools. In recent work we have investigated the use of mutual information to predict RNA structure from a multiple sequence alignment. This has resulted in a new, freely available structure prediction tool called MIfold.
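To illustrate the idea behind mutual information as a covariation signal, the following is a minimal Python sketch that scores a pair of alignment columns. It is not the MIfold implementation: the function name `column_mutual_information`, the gap handling (sequences with a gap in either column are skipped), and the toy alignment are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length aligned sequences. Sequences with a
    gap ('-') in either column are ignored (an illustrative choice).
    """
    pairs = [(seq[i], seq[j]) for seq in alignment
             if seq[i] != "-" and seq[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0

    joint = Counter(pairs)                 # counts of (base_i, base_j)
    left = Counter(a for a, _ in pairs)    # counts of base in column i
    right = Counter(b for _, b in pairs)   # counts of base in column j

    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        p_a = left[a] / n
        p_b = right[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 3 covary as Watson-Crick pairs,
# while columns 1 and 2 are fully conserved.
toy_alignment = ["GAAC", "CAAG", "AAAU", "UAAA"]
covarying_score = column_mutual_information(toy_alignment, 0, 3)
conserved_score = column_mutual_information(toy_alignment, 1, 2)
```

Columns that change together to preserve a base pair score highly, whereas fully conserved columns score zero even if they are paired. This is one reason why mutual information on its own is usually combined with other evidence when predicting a structure.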
To derive relevant biological information from RNA structures it is necessary to have useful analytical tools. We have developed metrics for quantifying the differences between suboptimal structures. These metrics can be computed efficiently, and we have shown that they can be used to predict conformational switching in RNA structures. We have also incorporated phylogenetic tools into secondary structure analysis to study the evolution of various RNAs. We continue to develop tools for analysing RNA structure. Using new RNA databases, we are also investigating structural properties such as how well defined the foldings of naturally occurring RNAs are.
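As a simple illustration of what a distance between two secondary structures can look like, the sketch below computes the standard base-pair distance between two structures in dot-bracket notation, that is, the number of base pairs present in one structure but not the other. This is a generic textbook measure chosen for illustration, not necessarily one of the metrics referred to above; the function names and example structures are hypothetical.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def base_pair_distance(structure_a, structure_b):
    """Number of base pairs found in exactly one of the two structures."""
    if len(structure_a) != len(structure_b):
        raise ValueError("structures must have the same length")
    return len(base_pairs(structure_a) ^ base_pairs(structure_b))


# Hypothetical example: two alternative hairpins on the same 9-nt sequence.
hairpin_5prime = "((...)).."
hairpin_3prime = "..((...))"
switch_distance = base_pair_distance(hairpin_5prime, hairpin_3prime)
```

A large distance between two low-energy suboptimal structures, as in the two alternative hairpins here, is the kind of signal one might look for when screening for candidate conformational switches.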
We have various international collaborators in the computational biology of RNA, including Eva Freyhult, Uppsala University, Sweden, and Paul Gardner, University of Copenhagen, Denmark.