Accurate prediction of human miRNA targets via graph modeling and machine learning approaches
miRNAs are small endogenous non-coding RNA molecules that play a critical role in suppressing gene expression and are associated with many diseases, including cancers. Because they affect many cellular activities, uncovering their mechanisms is an important task. Because the function of a miRNA is tightly connected to the way it recognizes its targets, miRNA target prediction has received a great deal of research attention. Despite that, most current methods suffer from high false positive rates and provide little insight into the actual process of miRNA targeting. In this dissertation, we present two novel approaches aimed at addressing these issues in miRNA target prediction: one to reduce the false positive rate, and the other to substantiate multiple hypotheses about the biological mechanism of miRNA targeting and to provide insight into that mechanism. To address the first issue, we present the Correlation Graph model, which captures nucleotide correlations between the miRNA sequence and its target. This model makes it possible to characterize nucleotide correlations other than Watson-Crick base pairings between the two parts of the duplex. We designed an SVM-based algorithm and tested our model on human data, where it achieved a sensitivity of 86% with a false positive rate below 13%, a significant performance improvement over the state-of-the-art methods miRanda and RNAhybrid. The second part of this dissertation addresses the issue of understanding the mechanism of miRNA targeting. It presents a multi-hypothesis learner algorithm that uses features collected from the literature on targeting mechanisms. These features enable the algorithm to partition the data in a biologically relevant way. The algorithm then uses these partitions to learn multiple hypotheses.
Our evaluations on human and mouse datasets show that our method performs comparably to high-performance classifiers such as RandomForest. Moreover, feature selection on the resulting partitions confirms that the partitioning mechanism is compatible with biological mechanisms. These partitions could be used in further in vivo experiments to verify the currently proposed targeting approaches and to discover new mechanisms.
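As a rough illustration of the ideas above, the following Python sketch shows one possible way to turn a miRNA and a candidate target site into nucleotide-pair features that are not limited to Watson-Crick pairs, and how sensitivity and false positive rate are computed from binary predictions. It is not the dissertation's Correlation Graph model or its SVM pipeline. The function names `pair_features` and `sensitivity_and_fpr`, the all-position-pairs encoding, and the toy sequences are assumptions made only for this example.

```python
from itertools import product


def pair_features(mirna, site):
    """Sparse features for every (miRNA position, site position) pair.

    Hypothetical encoding: each key is (i, j, "XY"), where X is the
    nucleotide at miRNA position i and Y the nucleotide at site position j.
    Any pair is recorded, not only Watson-Crick pairs.
    """
    features = {}
    for (i, a), (j, b) in product(enumerate(mirna), enumerate(site)):
        features[(i, j, a + b)] = 1
    return features


def sensitivity_and_fpr(y_true, y_pred):
    """Return (sensitivity, false positive rate) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Sensitivity = TP / (TP + FN); FPR = FP / (FP + TN).
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return sensitivity, fpr


if __name__ == "__main__":
    # Toy sequences, not real miRNA or target data.
    feats = pair_features("AU", "UA")
    print(sorted(feats))
    print(sensitivity_and_fpr([1, 1, 0, 0], [1, 0, 1, 0]))
```

Feature dictionaries of this kind could be vectorized and passed to any standard classifier. The reported 86% sensitivity and sub-13% false positive rate correspond to the two quantities returned by `sensitivity_and_fpr`.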