Odintifier - A computational method for identifying insertions of organellar origin from modern and ancient high-throughput sequencing data based on haplotype phasing
Samaniego Castruita, Jose A
Zepeda Mendoza, Marie L
Gilbert, M T P
MetadataShow full item record
Abstract Background Cellular organelles with genomes of their own (e.g. plastids and mitochondria) can pass genetic sequences to other organellar genomes within the cell in many species across the eukaryote phylogeny. The extent of the occurrence of these organellar-derived inserted sequences (odins) is still unknown, but if not accounted for in genomic and phylogenetic studies, they can be a source of error. However, if correctly identified, these inserted sequences can be used for evolutionary and comparative genomic studies. Although such insertions can be detected using various laboratory and bioinformatic strategies, there is currently no straightforward way to apply them as a standard organellar genome assembly on next-generation sequencing data. Furthermore, most current methods for identification of such insertions are unsuitable for use on non-model organisms or ancient DNA datasets. Results We present a bioinformatic method that uses phasing algorithms to reconstruct both source and inserted organelle sequences. The method was tested in different shotgun and organellar-enriched DNA high-throughput sequencing (HTS) datasets from ancient and modern samples. Specifically, we used datasets from lions (Panthera leo ssp. and Panthera leo leo) to characterize insertions from mitochondrial origin, and from common grapevine (Vitis vinifera) and bugle (Ajuga reptans) to characterize insertions derived from plastid genomes. Comparison of the results against other available organelle genome assembly methods demonstrated that our new method provides an improvement in the sequence assembly. Conclusion Using datasets from a wide range of species and different levels of complexity we showed that our novel bioinformatic method based on phasing algorithms can be used to achieve the next two goals: i) reference-guided assembly of chloroplast/mitochondrial genomes from HTS data and ii) identification and simultaneous assembly of odins. This method represents the first application of haplotype phasing for automatic detection of odins and reference-based organellar genome assembly.