Efficient discovery of rare alleles and de novo mutations from pre-existing genomic data
Arthur, Robert Adam
A few methods exist to identify the full spectrum of recent mutations in specific lineages, but all are costly, laborious, and slow. We propose a novel strategy that requires only resequencing data and a reference genome sequence, both of which are available at no cost from public databases. Comparing resequencing shotgun data against overlapping 50mers created in silico to represent the complete reference genome allows the discovery of de novo mutations, rare alleles, and sequencing errors that are unique to the reference genome. We investigated Nipponbare rice and discovered thousands of candidate de novo sequence changes, of which ~51% are calculated to be events that occurred during the recent descent of this lineage. The remaining 49% were Nipponbare reference genome sequencing errors. Of the 148 validated mutations specific to Nipponbare, 143 were single nucleotide substitutions, 4 were tiny insertions, and 1 was a tiny deletion. We also applied our method to Yugu1, the reference genome for foxtail millet. However, the resequencing data for this species were not sufficient to mask ancient standing variation in the progenitors of Yugu1, so the analysis primarily yielded rare alleles and sequencing errors rather than de novo mutations. Of 119 confirmed sequence variations unique to Yugu1, we found 66 transitions, 40 transversions, and 13 indels (9 insertions, 4 deletions), all of which were only 1 bp. Surprisingly, although the method is very sensitive to transposable element activity, we did not detect any recent activity of this kind in the origins of Nipponbare or Yugu1.
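The 50mer comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the author's pipeline. It assumes exact string matching, ignores reverse-complement strands and read quality, and holds all reads in memory. The function names (`reference_kmers`, `read_kmer_counts`, `unsupported_positions`) are hypothetical. A reference position is flagged when the k-mer starting there never appears in the resequencing reads, which is the kind of signal that would mark a candidate reference-specific change.

```python
from collections import Counter


def reference_kmers(reference, k=50):
    """Yield (start position, k-mer) for every overlapping k-mer of the reference."""
    for i in range(len(reference) - k + 1):
        yield i, reference[i:i + k]


def read_kmer_counts(reads, k=50):
    """Count every k-mer observed in the resequencing reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def unsupported_positions(reference, reads, k=50):
    """Return start positions of reference k-mers that no read contains.

    Assumption: an exact-match miss marks a candidate difference; a real
    analysis would also need coverage thresholds and error handling.
    """
    counts = read_kmer_counts(reads, k)
    return [pos for pos, kmer in reference_kmers(reference, k) if counts[kmer] == 0]


if __name__ == "__main__":
    # Toy example with k=4: the reference carries G at index 7, the read carries A.
    print(unsupported_positions("AAACCCGGGTTT", ["AAACCCGAGTTT"], k=4))
```

In this toy case, a single-base difference leaves a run of k consecutive unsupported k-mers, all of which overlap the changed base. That clustering is one plausible way to localize a candidate change to a single position.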