Phylogenomic analysis of gene families
Bagal, Ujwal Ranjit
High-throughput sequencing has advanced at a tremendous pace in the last few years. This has opened up opportunities to understand more about the functional genomics of non-model species for which genome sequences are not yet available. In such situations, transcriptome sequence analysis using comparative methods has facilitated gene discovery and gene expression studies, as well as new understanding of the functional responsibilities of gene family members, the effect of gene duplications, and how Darwinian selection affects genome complexity. These approaches have opened unprecedented opportunities to understand functional composition, as well as the over- or underrepresentation of mRNAs involved in specific biological functions. This dissertation applies computational approaches and experimental verification to the reevaluation of an earlier report of a single phenylalanine ammonia lyase (PAL) gene in Pinus taeda. This work is followed by a biological analysis of the PAL gene family members in gymnosperms with an eye toward determining their individual evolutionary trajectories and functional variability. The five P. taeda PAL genes revealed a divergent evolutionary path for gymnosperms compared with angiosperms, starting from a series of ancient gene duplication events. This hypothesis was further supported by the identification of codon sites under relaxed evolutionary constraints in lineages associated with duplication events. While gene expression analyses proved insufficient to identify the physiological functions of individual genes, they highlighted tissue-specific expression and provided some insight into functional associations of individual PAL genes with biotic and abiotic stress conditions. A relative efficiency analysis of the statistical models used to infer changes in the mode of selection acting on protein-coding genes was performed using simulated data sets.
Despite the advantage of more realistic models, the likelihood ratio test (LRT) in Fitmodel was unable to detect shifts in selection pressure. Similarly, the Bayesian approach used to detect individual sites under adaptive selection yielded high false-negative and high false-positive rates. The findings from the P. taeda PAL gene family analysis will be useful for future pine tree improvement programs, while the simulation-based studies are expected to caution researchers about the unreliability of inferences drawn from these evolutionary analysis tools.