A probabilistic model for gene family evolution
Studies of gene family evolution have revealed invaluable information about the evolutionary relationships among genes in a gene family and about the gene retention mechanisms that help shape the family across species. A gene family is formed by gene duplication and loss events during the evolutionary history of species. More importantly, gene duplication is the major source of novelties (i.e. raw materials) on which evolutionary forces may act. However, probabilistic models of gene duplication and loss in the context of phylogenetic trees are still limited in the current literature, and no existing model takes into account the effect of gene retention mechanisms such as neofunctionalization and subfunctionalization. It is therefore essential to build a probabilistic framework for understanding gene family evolution. In this dissertation, we focus on building a Bayesian hierarchical model for gene family evolution, in which different gene retention mechanisms are incorporated through a nonhomogeneous birth and death process of gene copies. We first develop a birth-death age model for gene family evolution in a single population, in which the loss rates of duplicated genes are functions of the ages of the genes. From the birth-death age model, we derive the probability density function of a gene family tree given the species tree. This density can be used to estimate model parameters and to simulate gene family data. Moreover, we extend the age-dependent birth and death model to multiple populations in the context of phylogenetic trees, and give the joint probability density function of the duplication times and the numbers of gene copies at the internal nodes. Finally, we propose a Bayesian hierarchical model for gene family evolution, which involves two stochastic processes: the mutation process of DNA sequences and the birth and death process of genes.
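As an illustration of what an age-dependent birth and death process in a single population might look like, the following Python sketch simulates gene copies whose loss rate depends on their age. It is not the dissertation's model: the hazard function `loss_rate`, its parameters `mu0`, `mu_inf`, and `decay`, the constant per-copy `birth_rate`, and the choices that the family starts from one copy of age zero and that a duplicating copy keeps its own age are all assumptions made for the example. The simulation uses thinning, which applies here only because the assumed hazard never exceeds its value at age zero.

```python
import math
import random


def loss_rate(age, mu0=2.0, mu_inf=0.2, decay=1.0):
    """Hypothetical loss hazard: high for young copies, decaying toward mu_inf."""
    return mu_inf + (mu0 - mu_inf) * math.exp(-decay * age)


def simulate_family(t_end, birth_rate=0.5, seed=None, max_copies=10_000):
    """Return the birth times of the gene copies still present at time t_end.

    Assumptions (illustrative only): one ancestral copy of age 0 at time 0,
    a constant per-copy duplication rate, and a parent copy that keeps its
    age when it duplicates.
    """
    rng = random.Random(seed)
    births = [0.0]            # birth time of each surviving copy
    t = 0.0
    mu_max = loss_rate(0.0)   # the assumed hazard is non-increasing in age

    while births and len(births) < max_copies:
        n = len(births)
        # Candidate events arrive at the bounding rate n * (birth + mu_max).
        t += rng.expovariate(n * (birth_rate + mu_max))
        if t >= t_end:
            break
        idx = rng.randrange(n)  # copy to which the candidate event applies
        if rng.random() < birth_rate / (birth_rate + mu_max):
            births.append(t)    # duplication: new copy born at time t
        else:
            # Thinning: accept the loss with probability mu(age) / mu_max.
            age = t - births[idx]
            if rng.random() < loss_rate(age) / mu_max:
                births.pop(idx)
    return births


if __name__ == "__main__":
    family = simulate_family(t_end=5.0, seed=1)
    print(len(family), "copies survive; ages:",
          [round(5.0 - b, 2) for b in family])
```

Under a hazard of this shape, young duplicates are lost quickly while older copies are relatively protected. Other functional forms can be substituted for `loss_rate`, provided `mu_max` remains an upper bound on the hazard.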