The evolution of gene composition in angiosperms
Eudicots and monocots are the two major groups of angiosperms, or flowering plants, and they have distinct morphological features. One of the great biological questions is what genetic changes differentiate these two groups. In this dissertation, we took a first step toward answering this question by investigating gene content evolution in angiosperms. The completeness of a genome sequence is an important factor in determining whether a gene is present or absent in that genome. We first used a novel method to estimate the genome size and evaluate the completeness of the Arabidopsis genome sequence. By comparing the components of samples from the whole-genome assembly and from independent whole-genome shotgun sequencing, we demonstrated that the Arabidopsis genome size is at least 134.4 Mb and that the number of genes missing from the current sequence data is likely to be fewer than 200. Gene number is one of the key parameters of biological complexity. Imperfect annotations have consistently led to overestimates of gene number in large and complex plant genomes. Here we show that GeneTrek is a cost-effective method for delineating the composition and structure of a complex genome. By annotating 74 randomly selected BACs, we demonstrated that maize contains 37,000 to 63,000 genes, distributed unevenly along the genome. Finally, we compared the gene content of monocots and eudicots, using the nearly complete rice and Arabidopsis genome sequences as the basis of the analysis. We estimated that at least 2,460 rice genes are missing from Arabidopsis and at least 558 Arabidopsis genes are missing from rice, including some genes with known functions. The missing gene set is enriched in genes of unknown function, providing an interesting set of genes for further functional analysis. The presence and absence patterns of the lost genes across the phylogenetic tree allow some gene loss events to be dated.
Our gel blot hybridization experiments confirmed that the incompleteness of the Arabidopsis genome sequence has had little effect on the estimate of rice genes missing from Arabidopsis. Additional plant genome sequences are needed to resolve the details of gene content evolution in angiosperms.