|dc.description.abstract||This dissertation approaches bioenergy research from the viewpoint of bioinformatics. The bioenergy studied here is second-generation biofuel, which is produced from cellulosic biomass. Unlike first-generation biofuel, it is made from cellulosic biomass that does not come from edible plants. My studies fall into two fields, plants and bacteria. In the plant field, a better plant model is required in order to produce cellulosic biomass via genomic modifications. The goals of such genomics projects include increasing the amount of plant cell wall material and making plant cell walls easier to degrade. In the bacteria field, a better bacterium is needed to produce a higher yield of ethanol. Through genomics studies, I aim to understand the mechanism and functional pathway of ethanol biosynthesis so that the pathway can be re-engineered.
My dissertation consists of three bioinformatics projects, all of which use bioinformatics tools and analyses toward the ultimate goal of increasing ethanol production. The first project, in the plant field, is the prediction of plant Golgi-resident proteins. I set out to identify novel Golgi proteins because most of the enzymes associated with plant cell wall biosynthesis are located in the Golgi apparatus. A machine-learning-based method was used to identify them. With these predicted Golgi proteins, other scientists can focus on studying a manageable number of enzymes. The second project, also in the plant field, determines a complete set of transcribed sequences in switchgrass. These transcribed sequences have been used to design microarray chips for studying transcriptome expression profiles of switchgrass. I applied a two-step de novo assembly to Sanger and 454 sequencing data to obtain the transcribed sequences. The last project constructs transcriptome structure maps of Clostridium thermocellum, a thermophilic bacterium that can degrade plant cell walls and produce ethanol. I used a machine-learning-based method together with strand-specific RNA-seq data to identify genome-wide transcription units, which are functional elements of a genome. The transcriptome structure maps will help clarify how Clostridium thermocellum synthesizes ethanol.||