
dc.contributor.author: Robbins, Kelly R.
dc.date.accessioned: 2014-03-04T02:52:14Z
dc.date.available: 2014-03-04T02:52:14Z
dc.date.issued: 2007-12
dc.identifier.other: robbins_kelly_200712_phd
dc.identifier.uri: http://purl.galileo.usg.edu/uga_etd/robbins_kelly_200712_phd
dc.identifier.uri: http://hdl.handle.net/10724/24479
dc.description.abstract: Genomic technology has the potential to provide invaluable insight into the mechanisms underlying several important traits. Unfortunately, this information comes at a cost: the data are high-dimensional and sometimes of poor quality. One potential application of genomics is the diagnosis of diseases, such as Alzheimer’s disease, whose clinical markers are ambiguous and confounding. To predict disease status, an algorithm must first be trained on a data set in which disease status is known without error. For incipient Alzheimer’s disease, this is rarely the case. To this end, a misclassification algorithm was applied to a data set containing healthy individuals and incipient Alzheimer’s patients to examine the effects of potential misclassification on diagnostic accuracy. Results obtained without invoking the misclassification algorithm showed limited predictive power of the model. When the misclassification algorithm was invoked, significant increases in the model’s predictive ability were obtained. These results demonstrate the utility of the misclassification algorithm in data sets containing potential misdiagnoses. In addition to potential misdiagnosis, the high dimensionality of genomic data sets can also pose substantial problems for statistical analysis. Because of the large number of features in many genomic data sets, explicit modeling of gene interactions is often infeasible. To eliminate the need for simplifying assumptions, a machine learning algorithm, referred to as the ant colony algorithm (ACA), was adapted for the analysis of high-dimensional genomic data. In a study examining the selection of predictive gene expression patterns, the performance of the ACA was compared to several standard methodologies.
When applied to high-dimensional data sets, the ACA was able to identify small subsets of highly predictive genes, yielding superior prediction accuracy when compared to several standard feature selection methods. In an application involving single nucleotide polymorphism (SNP) marker data, a modified ACA was implemented to identify markers associated with a binary trait under the influence of interacting loci. When compared to marginal effects models, the ACA demonstrated superior performance under several simulation scenarios: p-values for associated SNPs were more significant using the ACA, resulting in substantial increases in power.
dc.language: eng
dc.publisher: uga
dc.rights: public
dc.subject: ant colony optimization
dc.subject: genomics
dc.subject: latent variable model
dc.subject: logistic regression
dc.subject: misclassification algorithm
dc.title: Statistical methods for the analysis of complex genomic data
dc.type: Dissertation
dc.description.degree: PhD
dc.description.department: Animal and Dairy Science
dc.description.major: Animal Science
dc.description.advisor: J. Keith Bertrand
dc.description.advisor: Romdhane Rekaya
dc.description.committee: J. Keith Bertrand
dc.description.committee: Romdhane Rekaya
dc.description.committee: Samuel Aggrey
dc.description.committee: Ignacy Misztal

