
dc.contributor.author: Safo, Sandra Esi
dc.description.abstract: Advances in technology and computing power have led to the generation of data with an enormous number of variables relative to the number of observations. Such data, known as high dimension, low sample size data, pose challenges that require either modification of existing traditional methods or development of new statistical methods. One of these challenges is the development of sparse methods that use only a fraction of the variables. Sparse methods have been shown to perform better at making predictions on real high dimensional problems, which justifies their study and use in practice. This dissertation considers three novel methods for designing and analyzing high dimensional studies. First, we propose a new sample size method to estimate the number of samples required in a training set when allocating new entities into two groups. The methodology exploits the structural similarity between logistic regression prediction and errors-in-variables models. Second, we consider the problem of assigning future observations to known classes using linear discriminant analysis. We propose a new classification approach that generalizes existing binary linear discriminant methods to multiclass methods. Our methodology utilizes the equivalence between the discriminant subspace obtained from Fisher's linear discriminant analysis and the basis vectors of the between-class scatter. We apply the proposed method to two sparse methods. Third, we develop a general framework that yields sparse vectors for many multivariate statistical methods. The framework uses the relationship between many multivariate statistical problems and the generalized eigenvalue problem. We illustrate this framework with two multivariate statistical methods: linear discriminant analysis for classifying new entities into more than two groups, and canonical correlation analysis for studying associations between two different high dimensional data types.
The effectiveness of the proposed methods is evaluated through simulation studies and real data analyses on microarray and RNA sequencing (RNA-seq) data.
dc.subject: High dimensional data
dc.subject: Sample size
dc.subject: Regularized logistic regression
dc.subject: Conditional score
dc.subject: Measurement error
dc.subject: Linear discriminant analysis
dc.subject: Multi-class discrimination
dc.subject: Singular value decomposition
dc.subject: Sparse discrimination
dc.subject: Generalized eigenvalue problem
dc.subject: Sparse canonical correlation analysis
dc.title: Design and analysis issues in high dimension, low sample size problems
dc.description.advisor: Kevin K Dobbin
dc.description.advisor: Jeongyoun Ahn
dc.description.committee: Kevin K Dobbin
dc.description.committee: Jeongyoun Ahn
dc.description.committee: Xiao Song
dc.description.committee: Jaxk Reeves
dc.description.committee: Nicole Lazar
