dc.contributor.author: Qiu, Debin
dc.date.accessioned: 2016-11-23T05:30:31Z
dc.date.available: 2016-11-23T05:30:31Z
dc.date.issued: 2016-05
dc.identifier.other: qiu_debin_201605_phd
dc.identifier.uri: http://purl.galileo.usg.edu/uga_etd/qiu_debin_201605_phd
dc.identifier.uri: http://hdl.handle.net/10724/36312
dc.description.abstract: High- and ultrahigh-dimensional data sets with group structure arise in a wide range of scientific research and applications. Such data may nevertheless be sparse at the group level. In that case, the primary goal is to select the important groups, those significantly correlated with the outcome. Grouped variable selection plays a critical role here, both in selecting groups and in estimating the nonzero coefficients of the covariates within the important groups. With ultrahigh-dimensional data consisting of grouped variables, however, many grouped variable selection algorithms may fail to converge or may yield nonsensical results. Even when an algorithm works, it carries a heavy computational load. In this dissertation, we propose a two-stage procedure, grouped variable screening followed by grouped variable selection, to address these issues. In the first stage, grouped variable screening reduces the dimensionality of the data by filtering out the unimportant groups that make no contribution to the outcome. A sure screening property is established, which ensures that, under suitable conditions, all important groups are retained after screening with overwhelming probability. This work focuses mainly on four grouped variable screening criteria. In the second stage, because the dimensionality has been reduced from ultrahigh to moderate, or even below the sample size, grouped variable selection methods can select the important groups effectively and estimate the nonzero coefficients accurately. At the same time, the running time and computational complexity of grouped variable selection can be reduced dramatically. The performance of the proposed two-stage procedure is evaluated on various simulated examples and on a real data set from genetic analysis. An R package called grpss is developed to make the two-stage procedure available for real applications.
dc.language: eng
dc.publisher: uga
dc.rights: On Campus Only Until 2018-05-01
dc.subject: grouped variables
dc.subject: grouped variable selection
dc.subject: grouped variable screening
dc.subject: marginal correlation learning
dc.subject: penalized regression
dc.subject: random permutation
dc.subject: sure screening property
dc.title: Grouped variable screening for ultrahigh dimensional data under linear model
dc.type: Dissertation
dc.description.degree: PhD
dc.description.department: Statistics
dc.description.major: Statistics
dc.description.advisor: Jeongyoun Ahn
dc.description.committee: Jeongyoun Ahn
dc.description.committee: Lily Wang
dc.description.committee: William McCormick
dc.description.committee: Pengsheng Ji
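
The screening stage described in the abstract can be illustrated with a small sketch. The abstract does not name the four screening criteria, so the score below (the mean squared marginal correlation between each covariate in a group and the outcome) is only an assumed stand-in suggested by the "marginal correlation learning" subject term; it is not taken from the dissertation. The function names `pearson` and `screen_groups`, the simulated data, and the cutoff `keep` are all hypothetical, and the code does not use or reproduce the grpss package. The second stage (grouped variable selection on the retained groups) is not shown.

```python
import random
from statistics import mean


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no marginal signal
    return sxy / (sxx * syy) ** 0.5


def screen_groups(columns, groups, y, keep):
    """Rank groups by an assumed marginal score and retain the top `keep`.

    columns: list of covariate columns, each a list of n values
    groups:  dict mapping a group label to the indices of its columns
    Score (an assumption, not the dissertation's criterion):
    mean squared marginal correlation of the group's columns with y.
    """
    scores = {
        g: mean(pearson(columns[j], y) ** 2 for j in idx)
        for g, idx in groups.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep], scores


# Simulated example: 50 groups of 3 covariates, only groups 0 and 1 matter.
random.seed(0)
n, n_groups, group_size = 200, 50, 3
p = n_groups * group_size
columns = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
groups = {g: list(range(g * group_size, (g + 1) * group_size))
          for g in range(n_groups)}

# Linear model: the outcome depends on the six covariates in groups 0 and 1.
y = [2.0 * sum(columns[j][i] for j in range(6)) + random.gauss(0, 1)
     for i in range(n)]

kept, scores = screen_groups(columns, groups, y, keep=5)
print("groups retained after screening:", kept)
```

In this toy setting the retained set is much smaller than the original 50 groups, so a grouped selection method would then only need to be fitted on the retained groups. How many groups to keep is a tuning choice that this sketch fixes arbitrarily at five.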

