Variable selection in longitudinal data with application to education
Variable selection with a large number of predictors is a challenging and important problem in multiple linear regression. However, relatively little attention has been paid to variable selection in longitudinal data, particularly in educational applications. This study examines data in which TOEIC reading achievement, measured each quarter in a year, is the response variable, and variables such as gender, socioeconomic status (SES), and major are used as predictors. Using these longitudinal educational data, we compare multiple regression, backward elimination, the group least absolute shrinkage and selection operator (LASSO), and linear mixed models in terms of their performance in variable selection. In our case study, the four statistical methods select different sets of predictors. The linear mixed model (LMM) retains the fewest predictors (4 of 19). In addition, LMM is the only one of the four methods that is appropriate for repeated measurements, and it is the best with respect to the principle of parsimony. We also interpret the model selected by LMM in the conclusion.