High/ultra-high dimensional single-index models
Single-index models are fundamental tools for handling the "curse of dimensionality" in nonparametric regression. Variable selection also plays an important role in building such models when the index vector is high-dimensional. Several procedures have been developed for estimation and variable selection in single-index models when the number of index parameters is fixed. In many high-dimensional model selection problems, however, the number of parameters increases with the sample size. In the first part of this work, we consider weakly dependent data and propose a class of variable selection procedures for single-index prediction models. We apply polynomial spline basis expansion and the smoothly clipped absolute deviation (SCAD) penalty to perform estimation and variable selection in the framework of a diverging number of index parameters. Under stationarity and strong mixing conditions, the proposed variable selection method is shown to have the "oracle" property when the number of index parameters tends to infinity as the sample size increases. A fast and efficient iterative algorithm is developed to simultaneously estimate parameters and select significant variables. The finite-sample behavior of the proposed method is evaluated in simulation studies and illustrated with river flow data from Iceland.
A characteristic feature of many modern problems across scientific fields is that the dimension of the explanatory variable, p, is large, and potentially much larger than the sample size, n. For problems of such scale or dimensionality, variable selection again plays an important role in the modeling process. Under the sparsity assumption, Fan and Lv (2008) proposed a variable screening procedure, sure independence screening, to reduce the ultra-high dimensionality to a moderate level.
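To make the screening idea of Fan and Lv (2008) concrete, here is a minimal sketch in Python, assuming the simplest linear form: predictors are ranked by the absolute marginal correlation with the response, and the top d are kept. The function name, the simulated data, and the choice d = n / log(n) are illustrative assumptions, not details taken from this work.

```python
import numpy as np

def sure_independence_screening(X, y, d):
    """Keep the d predictors with the largest absolute marginal correlation with y.

    Illustrative helper (hypothetical name); a sketch of correlation-based
    screening, not the procedure developed in this work.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each column
    ys = (y - y.mean()) / y.std()               # standardize the response
    omega = np.abs(Xs.T @ ys) / len(y)          # absolute marginal correlations
    return np.argsort(omega)[::-1][:d]          # indices of the d largest

# Simulated example (assumed setup): p = 1000 predictors, n = 200 observations,
# and only the first two predictors are active.
rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(n)

kept = sure_independence_screening(X, y, d=int(n / np.log(n)))
```

In this simulated setting the two active predictors should survive the screening step, after which a penalized method can be applied to the reduced set.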
However, in practical data analysis, without any prior knowledge, both the true model and the marginal regression can be highly nonlinear. To address these issues, the second part of this work investigates ultra-high dimensional penalized single-index models. We extend the sure independence screening method to a nonparametric independence screening procedure. In addition, a data-driven procedure for determining the screening threshold is proposed to improve finite-sample performance. New theoretical results are also derived for oracle parameters. Both the numerical results and the real data application demonstrate that the proposed procedure works very well, even for moderate sample size and large dimensionality.
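The motivation for nonparametric screening can be illustrated with a small sketch, under stated assumptions. Each predictor is scored by how much of the response a flexible marginal fit explains, and the top d are kept. For brevity the sketch swaps in a plain cubic polynomial basis in place of polynomial splines and uses a fixed d rather than a data-driven threshold; the function names and simulated data are hypothetical and do not reproduce the procedure developed in this work.

```python
import numpy as np

def marginal_fit_strength(x, y, degree=3):
    """Sample variance of the fitted values from a marginal polynomial regression of y on x."""
    z = (x - x.mean()) / x.std()                   # standardize for numerical stability
    B = np.vander(z, degree + 1)                   # columns z^degree, ..., z, 1
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)   # least-squares marginal fit
    fitted = B @ coef
    return np.mean((fitted - y.mean()) ** 2)       # larger means a stronger marginal signal

def nonparametric_screening(X, y, d, degree=3):
    """Keep the d predictors with the strongest marginal nonparametric fit."""
    strength = np.array([marginal_fit_strength(X[:, j], y, degree)
                         for j in range(X.shape[1])])
    return np.argsort(strength)[::-1][:d]

# Simulated example (assumed setup): predictor 0 enters only through its square,
# so its linear correlation with y is close to zero.
rng = np.random.default_rng(1)
n, p = 200, 500
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] ** 2 + 2.0 * X[:, 1] + rng.standard_normal(n)

kept = nonparametric_screening(X, y, d=int(n / np.log(n)))
```

A purely correlation-based ranking may place predictor 0 low in this example because its effect is quadratic, whereas the marginal polynomial fit can pick it up.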