Multivariate association and dimension reduction
Iaci, Ross J.
In this thesis, two nonparametric methods are developed in the statistical field of multivariate association and dimension reduction. Although both methods aim to detect linear and nonlinear relationships between multiple sets and groups of multivariate random vectors, each is motivated by a different use in statistical applications. The primary goal of the information-theoretic method of Chapter 2 is to provide an overall measure of association between sets of random vectors. Chapter 3 develops a dimension reduction method that extends Canonical Correlation Analysis (CCA), pioneered by Hotelling, to identify nonlinear relationships.

Motivated by a problem in morphological integration studies, a field of biological science, we propose a new general index based on Kullback-Leibler (KL) information to measure the relationships between multiple sets of random vectors. The index detects dependence between the sets by measuring the discrepancy between the joint density and the marginal densities of affine matrix transformations of the random vectors. From this index we define an overall measure of dependence between multiple sets. In addition, we develop two dimension reduction methods for m sets of random vectors and then extend them to multiple groups of multiple sets.

The second index recovers relationships between sets using a composite L2 distance between a linear combination of one vector and an unknown single-index-model regression function of the other, and then interchanges the roles of the two vectors. The regression functions are estimated with the nonparametric Nadaraya-Watson smoother, which enables the index to detect both linear and nonlinear relationships. This method is then extended to identify associations between multiple sets and multiple groups of random vectors.
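The KL index compares the joint density of the sets with their marginal densities. The thesis estimates this nonparametrically. As a hedged illustration only, the Gaussian special case has a closed form: for jointly Gaussian sets with covariance Σ and diagonal blocks Σ_ii, the KL divergence between the joint density and the product of the marginals is ½(Σ_i log det Σ_ii − log det Σ). The sketch below is a plug-in estimate of that quantity from sample covariances. The function name `gaussian_kl_dependence` is hypothetical, and this is not the thesis's estimator. For two sets the same quantity equals −½ Σ_k log(1 − ρ_k²), where ρ_k are the canonical correlations, which links the KL view back to CCA.

```python
import numpy as np


def gaussian_kl_dependence(blocks):
    """Plug-in KL divergence between a fitted joint Gaussian and the product
    of fitted Gaussian marginals, one marginal per set.

    Hypothetical helper for illustration: it assumes Gaussianity, whereas the
    thesis's KL index is nonparametric.

    blocks: list of arrays, each of shape (n, p_i), sharing the same n rows.
    """
    data = np.hstack(blocks)
    cov = np.cov(data, rowvar=False)
    _, logdet_joint = np.linalg.slogdet(cov)

    # Sum of log-determinants of the diagonal blocks (each set's own covariance).
    logdet_marginals = 0.0
    start = 0
    for block in blocks:
        p = block.shape[1]
        _, logdet = np.linalg.slogdet(cov[start:start + p, start:start + p])
        logdet_marginals += logdet
        start += p

    # Zero when the sets are uncorrelated; grows as they become dependent.
    return 0.5 * (logdet_marginals - logdet_joint)


rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=(n, 2))
y_dep = x @ np.array([[1.0, 0.0], [0.5, 1.0]]) + 0.5 * rng.normal(size=(n, 2))
y_ind = rng.normal(size=(n, 2))

print(gaussian_kl_dependence([x, y_dep]))  # clearly positive
print(gaussian_kl_dependence([x, y_ind]))  # close to zero
```

Because the function accepts a list of blocks, it also covers the m-set case. Under Gaussianity, however, it only sees linear dependence, which is why a nonparametric density estimate is needed to capture nonlinear relationships.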
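The second index pairs a linear combination of one vector with a nonparametric regression on a linear combination of the other, and then swaps the roles. The sketch below is a minimal, hypothetical version for fixed direction vectors `a` and `b`. It standardizes the two projections, fits a leave-one-out Nadaraya-Watson smoother with a Gaussian kernel in each direction, and adds the two residual mean squares. The kernel, the bandwidth, the standardization, the leave-one-out choice, and the names `nadaraya_watson_loo` and `composite_l2_index` are all assumptions made for illustration. The thesis's exact index and its search over directions are not reproduced here. Smaller values indicate a stronger relationship.

```python
import numpy as np


def nadaraya_watson_loo(u, v, bandwidth):
    """Leave-one-out Nadaraya-Watson estimate of E[v | u] at each sample point,
    using a Gaussian kernel."""
    scaled = (u[:, None] - u[None, :]) / bandwidth
    weights = np.exp(-0.5 * scaled**2)
    np.fill_diagonal(weights, 0.0)  # drop each point's own observation
    return weights @ v / weights.sum(axis=1)


def composite_l2_index(X, Y, a, b, bandwidth=0.3):
    """Hypothetical composite L2 index for directions a (for X) and b (for Y).

    Each projection is standardized, then regressed nonparametrically on the
    other; the two residual mean squares are summed. Values near 0 indicate a
    strong relationship in both directions; values around 2 indicate none.
    """
    u = X @ a
    v = Y @ b
    u = (u - u.mean()) / u.std()
    v = (v - v.mean()) / v.std()
    resid_v = v - nadaraya_watson_loo(u, v, bandwidth)  # b'Y regressed on a'X
    resid_u = u - nadaraya_watson_loo(v, u, bandwidth)  # roles interchanged
    return np.mean(resid_v**2) + np.mean(resid_u**2)


rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 2))
Y_nonlinear = np.column_stack([X[:, 0] ** 2 + 0.1 * rng.normal(size=n),
                               rng.normal(size=n)])
Y_independent = rng.normal(size=(n, 2))
a = np.array([1.0, 0.0])
b = np.array([1.0, 0.0])

print(composite_l2_index(X, Y_nonlinear, a, b))
print(composite_l2_index(X, Y_independent, a, b))
```

In the quadratic example only one of the two regressions captures the relationship (by symmetry, E[X1 | X1²] = 0), so the index falls to a little above 1 rather than to 0. That is still well below the value for independent data. Summing both directions is what lets the composite register dependence that is visible from either side.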
In addition to detecting the nature of the relationships, we develop a bootstrap procedure, inspired by Ye and Weiss, to determine the number of significant associations. This procedure does not depend on the measure used to detect the relationships. CCA is a common measure of pairwise linear association between two sets of random vectors and is often used as a benchmark for comparison. In contrast to CCA, both of our methods are shown to detect nonlinear as well as linear relationships, which makes them useful in many statistical applications.
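CCA, the benchmark mentioned above, only captures linear association. The sketch below computes sample canonical correlations with a standard QR-and-SVD approach and applies it to a linear and a purely quadratic relationship. The function name `canonical_correlations` and the simulated data are illustrative assumptions, not results from the thesis. The first canonical correlation is large in the linear case but near zero in the quadratic case, which is the kind of nonlinear dependence the thesis's methods are designed to detect.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Sample canonical correlations between the columns of X and Y.

    After centering, the singular values of Qx^T Qy, where Qx and Qy are
    orthonormal bases of the column spaces of X and Y, are the canonical
    correlations (largest first). Assumes X and Y have full column rank.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    qx, _ = np.linalg.qr(Xc)
    qy, _ = np.linalg.qr(Yc)
    return np.linalg.svd(qx.T @ qy, compute_uv=False)


rng = np.random.default_rng(2)
n = 500
X = rng.normal(size=(n, 2))
noise = 0.1 * rng.normal(size=(n, 1))
Y_linear = X[:, :1] * 2.0 + noise        # linear in the first column of X
Y_quadratic = X[:, :1] ** 2 + noise      # nonlinear, uncorrelated with X

print(canonical_correlations(X, Y_linear))     # first value close to 1
print(canonical_correlations(X, Y_quadratic))  # first value close to 0
```

The QR route avoids explicitly inverting the within-set covariance matrices, which is numerically safer than forming Σ11^(-1) Σ12 Σ22^(-1) Σ21 directly.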