Principal component analysis for interval-valued and histogram-valued data and likelihood functions and some maximum likelihood estimators for symbolic data
Le-Rademacher, Jennifer Giaothuy
Unlike a classical random variable, which takes a single value, a symbolic random variable takes multiple values. The values within a symbolic random variable form an internal distribution that does not exist in a classical random variable. Statistical methodologies developed for classical data cannot be readily applied to symbolic data, so new methodologies must be developed to take into account the internal structure of symbolic variables. In this dissertation, we propose three methods of symbolic data analysis.
The first proposed method extends classical principal component analysis (PCA) to interval-valued observations, using a so-called symbolic variance-covariance structure. Using the symbolic covariance structure ensures that the principal components explain the total variance of interval-valued data. Furthermore, two representations of the principal components resulting from the proposed PCA method are introduced. The first representation shows interval-valued observations as polytopes in a principal components space. The polytopes constructed in this method represent the true structure of interval-valued observations in a principal components space. The second representation gives histogram-valued principal components constructed from a 2-dimensional projection of the polytopes resulting from the first representation. Algorithms to construct the polytopes and to compute the histograms representing the principal components are given in this dissertation, along with two examples to illustrate the method.
The second method extends the PCA method proposed for interval-valued data to histogram-valued data. This method treats histogram-valued observations as a generalization of interval-valued observations. The two representations proposed for interval-valued observations are then extended to represent histogram-valued observations. An algorithm for the extension, along with an example to illustrate this method, is included.
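As a rough illustration of the first method's idea, the following Python sketch computes a covariance matrix that accounts for the spread inside each interval, runs PCA on it, and projects the corners of one observation's hyperrectangle into the principal components space. It assumes that values are uniformly and independently distributed within each interval; the function names (`symbolic_covariance`, `interval_pca`, `project_vertices`) are hypothetical, and the formulas are a common textbook-style construction, not necessarily the exact ones used in the dissertation.

```python
import numpy as np
from itertools import product


def symbolic_covariance(lower, upper):
    """Covariance of a mixture of uniform hyperrectangles (assumed model)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    centers = (lower + upper) / 2.0
    # Between-observation part: covariance of the midpoints (divisor n).
    between = np.atleast_2d(np.cov(centers, rowvar=False, bias=True))
    # Within-observation part: a uniform on [a, b] has variance (b - a)^2 / 12.
    within = np.diag(((upper - lower) ** 2 / 12.0).mean(axis=0))
    return between + within


def interval_pca(lower, upper):
    """Eigenvalues and eigenvectors of the covariance, largest variance first."""
    cov = symbolic_covariance(lower, upper)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


def project_vertices(lower_row, upper_row, eigvecs, mean):
    """Project all 2^p corners of one hyperrectangle into the PC space."""
    corners = np.array(list(product(*zip(lower_row, upper_row))))
    return (corners - mean) @ eigvecs
```

Because the within-interval variance is added to the between-midpoint variance, the eigenvalues sum to the total variance of the interval-valued data under the stated assumption. The projected corners of each observation are the vertices of a polytope in the principal components space.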
The third method proposes a construction of likelihood functions for symbolic data. The proposed likelihood function is then used to derive maximum likelihood estimators for the mean and the variance of three common types of symbolic data: interval-valued data, histogram-valued data, and triangular-distribution-valued data.
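For interval-valued data, one way such estimators could look is sketched below in Python. The abstract does not state the likelihood construction, so the distributional choices here are assumptions made purely for illustration: each interval is taken to have a uniform internal distribution, the internal means are modeled as normal, and the internal variances as exponential. The function name `interval_mle` is hypothetical.

```python
import numpy as np


def interval_mle(lower, upper):
    """Illustrative MLEs for interval-valued data under assumed models."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Internal mean and variance of each interval, assuming a uniform
    # internal distribution on [lower, upper].
    means = (lower + upper) / 2.0
    variances = (upper - lower) ** 2 / 12.0
    # Assumption: internal means ~ Normal(mu, sigma2); standard normal MLEs.
    mu_hat = means.mean()
    sigma2_hat = ((means - mu_hat) ** 2).mean()
    # Assumption: internal variances ~ Exponential with mean beta;
    # the MLE of beta is the sample mean.
    beta_hat = variances.mean()
    return mu_hat, sigma2_hat, beta_hat
```

For example, `interval_mle([0, 2, 4], [2, 4, 6])` estimates the parameters from the three intervals [0, 2], [2, 4], and [4, 6].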