|dc.description.abstract||Finite mixture models provide a concrete way to capture unobserved heterogeneity and thereby to cluster data. In a regression setting, unobserved heterogeneity arises when responses come from more than one subpopulation or when extreme values are present in the data. Finite mixtures of regression models are widely applied in diverse areas such as economics, genetics, medicine and psychology. With the advent of modern computers and software, maximum likelihood has become the most widely applied estimation technique for finite mixture regression models. Even so, it is well known that maximum likelihood estimation for mixture models has many drawbacks, including its sensitivity to extreme values and data contamination.
In this dissertation, we introduce a robust estimation method for finite mixtures of regression models with count responses, based on minimizing the integrated L2 distance. We show that our robust estimator, called L2E, is consistent and asymptotically normal. Furthermore, our L2E estimator of the number of mixture components, the so-called mixture complexity, is also shown to be consistent. Through Monte Carlo simulations, we compare the performance of our L2E estimator with that of the maximum likelihood (ML) estimator and the minimum Hellinger distance (MHD) estimator. The simulations show that our L2E estimator is highly competitive with the ML estimator and a better alternative to the MHD estimator.
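As a rough illustration of what "minimum integrated L2 distance" means for count data, the sketch below evaluates an L2E-type criterion for a Poisson mixture regression with a log link. It assumes the standard L2E form, averaged over observations: the sum over all counts k of the squared fitted probability f(k|x_i), minus twice the fitted probability of the observed count y_i. The helper names (`poisson_pmf`, `mixture_pmf`, `l2e_criterion`), the log link, and the truncation point `y_max` are hypothetical choices for this example, not the dissertation's implementation.

```python
import math

# Hypothetical sketch of an L2E-type criterion for a Poisson mixture
# regression with a log link; not the dissertation's implementation.

def poisson_pmf(y, mu):
    """Poisson probability P(Y = y) for mean mu > 0, evaluated in log space."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def mixture_pmf(y, x, weights, betas):
    """Mixture probability of count y at covariate x.

    weights: mixing proportions summing to 1.
    betas: per-component (intercept, slope) pairs; mean = exp(b0 + b1 * x).
    """
    return sum(
        w * poisson_pmf(y, math.exp(b0 + b1 * x))
        for w, (b0, b1) in zip(weights, betas)
    )


def l2e_criterion(xs, ys, weights, betas, y_max=200):
    """Average over observations of sum_k f(k|x)^2 - 2 f(y|x).

    The infinite sum over counts is truncated at y_max (an assumption that
    is adequate only when the fitted means are well below y_max).
    """
    total = 0.0
    for x, y in zip(xs, ys):
        squared_mass = sum(
            mixture_pmf(k, x, weights, betas) ** 2 for k in range(y_max + 1)
        )
        total += squared_mass - 2.0 * mixture_pmf(y, x, weights, betas)
    return total / len(ys)


# Usage: data consistent with a Poisson mean of 1 score lower (better)
# under a mean-1 fit than under a mean-50 fit.
xs = [0.0] * 5
ys = [1] * 5
good = l2e_criterion(xs, ys, [1.0], [(0.0, 0.0)])             # mean 1
bad = l2e_criterion(xs, ys, [1.0], [(math.log(50.0), 0.0)])   # mean 50
print(good < bad)  # True
```

An estimator would minimize this criterion over the mixing proportions and regression coefficients with a numerical optimizer; that step is omitted here. The sketch also hints at the source of robustness: an extreme count with a tiny fitted probability contributes roughly zero to the `-2 f(y|x)` term, whereas under maximum likelihood its log-probability would diverge toward minus infinity and pull the fit.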
The L2E estimator is also shown to be more robust than the MHD estimator when samples are generated from gross-error contaminated (three-component) mixtures of Poisson regression models. For poorly separated models with a small fraction of extreme values, we propose a modified L2E criterion with a penalty function, called L2Ep. The L2Ep estimator is also shown to be consistent and asymptotically normal for two-component mixture regression models, and the L2Ep estimator of mixture complexity is shown to be consistent. Finally, the performance of the two methods is illustrated on two real data sets.
Next, we introduce a modified L2E estimator of mixture complexity in finite mixture models for continuous data. We show via simulations that the modified L2E estimator of mixture complexity helps detect a small fraction of extreme values and improves efficiency when the true finite mixture model has a small mixing proportion and poorly separated components. We also consider non-normal mixture models and study the performance of our L2E estimator in gamma and lognormal mixtures via Monte Carlo simulations and real data analyses.||