Robust estimation in mixture models and small area estimation using cross-sectional time series models
This dissertation considers robust estimation of the unknown number of components, also known as the mixture complexity, in finite mixture models, and cross-sectional time series modeling of the civilian unemployment rate for all the states in the U.S. We begin with the problem of finding the mixture with the fewest possible components that provides a satisfactory fit of the data. Finite mixture models provide a natural way of modeling unobserved population heterogeneity, which is often encountered in data sets arising from the biological, physical and social sciences. However, in many applications, it is unrealistic to expect that the component densities belong to some exact parametric family. The mixture of interest may even be contaminated, which causes estimates such as those based on Kullback-Leibler (KL) distances to be unstable. To overcome this problem, we develop a robust estimator of mixture complexity based on the Minimum Hellinger Distance (MHD) when all other associated parameters are unknown. This estimator is considered in two cases: when the random variables are continuous and when they are discrete. For each case, an estimator of mixture complexity is constructed as a by-product of minimizing a Hellinger Information Criterion, and this estimator is proved to be consistent for parametric families of mixtures. Via extensive simulations, our estimator is shown to be very competitive with several others in the literature when the model is correctly specified, and to be robust under symmetric departures from postulated component normality in terms of correctly identifying the true mixture complexity.
Next, we consider the problem of modeling the civilian unemployment rate for all the states in the U.S. Unemployment rate estimates are published by the U.S. Bureau of Labor Statistics (BLS) every month for the whole nation, the 50 states and DC, as well as other areas. In recent years, the demand for small area statistics has greatly increased. At the national level, the overall sample size for the Current Population Survey (CPS) is sufficient to produce reliable estimates of the unemployment (UE) rate. However, for smaller domains, the effective sample sizes within a given domain are so small that standard design-based estimators are not precise enough. Therefore, there is a need to improve the efficiency for small areas. The overlaps in CPS samples over time and the availability of other states' records enable the development of reliable model-based unemployment rate estimators for the states. To improve the efficiency for small areas, we turn to explicit small area models that make specific allowance for between-area variation, based on a Seasonal Autoregressive Integrated Moving Average (SARIMA) model. To estimate the parameters in this random-effects version of the time series model, a Bayesian inference methodology is constructed using Markov chain Monte Carlo methods. By examining model adequacy and forecasting the last four observations for all the states, our model is shown to be reliable and efficient.
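As a rough illustration of the distance underlying the MHD approach in the first part of the abstract, the sketch below computes the Hellinger distance between the empirical distribution of some count data and a candidate Poisson mixture, for one and for two components. The function names, the choice of Poisson components, the fixed parameter values and the sample data are all assumptions made for illustration. This is not the dissertation's estimator: it performs no minimization over parameters and does not implement the Hellinger Information Criterion.

```python
import math
from collections import Counter


def poisson_pmf(k, rate):
    """Probability of observing count k under a Poisson(rate) distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def mixture_pmf(k, weights, rates):
    """Probability of count k under a finite Poisson mixture (hypothetical model)."""
    return sum(w * poisson_pmf(k, r) for w, r in zip(weights, rates))


def hellinger_to_mixture(data, weights, rates):
    """Hellinger distance between the empirical pmf of `data` and the mixture.

    Uses H^2 = 1 - sum_k sqrt(p_k * q_k). The empirical pmf p is zero outside
    the observed values, so summing over observed values only is exact.
    """
    n = len(data)
    counts = Counter(data)
    affinity = sum(
        math.sqrt((c / n) * mixture_pmf(k, weights, rates))
        for k, c in counts.items()
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 1.0 - affinity))


# Hypothetical count data with two visible clusters (illustration only).
data = [0, 1, 1, 2, 1, 0, 2, 9, 10, 11, 10, 12, 9, 1]

# One-component candidate: a single Poisson with rate equal to the sample mean.
d_one = hellinger_to_mixture(data, [1.0], [sum(data) / len(data)])

# Two-component candidate with hand-picked (not estimated) parameters.
d_two = hellinger_to_mixture(data, [0.5, 0.5], [1.0, 10.0])

print("one component:", d_one)
print("two components:", d_two)
```

In a complexity-selection setting, distances like these would be minimized over the mixture parameters for each candidate number of components and then compared after adding a penalty for model size; the sketch stops at the raw distance for fixed parameters.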