Estimation of genomic copy frequency with correlated observations
In this thesis, we compare several methods for handling correlated data on genomic copy frequency. We first analyze the data with standard Poisson regression and find problems related to over-dispersion and under-dispersion. Over-dispersion is easy to handle with the ‘scale-adjustment’ method. Dependence among correlated Poisson counts, however, is not so easily remedied. We construct a statistic to test the null hypothesis that the data are independent Poisson realizations against the alternative that they are positively associated. From this test, we find that a separation of 225 base pairs is the minimum cut-off distance needed to achieve approximate independence. We also use the results of this analysis to devise a formula that yields the approximate correlation coefficient (r) between counts separated by ‘b’ base pairs. Finally, we use our method to weight observations and find significant improvement compared to other methods.
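The abstract does not spell out how the scale adjustment is computed. As a rough illustration only, the sketch below shows one common form of it: the Pearson dispersion estimate for an intercept-only Poisson model, used to inflate the naive standard error of the mean. The function names and the example counts are hypothetical and are not taken from the thesis, whose own procedure may differ.

```python
import math


def pearson_dispersion(counts):
    """Pearson chi-square divided by residual degrees of freedom.

    Assumes an intercept-only Poisson model, so every fitted mean
    equals the sample mean and one parameter is estimated.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two counts")
    mu = sum(counts) / n  # Poisson maximum-likelihood estimate of the mean
    if mu == 0:
        raise ValueError("mean count is zero; dispersion is undefined")
    pearson_chi2 = sum((y - mu) ** 2 / mu for y in counts)
    return pearson_chi2 / (n - 1)


def scaled_standard_error(counts):
    """Naive Poisson standard error of the mean, multiplied by sqrt(dispersion)."""
    n = len(counts)
    mu = sum(counts) / n
    naive_se = math.sqrt(mu / n)  # valid only if variance equals the mean
    return naive_se * math.sqrt(pearson_dispersion(counts))


# Hypothetical counts, chosen only to show over-dispersion.
counts = [2, 9, 1, 12, 0, 7, 3, 14]
phi = pearson_dispersion(counts)
print(f"dispersion estimate: {phi:.3f}")
print(f"scale-adjusted SE of the mean: {scaled_standard_error(counts):.3f}")
```

A dispersion estimate well above 1 suggests over-dispersion, and a value well below 1 suggests under-dispersion. This adjustment rescales the variance only; it does not account for dependence between nearby counts, which is the harder problem the thesis addresses.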