Statistical considerations of expression quantitative loci (eQTL) mapping with next generation sequencing data
Expression quantitative trait loci (eQTL) are loci on the genome that contribute to variation in the expression levels of messenger RNAs in an organism. They link the static genetic information of DNA sequence variation with the dynamic genetic information of gene expression. Sequencing has become the dominant technology for genomic research, including eQTL studies. Because of its fine resolution, genomic sequencing produces a tremendous amount of data on both the gene expression profiles and the genetic variants of the subjects. Consequently, when an enormous number of statistical tests is performed, multiple-testing adjustment requires an extremely stringent significance threshold. Some strategies reduce the total number of tests, for example by considering the physical distance between a genetic marker and a gene, or by constructing a co-expressed gene network, and are designed to increase the statistical power for trans-eQTL detection. Here we propose a statistical workflow that increases trans-eQTL mapping power by implementing both a network-free co-expression method and the blocked weight false discovery rate (FDR) multiple-testing adjustment. In addition, RNA-sequencing analyses use the number of short reads aligned to a gene as a proxy for that gene's expression level. The accuracy of the alignment is questionable when the subject's genome has a higher ploidy level. For example, many plants have more than two copies of each chromosome, as well as many homologous and paralogous genes that share great similarity in nucleotide sequence. Misassigned reads cause false-positive results and a loss of power to detect eQTL when RNA-sequencing data from plants are used. Thus, we also establish a bioinformatics and statistical framework to map eQTL with RNA-sequencing data from polyploid libraries.