Prioritizing hypothesis tests for high throughput data with multiple testing methods
The rise of automated data production has vastly increased the scale and resolution of biological data, and a fundamental challenge in modern biology is finding the true effects in this flood of data. Because the probability of large chance effects grows with the number of variables tested, statistical stringency must increase with the resolution of the data, often to the point where the probability of detecting true effects becomes very low. One solution is to use independent information to identify promising targets before testing. Although such filtering techniques have been in common use throughout the genomic era, little is known about the conditions under which filtering substantially increases power; worse, a poorly chosen filter can violate type I error control at level α. These issues motivate our work. We develop an optimal filtering method that maximizes the power of detection while controlling type I error, as well as a p-value weighting method that always performs at least as well as, and often better than, optimal filtering based on the Bonferroni correction and FDR methods. Using both simulated and real data, we show that our methods have excellent power relative to existing methods, and we illustrate, through several scenarios, the conditions under which filtering succeeds.
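To make the two strategies concrete, the sketch below contrasts generic independent filtering (test only the top-scoring fraction of hypotheses, Bonferroni-correct over the retained tests) with generic weighted Bonferroni (spend the error budget unevenly via per-hypothesis weights averaging to 1). These are textbook versions for illustration, not the optimal filtering or weighting procedures developed in this work; the function names, the `keep_frac` parameter, and the example numbers are all illustrative assumptions.

```python
import numpy as np

def weighted_bonferroni(pvals, weights, alpha=0.05):
    """Generic weighted Bonferroni: reject H_i when p_i <= alpha * w_i / m.
    Weights are normalized to average 1, so family-wise error is still
    controlled at level alpha (not the thesis's specific weighting method)."""
    pvals = np.asarray(pvals, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.mean()                      # normalize so the mean weight is 1
    return pvals <= alpha * w / len(pvals)

def filter_then_bonferroni(pvals, scores, keep_frac=0.1, alpha=0.05):
    """Generic independent filtering: keep the top keep_frac fraction of
    hypotheses by an independent score, then Bonferroni-correct over only
    the retained tests. Equivalent to weighted Bonferroni with weight m/k
    on kept hypotheses and 0 on dropped ones."""
    pvals = np.asarray(pvals, dtype=float)
    scores = np.asarray(scores, dtype=float)
    k = max(1, int(keep_frac * len(pvals)))
    keep = np.argsort(scores)[::-1][:k]   # indices of the k best scores
    reject = np.zeros(len(pvals), dtype=bool)
    reject[keep] = pvals[keep] <= alpha / k
    return reject
```

For example, with four tests an unweighted Bonferroni threshold is 0.05/4 = 0.0125, so a p-value of 0.02 is not rejected; giving that hypothesis weight 2 (and down-weighting the others so the mean stays 1) raises its threshold to 0.025 and recovers the rejection. This is the sense in which well-chosen weights can beat a hard filter, which can only set weights to all-or-nothing values.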