A Tight Prediction Interval for False Discovery Proportion under Dependence

ABSTRACT

The false discovery proportion (FDP) is a useful measure of the abundance of false positives when a large number of hypotheses are tested simultaneously. Methods for controlling the expected value of the FDP, namely the false discovery rate (FDR), have become widely used. In such applications, an accurate prediction interval for the FDP is highly desirable. Some degree of dependence among test statistics exists in almost all applications involving multiple testing, so methods for constructing tight prediction intervals for the FDP that account for this dependence are of great practical importance. This paper derives a formula for the variance of the FDP and uses it to obtain an upper prediction interval for the FDP under semi-parametric assumptions on the dependence among test statistics. Simulation studies indicate that the proposed formula-based prediction interval has good coverage probability under commonly assumed weak dependence and is generally more accurate than intervals obtained from existing methods. In addition, a permutation-based upper prediction interval for the FDP is provided, which can be useful when dependence is strong and the number of tests is not too large. The proposed prediction intervals are illustrated using a prostate cancer dataset.
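
To make the quantities above concrete: if R hypotheses are rejected and V of them are true nulls, the FDP is V / max(R, 1), and the FDR is its expected value E[FDP]. The Python sketch below is NOT the paper's variance formula or its permutation method. It is a hypothetical Monte Carlo illustration that assumes the Benjamini-Hochberg step-up procedure for rejection and a one-factor equicorrelated normal model for the z-statistics; the function names and the parameters m, m0, mu, rho, q, reps and seed are all chosen here for illustration. It draws the FDP repeatedly and reports its mean (an estimate of the FDR), its standard deviation, and an empirical 95% upper bound, which is the quantity an upper prediction interval for the FDP is meant to capture.

```python
import math
import random
import statistics
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def bh_reject(pvalues, q):
    """Benjamini-Hochberg step-up procedure at level q; returns rejected indices."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank  # step-up: keep the largest rank that passes
    return set(order[:k_max])


def fdp(rejected, null_indices):
    """False discovery proportion V / max(R, 1); equals 0 when nothing is rejected."""
    v = len(rejected & null_indices)  # V: number of rejected true nulls
    return v / max(len(rejected), 1)


def simulate_fdp(m=500, m0=450, mu=3.0, rho=0.0, q=0.1, reps=200, seed=1):
    """Monte Carlo draws of the BH FDP under a hypothetical one-factor model.

    Assumed model (illustrative only, not taken from the paper):
      Z_i = mu_i + sqrt(rho) * W + sqrt(1 - rho) * e_i,
    with mu_i = 0 for the first m0 (true null) tests and mu_i = mu otherwise,
    W and e_i i.i.d. N(0, 1), and one-sided p-values p_i = Phi(-Z_i).
    """
    rng = random.Random(seed)
    nulls = set(range(m0))
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    draws = []
    for _ in range(reps):
        w = rng.gauss(0.0, 1.0)  # shared factor: pairwise correlation is rho
        pvals = []
        for i in range(m):
            z = (0.0 if i < m0 else mu) + a * w + b * rng.gauss(0.0, 1.0)
            pvals.append(_STD_NORMAL.cdf(-z))  # one-sided upper-tail p-value
        draws.append(fdp(bh_reject(pvals, q), nulls))
    return draws


def upper_quantile(values, level):
    """Empirical quantile: smallest order statistic with >= `level` of values at or below it."""
    s = sorted(values)
    k = min(len(s) - 1, math.ceil(level * len(s)) - 1)
    return s[max(k, 0)]


if __name__ == "__main__":
    for rho in (0.0, 0.5):
        draws = simulate_fdp(rho=rho)
        print(
            f"rho={rho}: mean FDP={statistics.fmean(draws):.3f}, "
            f"sd={statistics.stdev(draws):.3f}, "
            f"95% upper bound={upper_quantile(draws, 0.95):.3f}"
        )
```

Under this assumed model the tests are positively dependent, so Benjamini-Hochberg still keeps the mean FDP at or below q·m0/m, but the spread of the FDP typically grows with rho, and with it the upper bound. A bound that ignores dependence can therefore be too optimistic, which is the motivation the abstract gives for modelling dependence explicitly.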

Cite this paper

S. Shang, M. Liu and Y. Shao, "A Tight Prediction Interval for False Discovery Proportion under Dependence," *Open Journal of Statistics*, Vol. 2, No. 2, 2012, pp. 163-171. doi:10.4236/ojs.2012.22018.