JBiSE Vol.3 No.3, March 2010
A mixture model based approach for estimating the FDR in replicated microarray data
ABSTRACT
One of the most widely used methods for estimating the false discovery rate (FDR) is the permutation-based method. This method has a well-known granularity problem, caused by the discrete nature of the permuted null scores, which may produce very unstable FDR estimates. Such instability may cause scientists to over- or under-estimate the number of false positives among the genes declared significant, and hence lead to inaccurate interpretation of biological data. In this paper, we propose a new model-based method as an improvement on the permutation-based FDR estimation method of SAM [1]. The new method uses a t-mixture model, which can model microarray data better than the currently used normal mixture model. We show that the proposed method provides more accurate FDR estimates than the permutation-based method and avoids the problems of permutation-based FDR estimators. The proposed method is evaluated using extensive simulations and real microarray data.
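
As an illustration only, and not the authors' implementation, the following minimal sketch shows how a permutation-based FDR estimate of the kind discussed above can be computed for a two-group experiment. The function names `t_score` and `permutation_fdr`, the plain two-sample t-type score (SAM itself uses a modified statistic), the simulated data, and the choice `pi0 = 1.0` are all assumptions made for this example.

```python
import itertools
import random
import statistics


def t_score(x, y):
    """Two-sample t-type score with unpooled variances (illustrative choice)."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    se = (vx / len(x) + vy / len(y)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se


def permutation_fdr(data, n1, cutoff, pi0=1.0):
    """Estimate the FDR for calling genes with |score| >= cutoff.

    data: list of per-gene expression lists; the first n1 values are group 1.
    pi0:  assumed proportion of null genes (1.0 is a conservative assumption).
    Returns (estimated FDR, number of genes called significant).
    """
    n = len(data[0])
    observed = [t_score(g[:n1], g[n1:]) for g in data]
    n_called = sum(abs(s) >= cutoff for s in observed)
    if n_called == 0:
        return 0.0, 0

    # Enumerate every relabeling of the samples into groups of size n1, n - n1.
    false_counts = []
    for idx in itertools.combinations(range(n), n1):
        chosen = set(idx)
        count = 0
        for g in data:
            x = [g[i] for i in range(n) if i in chosen]
            y = [g[i] for i in range(n) if i not in chosen]
            count += abs(t_score(x, y)) >= cutoff
        false_counts.append(count)

    # Average null count over relabelings, scaled by pi0, over observed calls.
    expected_false = pi0 * statistics.mean(false_counts)
    return min(1.0, expected_false / n_called), n_called


# Simulated data (hypothetical): 200 genes, 3 vs 3 replicates, 20 shifted genes.
random.seed(0)
n1, n2 = 3, 3
data = []
for g in range(200):
    shift = 3.0 if g < 20 else 0.0
    group1 = [random.gauss(shift, 1.0) for _ in range(n1)]
    group2 = [random.gauss(0.0, 1.0) for _ in range(n2)]
    data.append(group1 + group2)

for cutoff in (2.0, 4.0, 6.0):
    fdr, n_called = permutation_fdr(data, n1, cutoff)
    print(f"cutoff={cutoff}: called={n_called}, estimated FDR={fdr:.3f}")
```

With 3 replicates per group there are only 20 distinct relabelings, so each gene contributes very few distinct null scores. In this sketch the enumeration includes the observed labeling and its mirror image, so with `pi0 = 1.0` the estimate can never fall below 2/20 = 0.1 when at least one gene is called. This is one simple way the discreteness of permuted null scores can show up in the estimate.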

Cite this paper
Jiao, S. and Zhang, S. (2010) A mixture model based approach for estimating the FDR in replicated microarray data. Journal of Biomedical Science and Engineering, 3, 317-321. doi: 10.4236/jbise.2010.33043.
References
[1]   Tusher, V.G., Tibshirani, R. and Chu, G. (2001) Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98, 5116-5121.

[2]   Long, A. D., Mangalam, H. J., Chan, B. Y. P., Tolleri, L., Hatfield, W. G. and Baldi, P. (2001) Improved statistical inference from DNA microarray data using analysis of variance and a Bayesian statistical framework. Journal of Biological Chemistry, 276, 19937-19944.

[3]   Kerr, M.K., Martin, M. and Churchill, G. (2000) Analysis of variance for gene expression microarray data. Journal of Computational Biology, 7, 819-837.

[4]   Thomas, J.G., Olson, J.M., Tapscott, S.J. and Zhao, L. P. (2001) An efficient and robust statistical modelling approach to discover differentially expressed genes using genomic expression profiles. Genome Research, 11, 1227-1236.

[5]   Baldi, P. L. and Long, A. D. (2001) A Bayesian framework for the analysis of microarray expression data: Regularized t-test and statistical inference of gene changes. Bioinformatics, 17, 509-519.

[6]   Kendziorski, C. M., Newton, M. A., Lan, H. and Gould, M. N. (2003) On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statistics in Medicine, 22, 819-837.

[7]   Newton, M., Noueiry, A., Ahlquist, P., Sarkar, D. (2004) Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics, 5(2), 155-176.

[8]   Smyth, G. K. (2004) Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3(1), Article 3.

[9]   Efron, B., Tibshirani, R., Storey, J. D. and Tusher, V. (2001) Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association, 96, 1151-1160.

[10]   Efron, B., Tibshirani, R., Gross, V. and Chu, G. (2000) Microarrays and their use in a comparative experiment. Technical report, Statistics Department, Stanford University.

[11]   Dudoit, S., Yang, Y. H., Callow, M. J. and Speed, T. P. (2002) Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica, 12, 111-139.

[12]   Troyanskaya, O. G., Garber, M. E., Brown, P. O., Botstein, D. and Altman, R. B. (2002) Nonparametric methods for identifying differentially expressed genes in microarray data. Bioinformatics, 18, 1454-1461.

[13]   Broberg, P. (2003) Ranking genes with respect to differential expression. Genome Biology, 4, R41.

[14]   Pan, W., Lin, J. and Le, C. (2003) A mixture model approach to detecting differentially expressed genes with microarray data. Functional & Integrative Genomics, 3, 117-124.

[15]   Chu, G., Narasimhan, B., Tibshirani, R. and Tusher, V. SAM “significance analysis” of microarrays: users guide and technical document. http://www-stat.stanford.edu/~tibs/SAM/sam.pdf.

[16]   Zhang, S. (2007) A comprehensive evaluation of SAM, the SAM R-package and a simple modification to improve its performance. BMC Bioinformatics, 8, 230.

[17]   Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57, 289-300.

[18]   Storey, J.D. and Tibshirani, R. (2003) Statistical significance for genomewide studies. PNAS, 100, 9440-9445.

[19]   Liu, C. and Rubin, D. (1995) ML estimation of the t distribution using EM and its extensions ECM and ECME. Statistica Sinica, 5, 19-39.

[20]   Jiao, S. and Zhang, S. (2008) The t-mixture model approach for detecting differentially expressed genes in microarrays. Functional & Integrative Genomics, 8, 181-186.

[21]   Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J.R. and Caligiuri, M. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 285, 531-537.

 
 