JBiSE Vol.6 No.10, October 2013
Microarray data analysis: Gaining biological insights
DNA microarrays are a widely used technique for identifying genes that are similarly or differentially expressed across cell types or conditions, for learning how their expression levels change across developmental stages or disease states, and for identifying the cellular processes in which they participate. The technology produces large amounts of complex data, so multiple bioinformatics and computational tools and techniques are needed to build a comprehensive view of the underlying biology. This review surveys methods and techniques that may be employed to analyze and interpret microarray data, focusing primarily on the analysis of gene expression matrices to obtain biological insights. Both supervised and unsupervised methods commonly used for expression data analysis are discussed. Data visualization techniques that may be used to comprehend the biological relevance of the data are also discussed in brief.

Cite this paper
Grewal, R. and Das, S. (2013) Microarray data analysis: Gaining biological insights. Journal of Biomedical Science and Engineering, 6, 996-1005. doi: 10.4236/jbise.2013.610124.