The single nucleotide polymorphism (SNP) array is a recently developed biotechnology that is used extensively in the study of cancer genomes. The variety of available platforms makes cross-study validation and comparison difficult. Meanwhile, study sample sizes are increasing rapidly, which places a heavy computational burden on even the fastest PC.

Here, we describe a novel method that generates a platform-independent dataset from SNP arrays produced on multiple platforms. It extracts the probesets common to the individual platforms and performs cross-platform normalization and summarization based on these probesets. Because platforms may differ in the number of probes per probeset (PPP), these steps produce preprocessed signals whose noise levels differ between platforms. To handle this problem, we adopt a platform-dependent smoothing strategy and produce a preprocessed dataset with uniform noise levels across individual samples.

To make the method scale to a large number of samples, we devised an algorithm that splits the samples into multiple tasks and the probesets into multiple segments before submitting them to a parallel computing facility. This scheme drastically reduces computation time and increases the ability to process ultra-large sample sizes and arrays.
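The partitioning described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the platform and probeset identifiers, and the chunk sizes are all hypothetical, and the sketch covers only the extraction of common probesets and the splitting into tasks and segments, not the normalization, summarization, or smoothing steps.

```python
from itertools import product


def common_probesets(platforms):
    """Return the sorted probeset IDs shared by every platform."""
    sets = [set(ids) for ids in platforms.values()]
    return sorted(set.intersection(*sets))


def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def make_jobs(samples, probesets, samples_per_task, probesets_per_segment):
    """Pair every sample task with every probeset segment.

    Each (task, segment) pair is one unit of work that could be
    submitted to a parallel computing facility.
    """
    tasks = chunk(samples, samples_per_task)
    segments = chunk(probesets, probesets_per_segment)
    return list(product(tasks, segments))


# Hypothetical probeset IDs for two platforms (illustrative only).
platforms = {
    "platform_A": ["ps1", "ps2", "ps3", "ps4", "ps5"],
    "platform_B": ["ps2", "ps3", "ps4", "ps5", "ps6"],
}

shared = common_probesets(platforms)
jobs = make_jobs(
    ["s1", "s2", "s3"],
    shared,
    samples_per_task=2,
    probesets_per_segment=3,
)

for task, segment in jobs:
    print(task, segment)
```

With three samples in tasks of two and four shared probesets in segments of three, the sketch yields four jobs. Whether such jobs can run fully independently depends on how the cross-platform normalization couples samples and probesets, which the text above does not specify.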