JCT  Vol.12 No.6 , June 2021
Integrated Analysis of the Gene Expression Profiling and Copy Number Aberration of the Ovarian Cancer
Abstract: Objective: DNA copy number alterations and differential expression are frequently observed in ovarian cancer. The purpose of this study was to pinpoint gene expression changes associated with alterations in DNA copy number, which could highlight potential oncogenes and stability genes with functional roles in cancer, and to investigate the bioinformatic significance of the correlated genes. Method: We obtained DNA copy number and mRNA expression data from The Cancer Genome Atlas and identified the most statistically significant copy number alteration regions using GISTIC. We then identified the genes within these regions that were significantly differentially expressed among the tumor samples and analyzed the correlation using a binary matrix. The selected genes were subjected to bioinformatics analysis using the GSEA tool. Results: GISTIC analysis identified 48 significant copy number amplification regions in ovarian cancer. SAM and Fisher’s exact test identified 40 genes located in the amplification regions that can affect expression levels; that is, for these 40 genes copy number amplification correlated with drastic up- or down-regulation of expression (p-value < 0.05, Fisher’s exact test; FDR < 0.05). GSEA enrichment analysis found that these genes overlapped with gene sets from several published studies of tumorigenesis. Conclusion: Statistical and bioinformatic analysis of microarray data can reveal the interaction network involved. Combining copy number and expression data provided a short list of candidate genes consistent with tumor-driving roles. These genes may offer new ideas for the early diagnosis and therapeutic targeting of ovarian cancer.

1. Introduction

Ovarian cancer ranks first in mortality and second in incidence among cancers of the female reproductive system [1]. Early detection and diagnosis are difficult: by the time symptoms manifest, pelvic or distant metastasis is usually present [2], which makes complete removal of the tumor difficult, and the 5-year survival rate remains about 30% [3] [4] [5]. Understanding the molecular pathways of this disease may be critical for improving early detection, and thus survival, and may open new therapeutic strategies.

Copy number aberration (CNA) is a structural variation in which the copy number of a specific region of the genome is altered. Identifying “driver CNAs” in the genetic material of tumor tissue is a main objective of tumor diagnosis and cytogenetic studies. Such alterations can indicate the genomic instability of a tumor and result from somatic mutations acquired as tumor cells evolve from a normal state to a neoplastic state. Therefore, characterization of genomic abnormalities may help elucidate the molecular pathogenesis of ovarian cancer and reveal genetic markers of progression.

Many differentially expressed genes have been identified by gene expression profiling, but most of them may be “passenger genes” with limited effect on tumor development [6] [7]. The key challenge has been to identify driver oncogenes or tumor suppressor genes that play important roles during tumor initiation and progression [8]. Genomic DNA copy number aberration is an important type of genetic alteration observed in tumor cells, and it contributes to tumor evolution by altering the expression of genes within the affected region. Identifying genes that are both amplified and over-expressed may be beneficial, because these genes may represent driver gene aberrations [9].

Many studies have reported CNAs and differential expression profiles in ovarian cancer separately [10] - [18], but few have explored the correlation between copy number amplification and gene expression. In this paper, we aimed to study the correlation between copy number amplification and gene expression, analyzing only the genes located in chromosomal regions with recurrent aberrations. The purpose was to pinpoint gene expression changes associated with alterations in DNA copy number, which could highlight potential oncogenes and stability genes with functional roles in cancer, and to investigate the bioinformatic significance of the correlated genes.

2. Material and Method

2.1. Material

In this study, we used data from The Cancer Genome Atlas (TCGA) project. We downloaded Level 2 copy number data and Level 3 mRNA expression data for 200 ovarian cancer patients from the DCC data portal, all measured on the same platform. We also downloaded mRNA expression data for normal ovarian tissue from 50 patients.

2.2. Array-Based CGH Analysis

To identify possible regions of amplification, we segmented the Level 2 copy number data using the Circular Binary Segmentation (CBS) algorithm [19] [20], which is included in the Bioconductor package DNAcopy. To identify the significant regions of common aberration across all hybridizations, the Genomic Identification of Significant Targets in Cancer (GISTIC) approach was applied to the data [21].
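As a rough illustration of the idea behind GISTIC scoring, the sketch below combines, for each marker, how often it is amplified across samples with how strongly it is amplified. It is a simplified stand-in, not the GISTIC implementation: the function name `amplification_scores`, the input layout, and the log2-ratio threshold of 0.1 are assumptions made for this example, and the permutation-based significance step of GISTIC is omitted.

```python
def amplification_scores(log2_ratios, amp_threshold=0.1):
    """Score each marker by frequency x mean amplitude of amplification.

    log2_ratios: list of samples, each a list of per-marker segment means
                 (hypothetical layout chosen for this sketch).
    amp_threshold: log2 ratio above which a marker counts as amplified
                   (an assumed value, not taken from the paper).
    Returns one (frequency, mean_amplitude, score) tuple per marker.
    """
    n_samples = len(log2_ratios)
    n_markers = len(log2_ratios[0])
    results = []
    for m in range(n_markers):
        # Keep only the samples in which this marker exceeds the threshold.
        amplified = [sample[m] for sample in log2_ratios
                     if sample[m] > amp_threshold]
        frequency = len(amplified) / n_samples
        mean_amplitude = sum(amplified) / len(amplified) if amplified else 0.0
        results.append((frequency, mean_amplitude, frequency * mean_amplitude))
    return results


# Toy data: 4 samples x 3 markers (values are invented for illustration).
toy = [
    [0.00, 0.8, 0.05],
    [0.00, 0.4, 0.30],
    [0.02, 0.6, 0.00],
    [0.00, 0.0, 0.00],
]
scores = amplification_scores(toy)
```

Markers that are amplified both often and strongly receive the highest scores; in GISTIC proper, such scores are then compared with a permutation-derived null distribution to obtain q-values.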

2.3. Integrated Analysis of the Copy Number Data and Expression Data

The purpose of this study was to pinpoint gene expression changes associated with alterations in DNA copy number, which could highlight potential oncogenes and stability genes with functional roles in cancer [22].

First, two-class unpaired Significance Analysis of Microarrays (SAM) was used to find genes located in the copy number amplification regions that were differentially expressed between the tumor and normal samples. Genes with an FDR < 0.05 were considered to have significant differential expression and were passed to the next stage.
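For reference, the gene-wise score used by SAM in the two-class unpaired setting can be written as below. The notation (T for tumor, N for normal, group sizes n_T and n_N, and the small positive constant s_0) is introduced here for illustration only; this is the standard SAM relative difference, not a description of any study-specific modification.

```latex
d_i = \frac{\bar{x}_{i,T} - \bar{x}_{i,N}}{s_i + s_0},
\qquad
s_i = \sqrt{\left(\frac{1}{n_T} + \frac{1}{n_N}\right)
      \frac{\sum_{j \in T} \left(x_{ij} - \bar{x}_{i,T}\right)^2
          + \sum_{j \in N} \left(x_{ij} - \bar{x}_{i,N}\right)^2}
           {n_T + n_N - 2}}
```

Here x_{ij} is the expression of gene i in sample j. Genes are ranked by d_i, and the FDR is estimated by recomputing the scores under permutations of the sample labels.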

Next, we built two matrices, one for expression and one for CNA, each arranged as gene (row) by sample (column). At this stage, the CNA matrix was binary: if copy number amplification occurred in a particular gene in a particular sample, the element was one; otherwise the element was zero. Then, two-class unpaired SAM was used to find genes that were differentially expressed with respect to the copy number amplification status of a particular gene across all tumor samples. Genes with an FDR < 0.05 were considered to have significant amplification-correlated differential expression and were passed to further analysis.
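The construction of the binary CNA matrix and the resulting two sample groups can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the dictionary layout, the gene labels, and the amplification threshold of 0.3 are hypothetical and are not taken from the original analysis.

```python
def binarize_cna(segment_means, amp_threshold=0.3):
    """Turn per-gene copy number values into a 0/1 amplification matrix.

    segment_means: dict mapping gene -> list of log2 ratios, one per sample.
    amp_threshold: assumed cut-off above which a gene counts as amplified.
    """
    return {
        gene: [1 if value > amp_threshold else 0 for value in values]
        for gene, values in segment_means.items()
    }


def split_by_status(expression_row, cna_row):
    """Split one gene's expression values by the amplification status
    of a given gene, yielding the two classes compared by the test."""
    amplified = [e for e, c in zip(expression_row, cna_row) if c == 1]
    neutral = [e for e, c in zip(expression_row, cna_row) if c == 0]
    return amplified, neutral


# Toy data: two hypothetical genes across four samples.
cna = binarize_cna({
    "GENE_A": [0.05, 0.9, 0.4, -0.2],
    "GENE_B": [0.0, 0.1, 0.0, 0.35],
})
amplified, neutral = split_by_status([1.0, 5.0, 4.0, 2.0], cna["GENE_A"])
```

The two lists returned by `split_by_status` correspond to the two classes of the unpaired comparison, with one such split per amplified gene.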

Last, an expression matrix was created, this time containing only the genes deemed to have significant amplification-correlated differential expression in the previous step. The matrix was then converted to a binary differential-expression matrix as follows: 1) the z-score of each element of the expression matrix was calculated with respect to that element’s row (i.e., gene-specific), and this was repeated for each row (gene); 2) any element with a z-score > 2.0 or < −2.0 was set to 1, and all other elements were set to 0. Then, Fisher’s exact p-value was calculated for each gene in the expression matrix by populating a two-by-two contingency table with a binary expression vector (category one) and the CNA vector (category two); this was repeated for each binary expression vector in the binary expression matrix. This calculation allowed us to recover only genes that had drastic amplification-correlated expression and to assign an exact p-value to each correlation. The entire process was repeated once for each amplified gene [23] and was implemented in the R language [24].
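The z-score binarization and the two-by-two Fisher’s exact test described above can be sketched as follows. The original analysis was implemented in R; this Python version is only an illustration, and the function names, the use of the population standard deviation, and the two-sided form of the test are assumptions made for the example.

```python
from math import comb
from statistics import mean, pstdev


def zscore_binarize(row, cutoff=2.0):
    """Mark samples whose gene-specific z-score exceeds +/- cutoff."""
    mu = mean(row)
    sd = pstdev(row)  # population SD; the choice of SD is an assumption
    if sd == 0:
        return [0] * len(row)
    return [1 if abs((x - mu) / sd) > cutoff else 0 for x in row]


def contingency(expr_binary, cna_binary):
    """Count samples in each cell of the 2x2 table (expression x CNA)."""
    pairs = list(zip(expr_binary, cna_binary))
    a = sum(1 for e, c in pairs if e == 1 and c == 1)
    b = sum(1 for e, c in pairs if e == 1 and c == 0)
    c_ = sum(1 for e, c in pairs if e == 0 and c == 1)
    d = sum(1 for e, c in pairs if e == 0 and c == 0)
    return a, b, c_, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Toy example: expression of one gene vs. amplification status of another.
expr_bits = zscore_binarize([5.0] * 9 + [9.0])
cna_bits = [0] * 9 + [1]
p_value = fisher_exact_two_sided(*contingency(expr_bits, cna_bits))
```

In the full procedure, this test would be run once per expression row for each amplified gene, which is why a multiple-testing correction is applied afterwards.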

3. Results

3.1. Significant Regions of Copy Number Aberrations in Ovarian Cancer

Figure 1 shows the copy number segments for two samples; there were on average 444 segments per sample. To identify the significant regions of aberration among the large number of copy number segments, we applied the GISTIC method to the segment data. A genome-wide view of the CNAs in the 200 tumors is shown in Figure 2(a), and the significance of the amplification regions is shown in Figure 2(b).

GISTIC analysis identified 48 regions of amplification along 21 chromosome arms (Figure 2(b)), distributed throughout the genome. The 48 significant amplification regions are listed in Table 1, including the frequency, possible target genes, chromosome position and q-value. Several oncogenes previously known to have copy number changes in human ovarian cancer, such as CCNE1, EVI1, MYC, FGFR3 and KRAS, were readily identified by GISTIC.

3.2. Integrated Analysis of Copy Number Aberration and Gene Expression

The corresponding gene expression probes within these copy number aberration regions (CNARs) were mapped to 139 unique genes. To evaluate whether these 139 genes were differentially expressed, we applied SAM to the gene expression data from tumor and normal tissues. We identified 55 genes with significant differential expression (Figure 3). Among them, 45 genes whose CNA and gene expression changed in the same direction were selected for further exploration.

Figure 1. CBS analysis of samples TCGA.04.153.01 and TCGA.13.0766.01. Y-axis: the points are normalized log ratios, and the red lines are the segment means obtained by CBS; X-axis: the points are shown in alternating colors to indicate different chromosomes.

Figure 2. (a) Genomic profiles of 200 ovarian cancers generated by array CGH. Each column in the right panel represents a tumor sample and rows represent gains and losses of DNA sequences along the length of chromosomes 1 through 22, as determined by segmentation analysis of normalized log2 ratios. The color scale ranges from blue (loss) through white (two copies) to red (gain). (b) GISTIC analysis of copy number gains in the ovarian carcinomas. The statistical significance of the aberrations identified by GISTIC is displayed as false discovery rate q values to account for multiple hypothesis testing (green line: q value cut-off of 0.25 for significance).

Table 1. Amplification peak regions identified by GISTIC.

Figure 3. Red indicates the significantly positive genes and green indicates the significantly negative genes.

To further analyze the genes whose CNA and gene expression changed in the same direction, patients were divided into two groups as described in the methods: the “copy number varied” group and the “copy number neutral” group. For each such gene, unpaired two-class SAM was applied to the two groups, which identified 44 genes that can influence expression levels between tumor tissues with and without copy number alterations. To confirm that this impact resulted from copy number alteration, we performed Fisher’s exact test as described in the methods and identified 40 genes whose amplification led to differential expression of at least one of the 17,765 genes. That is, we obtained 40 genes for which copy number amplification correlated with drastic up- or down-regulation of expression (p-value < 0.05, Fisher’s exact test; FDR < 0.05). These results indicate that CNAs are important elements in driving downstream gene signaling in ovarian tumorigenesis.
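The text does not state how the FDR applied alongside Fisher’s exact test was computed. One common way to control the FDR over a list of p-values is the Benjamini-Hochberg procedure; the sketch below assumes that procedure purely for illustration, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Invented p-values for illustration; keep tests with adjusted value < 0.05.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adjusted) if q < 0.05]
```

With a filter of this kind, a gene-pair correlation is retained only if both its raw p-value and its adjusted value fall below 0.05.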

3.3. Gene Set Enrichment Analysis (GSEA)

To explore the function of the 40 genes in cancer development and progression, we used the Molecular Signatures Database v4.0 with the GSEA “Investigate Gene Sets” tool. Several gene sets overlapped with our genes [15] - [20]. Most of these gene sets were associated with various types of cancer. The gene sets are described in detail in Table 2.
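Overlap between a query gene list and a curated gene set is typically assessed with a hypergeometric test. The sketch below shows that calculation under stated assumptions: the function name, the background size, and the gene set membership are hypothetical and chosen only for illustration, and this is not a reproduction of the MSigDB tool.

```python
from math import comb


def overlap_pvalue(universe_size, gene_set_size, query_size, overlap):
    """Probability of observing at least `overlap` shared genes by chance.

    Models drawing `query_size` genes without replacement from a background
    of `universe_size` genes, of which `gene_set_size` belong to the set.
    """
    upper = min(gene_set_size, query_size)
    favorable = sum(
        comb(gene_set_size, k)
        * comb(universe_size - gene_set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / comb(universe_size, query_size)


# Toy example: gene symbols appear in the text, but the set is invented.
query = {"CCNE1", "KRAS", "MYC"}
toy_gene_set = {"CCNE1", "MYC", "GENE_X", "GENE_Y"}
shared = query & toy_gene_set
p = overlap_pvalue(universe_size=10, gene_set_size=len(toy_gene_set),
                   query_size=len(query), overlap=len(shared))
```

A small p-value indicates that the query list shares more genes with the set than expected by chance for the chosen background.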

4. Discussion

It is well known that many causative elements contribute to tumorigenesis and cancer progression, such as transcriptional dysregulation, sequence mutations and genetic variation. Among these complicated factors, Copy Number Alterations (CNAs) have been widely reported to serve as a key driver of genetic variation [25]. In this study, we analyzed CNAs by array CGH and identified frequent chromosomal regions with high levels of amplification and deletion. Additionally, to account for the complex relationship between copy number and gene expression, we performed an integrated analysis of ovarian cancer to identify differentially expressed genes with concordant genomic alterations and explored their impact on gene expression. Finally, gene set enrichment analysis was used to characterize the bioinformatic significance of these driver genes.

Table 2. GSEA enrichment analysis.

CNA analysis provided general insights into genomic alterations in ovarian adenocarcinoma, and the CNA regions identified were highly similar to those reported previously [26] [27] [28] [29] [30]. Notably, gain at 3q26.2 was detected at the highest frequency (85%) and gain at 8q24.3 at the second highest (80%). Taken together, we speculate that the identified CNAs, especially the gains at 3q26.2 and 8q24.3, and the candidate genes they include (EVI1, NFKBIL2, FOXH1, FBXL6, CPSF1, CYHR1, VPS28, SLC39A4, GPR172A, KIFC2, ADCK5), may play an important biological role in the pathogenesis of ovarian cancer. Indeed, a detailed genomic analysis of EVI1 has been performed in ovarian cancer cells [31] [32]. Furthermore, we also found some putative oncogenes in these CNARs, such as KRAS, CCNE1 and MYC.

Of the 139 genes residing in CNARs, 55 (40%) showed significant differential expression associated with CNAs. Among these, 45 genes (82%) showed a positive correlation between CNA and mRNA expression and 10 genes (18%) showed a negative correlation. TACC3 was the most positively correlated gene, but no functional study of it is available at this time. The second most correlated gene, CCNE1, has been shown to play an important role in the development and progression of ovarian cancer [33] [34] [35]. In addition, the elevated correlations of the 45 concordantly changed genes provide further evidence that our statistical approach can efficiently identify dysregulated genes based on CNA.

To further explore whether these 45 CNA-driven genes can affect mRNA expression levels, SAM and Fisher’s exact test were applied as described in the methods. After excluding genes that did not affect mRNA expression levels (retaining those with p-value < 0.05, Fisher’s exact test), 40 of the 45 genes (89%) remained for further analysis. The results show that most copy number alterations can affect mRNA expression levels, especially those of putative ovarian cancer oncogenes reported previously, such as CCNE1, KRAS and EVI1.

5. Conclusion

Based on these analyses, we believe that the identification of driver genes in tumor amplicons can be greatly facilitated by studying gene expression patterns in conjunction with gene network data. The combination of the copy number data and expression has provided a short list of candidate genes that are consistent with tumor driving roles.


The study was partially supported by grants from the Guangdong Province innovation school project (No. 2018KQNCX128) and the Xiamen Medical and Health Guidance Project (No. 3502Z20209111).


*Authors contribute equally.

#Corresponding author.

Cite this paper: Liu, X. , Liu, Z. , Yu, W. , Zhan, N. , Xie, L. , Xie, W. , Zhu, Z. and Deng, Z. (2021) Integrated Analysis of the Gene Expression Profiling and Copy Number Aberration of the Ovarian Cancer. Journal of Cancer Therapy, 12, 387-398. doi: 10.4236/jct.2021.126034.

[1]   Jemal, A., Siegel, R., Ward, E., et al. (2006) Cancer Statistics, 2006. CA: A Cancer Journal for Clinicians, 56, 106-130.

[2]   Goff, B.A., Mandel, L., Muntz, H.G., et al. (2000) Ovarian Carcinoma Diagnosis. Cancer, 89, 2068-2075.

[3]   Bankhead, C. (2004) For Ovarian Cancer, an Optimal Treatment Remains to Be Found. Journal of the National Cancer Institute, 96, 96-97.

[4]   Ozols, R.F., Bookman, M.A., Connolly, D.C., et al. (2004) Focus on Epithelial Ovarian Cancer. Cancer Cell, 5, 19-24.

[5]   Cannistra, S.A. (2004) Cancer of the Ovary. The New England Journal of Medicine, 351, 2519-2529.

[6]   Rosen, D.G., Yang, G., Deavers, M.T., et al. (2006) Cyclin E Expression Is Correlated with Tumor Progression and Predicts a Poor Prognosis in Patients with Ovarian Carcinoma. Cancer, 106, 1925-1932.

[7]   Haber, D.A. and Settleman, J. (2007) Cancer: Drivers and Passengers. Nature, 446, 145-146.

[8]   Farley, J., Smith, L.M., Darcy, K.M., et al. (2003) Cyclin E Expression Is a Significant Predictor of Survival in Advanced, Suboptimally Debulked Ovarian Epithelial Cancers: A Gynecologic Oncology Group Study. Cancer Research, 63, 1235-1241.

[9]   Fan, B., Dachrut, S., Coral, H., et al. (2012) Integration of DNA Copy Number Alterations and Transcriptional Expression Analysis in Human Gastric Cancer. PLoS ONE, 7, e29824.

[10]   Gorringe, K.L., Jacobs, S., Thompson, E.R., et al. (2007) High-Resolution Single Nucleotide Polymorphism Array Analysis of Epithelial Ovarian Cancer Reveals Numerous Microdeletions and Amplifications. Clinical Cancer Research, 13, 4731-4739.

[11]   Park, J.T., Li, M., Nakayama, K., et al. (2007) Notch3 Gene Amplification in Ovarian Cancer. Cancer Research, 66, 6312-6318.

[12]   Nanjunda, M., Nakayama, Y., Cheng, K.W., et al. (2007) Amplification of MDS1/EVI1 and EVI1, Located in the 3q26.2 Amplicon, Is Associated with Favorable Patient Prognosis in Ovarian Cancer. Cancer Research, 67, 3074-3084.

[13]   Greenman, C., Stephens, P., Smith, R., et al. (2007) Patterns of Somatic Mutation in Human Cancer Genomes. Nature, 446, 153-158.

[14]   Nakayama, K., Nakayama, N., Jinawath, N., et al. (2007) Amplicon Profiles in Ovarian Serous Carcinomas. International Journal of Cancer, 120, 2613-2617.

[15]   Snijders, A.M., Nowee, M.E., Fridlyand, J., et al. (2003) Genome-Wide Array-Based Comparative Genomic Hybridization Reveals Genetic Homogeneity and Frequent Copy Number Increases Encompassing CCNE1 in Fallopian Tube Carcinoma. Oncogene, 22, 4281-4286.

[16]   Hough, C.D., Sherman-Baust, C.A., Pizer, E.S., et al. (2000) Large-Scale Serial Analysis of Gene Expression Reveals Genes Differentially Expressed in Ovarian Cancer. Cancer Research, 60, 6281-6287.

[17]   Welsh, J.B., Zarrinkar, P.P., Sapinoso, L.M., et al. (2001) Analysis of Gene Expression Profiles in Normal and Neoplastic Ovarian Tissue Samples Identifies Candidate Molecular Markers of Epithelial Ovarian Cancer. Proceedings of the National Academy of Sciences of the United States of America, 98, 1176-1181.

[18]   Tonin, P.N., Hudson, T.J., Rodier, F., et al. (2001) Microarray Analysis of Gene Expression Mirrors the Biology of an Ovarian Cancer Model. Oncogene, 20, 6617-6626.

[19]   Olshen, A.B., Venkatraman, E.S., Lucito, R., et al. (2004) Circular Binary Segmentation for the Analysis of Array-Based DNA Copy Number Data. Biostatistics, 5, 557-572.

[20]   Venkatraman, E.S. and Olshen, A.B. (2007) A Faster Circular Binary Segmentation Algorithm for the Analysis of Array CGH Data. Bioinformatics, 23, 657-663.

[21]   Beroukhim, R., Getz, G., Nghiemphu, L., et al. (2007) Assessing the Significance of Chromosomal Aberrations in Cancer: Methodology and Application to Glioma. Proceedings of the National Academy of Sciences of the United States of America, 104, 20007-20012.

[22]   Cheng, L., Wang, P., Yang, S., et al. (2012) Identification of Genes with a Correlation between Copy Number and Expression in Gastric Cancer. BMC Medical Genomics, 5, 14.

[23]   Masica, D.L. and Karchin, R. (2011) Correlation of Somatic Mutation and Expression Identifies Genes Important in Human Glioblastoma Progression and Survival. Cancer Research, 71, 4550-4561.

[24]   R Development Core Team (2006) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna.

[25]   Pollack, J.R., Sorlie, T., Perou, C.N., et al. (2002) Microarray Analysis Reveals a Major Direct Role of DNA Copy Number Alteration in the Transcriptional Program of Human Breast Tumors. Proceedings of the National Academy of Sciences of the United States of America, 99, 12963-12968.

[26]   Haverty, P.M., Hon, L.S., Kaminker, J.S., et al. (2009) High-Resolution Analysis of Copy Number Alterations and Associated Expression Changes in Ovarian Tumors. BMC Medical Genomics, 2, 21.

[27]   Nakayama, N.M., Nakayama, K., Shamima, Y., et al. (2010) Gene Amplification CCNE1 Is Related to Poor Survival and Potential Therapeutic Target in Ovarian Cancer. Cancer, 116, 2621.

[28]   Shih, I.M., Nakayama, K., Wu, G., et al. (2011) Amplification of the ch19p13.2 NACC1 Locus in Ovarian High-Grade Serous Carcinoma. Modern Pathology, 24, 638-645.

[29]   Chen, L., Xuan, J., Gu, J., et al. (2012) Integrative Network Analysis to Identify Aberrant Pathway Networks in Ovarian Cancer. Pacific Symposium on Biocomputing, Kohala Coast, 3-7 January 2012, 31-42.

[30]   Cope, L., Wu, R.C., Shih, I.M. and Wang, T.L. (2013) High Level of Chromosomal Aberration in Ovarian Cancer Genome Correlates with Poor Clinical Outcome. Gynecologic Oncology, 128, 500-505.

[31]   Jazaeri, A.A., Ferriss, J.S., Bryant, J.L., et al. (2010) Evaluation of EVI1 and EVI1s (Δ324) as Potential Therapeutic Targets in Ovarian Cancer. Gynecologic Oncology, 118, 189-195.

[32]   Dutta, P., Bui, T., Bauckman, K.A., et al. (2013) EVI1 Splice Variants Modulate Functional Responses in Ovarian Cancer Cells. Molecular Oncology, 7, 647-668.

[33]   Wrzeszczynski, K.O., Varadan, V., Byrnes, J., et al. (2011) Identification of Tumor Suppressors and Oncogenes from Genomic and Epigenetic Features in Ovarian Cancer. PLoS ONE, 6, e28503.

[34]   Etemadmoghadam, D., DeFazio, A., Beroukhim, R., et al. (2009) Integrated Genome-Wide DNA Copy Number and Expression Analysis Identifies Distinct Mechanisms of Primary Chemoresistance in Ovarian Carcinomas. Clinical Cancer Research, 15, 1417-1427.

[35]   Cancer Genome Atlas Research Network (2011) Integrated Genomic Analyses of Ovarian Carcinoma. Nature, 474, 609-615.