Cancer is a genetic disease that leads to uncontrolled cell proliferation. It is a multistep process arising from mutations in genes involved in signaling, cell-cycle and/or cell-death pathways, which results in mis-regulation of those pathways. These mutations can activate oncogenes, which act dominantly (gain of function), and/or inactivate tumor suppressor genes (loss of function). Either way, the cell cycle is disrupted, leading to uncontrolled cell division and growth, a common feature of every cancer type.
Breast cancer is a complex collection of diseases with characteristic clinical, histopathological and molecular features. Breast carcinogenesis involves genetic and epigenetic alterations that cause aberrant gene function. The initiation and progression of breast cancer are driven by the accumulation of genetic mutations that lead to abnormal cell function. These mutations can be inherited or sporadic, and may activate oncogenes and inactivate tumor suppressor genes. Breast cancer can also be caused by epigenetic alterations. These do not affect the primary DNA sequence but cause abnormal transcriptional regulation, changing the expression patterns of genes involved in cellular proliferation, survival and differentiation. Breast cancer is one of the most commonly occurring cancers, with around 1.38 million new cases diagnosed globally every year. It is the most common cancer affecting women. It ranks second as a cause of cancer death among women and fifth as a cause of cancer death overall. The survival of breast cancer patients depends mainly on two factors: early detection of the disease and adjuvant systemic treatment, which includes chemotherapy and hormone therapy. Treatment decisions depend largely on clinicopathological factors, including age, tumor size, histological grade, and estrogen receptor (ER), progesterone receptor (PR) and HER2 status. Although many patients with breast cancer are treated with chemotherapy, only a few benefit from it. Selecting an appropriate adjuvant therapy for breast cancer patients therefore depends on reliable predictive markers.
Microarray-based gene expression profiling has developed tremendously over the past decade. The technology has been described as a new dawn in cancer biology and oncology practice. It allows the expression levels of multiple genes in a tumor sample to be analyzed simultaneously. Many gene expression profiling studies were performed in the last decade, and their datasets are now available for analysis and meta-analysis. These studies have helped reveal the complexity of breast cancer, which is not a single simple disease but the product of many genetic mutations.
In this study, we used microarray-based gene expression profiling datasets of breast cancer from the Gene Expression Omnibus (GEO) database to identify a significant gene set. This set can serve as a starting point for developing predictive and prognostic markers specific to breast cancer. The study also highlights genes that are highly or lowly expressed in breast cancer, which will help in understanding its diagnosis and progression and in developing personalized therapies.
2. Material and Methods
We performed a meta-analysis of raw microarray data to identify genes associated with BRCA1, BRCA2 and BRCAx (BRCA1/2) mutations in familial breast cancer, together with sporadic, familial cancer aggregation (FCA) and normal samples. The aim was to analyze the role of gene expression in disease progression. Determining the disease mechanism and the role of signaling pathways requires 1) gene expression measurements, 2) definitions of signaling pathways and 3) drug target identification.
2.1. Raw Data Selection
We searched public gene expression datasets in the GEO microarray database to identify expression patterns, noting that neoadjuvant chemotherapy can seriously affect gene expression patterns and profiles. Samples from three broad conditions (hereditary, sporadic and normal) were used:
- 12 breast cancer samples from women with a hereditary BRCA1 mutation;
- 1 sample from a woman with a hereditary BRCA2 mutation;
- 8 samples with a BRCAx (BRCA1/2)-type mutation and a familial history of breast/ovarian cancer;
- 10 sporadic samples;
- 4 samples from familial cancer aggregation (FCA) without prevalence of breast/ovarian cancer;
- 6 normal breast samples.

This gives 41 samples in total. All probe sets were annotated with the hgu133plus2 reference platform, and the resulting gene names were compared to identify differentially expressed genes.
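As a quick consistency check, the group sizes above can be expanded into one label per sample, the kind of vector that is later stored as phenotype data (pData). The following Python sketch is illustrative only. The group names and the dictionary layout are assumptions, not the authors' files.

```python
from collections import Counter

# Sample counts per group as listed in Section 2.1 (group names are illustrative)
GROUP_SIZES = {
    "BRCA1": 12,
    "BRCA2": 1,
    "BRCAx": 8,
    "sporadic": 10,
    "FCA": 4,
    "normal": 6,
}

def build_labels(group_sizes):
    """Expand group sizes into one label per sample, in group order."""
    labels = []
    for group, size in group_sizes.items():
        labels.extend([group] * size)
    return labels

labels = build_labels(GROUP_SIZES)
print(len(labels))      # 41
print(Counter(labels))  # samples per group
```

The total of 41 matches the number of samples used for hierarchical clustering in Section 3.1.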
2.2. Preprocessing of Raw Data
To ensure data quality before identifying differentially expressed genes in breast cancer, several preprocessing steps were applied: 1) filtering of image intensity values; 2) bias removal through null-value filtering, normalization, background adjustment and transformation; and 3) quality control. Several normalization algorithms are available in R and Bioconductor packages, including RMA, MAS5, GCRMA and Li-Wong.
- RMA builds an expression matrix from the raw intensities by background correction, normalization across arrays and log2 transformation.
- MAS5 processes each array independently, using local averages of the perfect-match (PM) and mismatch (MM) probe intensities for background correction.
- GCRMA adjusts the background intensities of Affymetrix array data for optical noise and non-specific binding.
- Li-Wong was originally based on PM-MM differences and assumes that the noise in all probe measurements is of roughly the same size.

The Bioconductor package Biobase was used to create ExpressionSet objects from the Affymetrix datasets, with sample annotations stored as phenotype data (pData). Boxplots, density plots and PCA plots were used to visualize the raw and normalized data and to assess the normalization.
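To make the normalization step concrete, here is a minimal pure-Python sketch of two ingredients of RMA. The first is quantile normalization, which forces every array to share the same intensity distribution; the second is a log2 transform. It is not the Bioconductor implementation: background correction and probe-set summarization are deliberately omitted, and the toy intensities are hypothetical.

```python
import math

def quantile_normalize(arrays):
    """Give every array the same intensity distribution.

    `arrays` is a list of arrays (one per chip), each a list of probe
    intensities in the same probe order. Ties are broken by position.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must contain the same probes")
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays
    reference = [
        sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n_probes)
    ]
    normalized = []
    for a in arrays:
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]  # replace value by reference quantile
        normalized.append(out)
    return normalized

def log2_matrix(arrays):
    """Log2-transform positive intensities (RMA reports log2 values)."""
    return [[math.log2(v) for v in a] for a in arrays]

chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]  # hypothetical intensities
norm = quantile_normalize(chips)
print(norm)  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
print(log2_matrix(norm))
```

After normalization both arrays contain exactly the same set of values. This is why quantile-normalized arrays show identical boxplots and density plots.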
2.3. Differential Expression Analysis
The limma package was used to identify differentially expressed genes in the Affymetrix breast cancer data. limma handles a range of designs: simple replicated comparisons, experiments with two or more groups, direct designs, factorial designs and time-course experiments. After the columns of the design matrix were named by sample group, a contrast matrix was created to perform all pairwise comparisons. The estimated coefficients and standard errors were then computed for each contrast.
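The design and contrast matrices described above can be sketched in Python as follows. This mirrors the usual R pattern of `model.matrix(~0 + group)` combined with limma's `makeContrasts`, but it is an illustrative stand-in, not limma code. The group names and the per-gene group means in the example are hypothetical.

```python
from itertools import combinations

GROUPS = ["BRCA1", "BRCA2", "BRCAx", "sporadic", "FCA", "normal"]

def design_matrix(labels, groups):
    """One column per group (a 'means' design without intercept):
    1 if the sample belongs to the group, else 0."""
    return [[1 if label == g else 0 for g in groups] for label in labels]

def pairwise_contrasts(groups):
    """All pairwise contrasts 'A-B' as coefficient vectors over group means."""
    contrasts = {}
    for a, b in combinations(groups, 2):
        vec = [0] * len(groups)
        vec[groups.index(a)] = 1
        vec[groups.index(b)] = -1
        contrasts[f"{a}-{b}"] = vec
    return contrasts

def apply_contrast(coefficients, vec):
    """Estimated contrast = weighted sum of the fitted group means."""
    return sum(c * w for c, w in zip(coefficients, vec))

contrasts = pairwise_contrasts(GROUPS)
print(len(contrasts))              # 15
print(contrasts["BRCA1-normal"])   # [1, 0, 0, 0, 0, -1]

gene_means = [8.0, 7.5, 7.0, 6.0, 6.5, 5.0]  # hypothetical log2 group means
print(apply_contrast(gene_means, contrasts["BRCA1-normal"]))  # 3.0
```

With six groups, `combinations` yields 6 × 5 / 2 = 15 contrasts. These are the same pairwise comparisons listed at the start of Section 3, and in the same order.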
Differential expression was assessed with limma's empirical Bayes method. This method shrinks the gene-wise standard errors toward a common value and yields moderated t-statistics and log-odds of differential expression. The topTable function was then used to list the top 10 differentially expressed genes (“number = 10”), sorted by B-value (“sort.by = B”), for each comparison (“coef = 1” selects the first). The summary table reports the following columns:
- logFC: the log2-fold change;
- AveExpr: the average expression value across all arrays and channels;
- t: the moderated t-statistic, the ratio of logFC to its standard error;
- P.Value: the associated p-value;
- adj.P.Val: the p-value adjusted for multiple testing;
- B: the log-odds that a gene is differentially expressed (the higher, the better).

Gene selection is usually based on the adjusted p-value rather than on the t- or B-values.
Candidates with p-values < 0.05 were then selected for each comparison, and the number of candidates in each list was recorded. These counts should equal the sum of the values in the corresponding circle of a Venn diagram of the comparisons. The selection was repeated with a stricter filter: p-value < 0.01, at least a 2-fold change (|logFC| ≥ 1) and average expression A > 10. Finally, the expression profiles of genes significantly differentially expressed in the primary condition were plotted as a heat diagram (this is not a cluster-analysis heat map). In this plot, genes are sorted by differential expression under the primary condition. The argument “primary = 1” selects the first contrast column of the “results” matrix as the primary condition.
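The quantities in the summary table and the stricter filter can be illustrated with the Python sketch below. It assumes the standard empirical Bayes form, in which the posterior variance is a weighted average of the gene-wise variance and a prior variance. The Benjamini-Hochberg procedure shown is the default `adjust.method` of limma's `topTable`. The prior values, example rows and gene names are hypothetical, and this is not limma's implementation, which also estimates the prior from all genes.

```python
import math

def moderated_t(logfc, s2, df, s0_sq, d0, stdev_unscaled=1.0):
    """Moderated t: shrink the gene-wise variance s2 (df residual degrees
    of freedom) toward a prior variance s0_sq (d0 prior degrees of
    freedom), then divide logFC by the resulting standard error."""
    s2_post = (d0 * s0_sq + df * s2) / (d0 + df)
    return logfc / (stdev_unscaled * math.sqrt(s2_post))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = n - pos  # 1-based rank in ascending order of p
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

def strict_filter(rows, p_cut=0.01, min_abs_logfc=1.0, min_aveexpr=10.0):
    """Keep rows with p < p_cut AND |logFC| >= 1 (at least 2-fold)
    AND AveExpr > 10."""
    return [
        r for r in rows
        if r["P.Value"] < p_cut
        and abs(r["logFC"]) >= min_abs_logfc
        and r["AveExpr"] > min_aveexpr
    ]

rows = [  # hypothetical topTable-like rows
    {"gene": "geneA", "logFC": 2.1, "AveExpr": 11.2, "P.Value": 0.001},
    {"gene": "geneB", "logFC": -1.4, "AveExpr": 10.5, "P.Value": 0.004},
    {"gene": "geneC", "logFC": 0.6, "AveExpr": 12.0, "P.Value": 0.0005},
    {"gene": "geneD", "logFC": 3.0, "AveExpr": 8.0, "P.Value": 0.002},
]
adj = benjamini_hochberg([r["P.Value"] for r in rows])
for r, a in zip(rows, adj):
    r["adj.P.Val"] = a
print([r["gene"] for r in strict_filter(rows)])  # ['geneA', 'geneB']
```

geneC fails the 2-fold requirement and geneD the expression requirement, even though both have small p-values.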
2.4. Meta Analysis and Classification
The geNetClassifier algorithm was used to classify the genes differentially expressed across the disease datasets and to build the associated gene networks. The algorithm takes genome-wide expression data (an ExpressionSet or expression matrix) as a training set, ranks the genes (probe sets) for each class and optimizes the selection on that training set. A multi-class SVM-based classifier is then built from the genes chosen for classification. The algorithm also calculates and analyzes the mutual information (interactions) and co-expression (correlations) between the genes. These measures estimate the degree of association between the variables and are used to generate a gene network for each class. The networks can be plotted, providing an integrated overview of the genes that characterize each disease (i.e., each class).
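The two association measures named above, co-expression (correlation) and mutual information, can be sketched as a simple edge-building routine. This is not geNetClassifier's code. The median-based discretization, the base-2 mutual information, the correlation threshold of 0.8 and the toy gene profiles are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def binarize(x):
    """Discretize a profile into low (0) / high (1) around its median."""
    s = sorted(x)
    n = len(s)
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    return [1 if v > median else 0 for v in x]

def mutual_information(x, y):
    """Mutual information (bits) between two discretized profiles."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )

def gene_network(profiles, corr_threshold=0.8):
    """Edges between genes whose |Pearson r| meets the threshold,
    annotated with r and the MI of the binarized profiles."""
    edges = []
    genes = sorted(profiles)
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(profiles[g1], profiles[g2])
            if abs(r) >= corr_threshold:
                mi = mutual_information(binarize(profiles[g1]), binarize(profiles[g2]))
                edges.append((g1, g2, r, mi))
    return edges

profiles = {  # hypothetical expression of three genes across four samples
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [3.0, 1.0, 4.0, 2.0],
}
print(gene_network(profiles))
```

Only the perfectly co-expressed pair geneA/geneB forms an edge. geneC is uncorrelated with both and stays isolated.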
2.5. Functional Annotation and Enrichment Analysis
Functional enrichment of the differentially expressed genes was performed using the Gene Ontology (GO) database, which was also used to classify gene function and locus information. Cluster analysis of the genes within molecular function and biological process categories was performed with the Prolifer package. Over-represented biological functions and pathways among the differentially expressed genes were identified using DAVID (Database for Annotation, Visualization and Integrated Discovery) and GOrilla.
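Over-representation tools test whether a GO term appears in the gene list more often than chance would predict. The sketch below uses the plain one-sided hypergeometric test as a simplified stand-in. DAVID reports a modified Fisher exact test (the EASE score), and GOrilla uses a minimum-hypergeometric statistic over ranked lists, so their p-values will differ. The counts in the example are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k): probability of seeing at least k annotated genes
    in a list of n genes drawn from N background genes, K of which carry
    the GO term."""
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("invalid counts")
    total = comb(N, n)
    upper = min(n, K)
    # Sum the hypergeometric probabilities of k, k+1, ..., min(n, K) hits
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

# Hypothetical: 12 of 200 selected genes carry a term that 150 of 20,000
# background genes carry
p = hypergeom_enrichment_p(12, 200, 150, 20000)
print(p)
```

The expected overlap in this example is 200 × 150 / 20,000 = 1.5 genes, so observing 12 yields a very small p-value.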
3. Results and Discussion
In the current study, we analyzed breast cancer genes classified into familial, sporadic, FCA and normal sample types. Meta-analysis techniques were used to classify the differential expression of genes involved in breast cancers with BRCA1, BRCA2 and BRCA1/2 (BRCAx) mutations. Different sets of genes emerged from 15 pairwise comparisons, which were used to classify upregulated and downregulated genes functionally involved in breast cancer:
- hereditary BRCA1-BRCA2, BRCA1-BRCAx, BRCA1-sporadic, BRCA1-FCA and BRCA1-normal;
- BRCA2-BRCAx, BRCA2-sporadic, BRCA2-FCA and BRCA2-normal;
- BRCAx-sporadic, BRCAx-FCA and BRCAx-normal;
- sporadic-FCA, sporadic-normal and FCA-normal.
3.1. Differential Gene Expression Analysis
After the raw data were normalized using the foreground and background intensities of all probe sets, the normalized values were used to assess differential gene expression. Differential expression was determined with the limma package at a significance threshold of p < 0.05. All 54,675 probe sets on the array were used for differential classification. In BRCA1-mutated samples, 592 genes were upregulated and 810 downregulated, with differential expression in ER signaling pathways of breast cancer. Hierarchical clustering of the 41 samples using these 810 genes (out of 54,675) produced a main branch. This branch was subclassified with the PAM50 classifier according to sample type: hereditary BRCA1, hereditary BRCA2, hereditary BRCAx, sporadic, FCA and normal. Most of the tumor samples, including BRCA1, BRCA2, BRCAx and FCA tumors, fell within the ER cluster (Figure 1 and Table 1).
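Hierarchical clustering of samples can be sketched as agglomerative clustering on a correlation distance. The text does not state the distance or the linkage, so 1 minus the Pearson correlation with average linkage is assumed here. The four toy profiles are hypothetical, and PAM50 subtype assignment is not reproduced.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_linkage(profiles):
    """Agglomerative clustering of samples with distance 1 - Pearson r and
    average linkage. Returns merges as (cluster_a, cluster_b, distance)."""
    clusters = {name: [name] for name in profiles}  # cluster id -> members

    def dist(a, b):
        return 1.0 - pearson(profiles[a], profiles[b])

    merges = []
    while len(clusters) > 1:
        best = None
        keys = list(clusters)
        for i, ka in enumerate(keys):
            for kb in keys[i + 1:]:
                ma, mb = clusters[ka], clusters[kb]
                # Average linkage: mean distance over all cross-cluster pairs
                d = sum(dist(a, b) for a in ma for b in mb) / (len(ma) * len(mb))
                if best is None or d < best[2]:
                    best = (ka, kb, d)
        ka, kb, d = best
        members = clusters.pop(ka) + clusters.pop(kb)
        clusters[f"({ka},{kb})"] = members
        merges.append((ka, kb, d))
    return merges

# Hypothetical profiles: sample name -> expression of four genes
profiles = {
    "s1": [1.0, 2.0, 3.0, 4.0],
    "s2": [2.0, 4.0, 6.0, 8.0],
    "s3": [4.0, 3.0, 2.0, 1.0],
    "s4": [8.0, 6.0, 4.0, 2.0],
}
for merge in average_linkage(profiles):
    print(merge)
```

Samples with proportional profiles merge first at distance 0. The two anti-correlated groups join last at the maximum distance of 2.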
3.2. Molecular Subtypes of BRCA1 Mutation Related Genes in Breast Cancer
RNA expression profiling of BRCA1-mutated breast cancer follows a general classification into hereditary, sporadic and normal cells. Across all BRCA1 subgroups, KCNJ3, CDH2 and BNC1 show the most significant association. Clinical consideration of all top 10 upregulated and downregulated genes shows that KCNJ3, CDH2, LOC645323, RCBTB1, CYP27C1, BNC1 and FXC1 are significantly associated with BRCA1 transcriptional regulation and mutation. Genes such as KCNJ3, MYST4, DSP, SLC6A4 and EDDM3B are mainly associated with BRCA1 mutation. Functional enrichment analysis shows that 19 genes encode proteins that are significantly associated with transcriptional regulation control and the cellular response to apoptosis. The full list of genes is given in Table 2.
Figure 1. Differential gene expression of significant genes predicted based on p < 0.01.
Table 1. (a) Differential expression of BRCA1 gene mutation in ER cells of breast cancer; (b) Differential expression of BRCA2 gene mutation in ER cells of breast cancer; (c) Differential expression of BRCAx gene mutation in ER cells of breast cancer; (d) Differential expression of FCA and normal cells in ER cells of breast cancer.
3.3. Differential Classification of BRCA2 Mutation Related Genes in Breast Cancer
The differential classification of BRCA2-mutated breast cancer showed 22 upregulated and 40 downregulated genes; this mutation was found in ER cells of breast cancer. These genes include MIR9-2, which is involved in post-transcriptional regulation in multicellular organisms by affecting the stability and translation of mRNAs. Calcium-dependent cell-cell adhesion is governed by CDH2. This gene encodes a glycoprotein with highly conserved cadherin repeats and a transmembrane region. All of these genes appear at the top of the data (Table 3).
3.4. Differential Expression of BRCAx Samples in Breast Cancer
Comparing BRCAx with sporadic breast cancer samples showed 27 upregulated and 48 downregulated genes. The top 10 regulated genes include the following (Table 4):
- CDH2, which encodes a transmembrane protein that is a common regulator of cell-cell adhesion;
- LOC645323, which is involved in transcriptional regulation through putative alternative promoters of breast cancer genes;
- RCBTB1, which interacts with the ACE2A receptor that regulates B cells.
3.5. Differential Expression of Sporadic Gene Mutation Samples in Breast Cancer
In sporadic samples, 18 genes were upregulated and 98 downregulated. Evaluating the reproducibility of the signature expression patterns showed that LOC645323, CDH2, CYP27C1 and RCBTB1 are significantly associated with transcriptional regulation in metastasis. This revealed the 18 upregulated and 98 downregulated genes of sporadic basal-like tumors in the RCBTB1 data set. The performance of the LOC645323 signature was estimated using the SVM algorithm (Table 5).
Table 2. Upregulated genes of BRCA1 samples predicted based on p < 0.01.
Table 3. Upregulated genes of BRCA2 samples predicted based on p < 0.01.
Table 4. Upregulated genes of BRCAx samples predicted based on p < 0.01.
3.6. Differential Expression of FCA and Normal Tissues
Comparing FCA with normal samples identified 42 upregulated and 46 downregulated genes significantly associated with breast cancer (Table 6).
3.7. Functional Enrichment
Functional annotation and enrichment analysis covered the familial (BRCA1, BRCA2 and BRCAx), sporadic and FCA groups. Both the gene sets and the functional categories can influence the results of functional enrichment analysis. To mitigate these effects, we do not follow the traditional approach of evaluating the overlap between a gene list and individual GO terms. Here that list is the 50 genes, including KCNJ3, MYST4, DSP, SLC6A4, EDDM3B, SIX4, AQP11, MIR9-2, CDH2, GLC, BNC1, RCBTB1, NPL and SHROOM2. Instead, we consider the overlap between the annotations made to a gene set and a branch of terms in the Gene Ontology. To capture the significance of this annotation overlap accurately, we develop a randomization scheme that preserves the transitive annotation features of the GO directed acyclic graph (DAG). The scheme calculates the probability of obtaining a given number of annotations between a gene set and a GO branch.
Table 5. Upregulated genes of sporadic samples predicted based on p < 0.01.
Table 6. Upregulated genes of FCA samples predicted based on p < 0.01.
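The branch-based idea can be sketched with a small permutation test. A GO branch is the set of all terms below a chosen root, so annotations to child terms count toward the branch (the transitive, or true-path, property of the DAG). Random gene sets of the same size, drawn from the background, give the null distribution. This is a hypothetical simplification of the randomization scheme described above, using a toy DAG and toy annotations; it is not the authors' procedure.

```python
import random

def branch_terms(children, root):
    """All GO terms in the branch rooted at `root` (root plus descendants)."""
    seen, stack = set(), [root]
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(children.get(term, []))
    return seen

def branch_annotations(genes, annotations, branch):
    """Count annotations linking the genes to any term in the branch."""
    return sum(len(annotations.get(g, set()) & branch) for g in genes)

def branch_permutation_test(gene_set, background, annotations, branch,
                            n_perm=1000, seed=0):
    """Observed count and permutation p-value (with +1 correction)."""
    rng = random.Random(seed)
    observed = branch_annotations(gene_set, annotations, branch)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(background, len(gene_set))  # same-size random set
        if branch_annotations(sample, annotations, branch) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy DAG (parent -> children) and gene annotations (hypothetical)
children = {"root": ["a", "b"], "a": ["a1"]}
annotations = {"g1": {"a1"}, "g2": {"a"}, "g3": {"b"}, "g4": {"b"}, "g5": set()}
branch = branch_terms(children, "a")  # {'a', 'a1'}
observed, p = branch_permutation_test(
    ["g1", "g2"], ["g1", "g2", "g3", "g4", "g5"], annotations, branch)
print(observed, p)
```

Here g1 is annotated only to the child term a1, yet it still counts toward the branch rooted at a. Only one of the ten possible random pairs matches the observed count, so the p-value is close to 0.1.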
3.8. Significance Analysis of Breast Cancer Genes as Biomarkers
Using the SVM classification technique, we classified the 810 differentially expressed genes across four conditions: familial (BRCA1, BRCA2 and BRCAx), sporadic, FCA and normal cell types. Further classification of these genes by functional enrichment showed that only 592 encode proteins, and these proteins are significantly expressed in breast cancer. Separating the genes expressed only in breast cancer left 30 genes that are most commonly expressed in breast cancer and could potentially be used as biomarkers. These common genes are mainly associated with transcriptional regulation (Figure 2 and Table 7). They include GATA3, FOXA1, TOX3, Arg3, ARG2, MMP12, PTX3, AREG, SYNPO2, SCGB2A2, TFAP2B, DACH1, VGLL1, REEP6, MPP7, ANKRD30A, IGF2BP3, CYP4Z1, SLC6A4, FAM134A, DSP, PSPH, SDR16C5, AQP11, GLS, TMC5 and SLITRK6.
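The study used a multi-class SVM through geNetClassifier. As a minimal illustration of the underlying technique, the sketch below trains a binary linear SVM with the Pegasos stochastic sub-gradient method in pure Python. A multi-class classifier could be assembled from such binary models, for example one-vs-rest. The two-feature toy data stand in for expression values of selected genes and are hypothetical.

```python
import random

def train_linear_svm(X, y, lam=0.01, n_iter=2000, seed=0):
    """Pegasos stochastic sub-gradient training of a linear SVM.

    X: list of feature vectors; y: labels in {+1, -1}. A constant 1.0
    feature is appended so the last weight acts as a bias (it is
    regularized along with the other weights)."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    for t in range(1, n_iter + 1):
        i = rng.randrange(len(data))
        x, label = data[i], y[i]
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = label * sum(wj * xj for wj, xj in zip(w, x))
        w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularization)
        if margin < 1.0:  # hinge loss active: step toward the sample
            w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w

def predict(w, x):
    """Sign of the linear score, with the bias feature appended."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

# Hypothetical two-gene expression features for two classes
X = [[2.0, 2.0], [3.0, 1.0], [1.0, 3.0], [-2.0, -2.0], [-3.0, -1.0], [-1.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

Folding the bias into the weight vector means it is regularized too. This is a simplification compared with standard SVM solvers such as the one geNetClassifier relies on.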
Table 7. Gene ranking of SVM classified gene signatures predicted using breast cancer data.
Figure 2. Significantly associated genes predicted based on SVM classifier of breast cancer data.
3.9. Gene Network Prediction
We used the SVM method within the geNetClassifier algorithm to construct the network, identifying informative gene pairs and assigning weights to sample pairs. Two threshold parameters were tuned to find their optimal combination, judged by the accuracy of k-fold cross-validation. The first threshold was varied from 0.11 to 0.8 in intervals of 0.01, and the second from 0.85 to 0.9 in intervals of 0.5. We performed 26 different experiments varying these two thresholds to measure the accuracy of the gene network. Of the 592 genes involved in the gene network, 54 genes alone showed significant association with optimal isolation. The overall gene network is shown in Figure 3, and the top 30 gene signatures predicted by the SVM classifiers are shown in Figure 4 (Table 8).
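The threshold search can be organized as a grid over the two parameters, scoring each combination by k-fold cross-validation accuracy. In the sketch below, `kfold_indices`, `threshold_grid`, `cv_accuracy` and `grid_search` are hypothetical helpers. `make_toy_evaluator` is a placeholder for training and testing the classifier at a given pair of thresholds, and the second grid in the example uses an illustrative step of 0.01. Building each grid from integers avoids floating-point drift, so the first grid reproduces the 70 values from 0.11 to 0.80.

```python
import random

def kfold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def threshold_grid(start, stop, step, scale=100):
    """Inclusive grid built on integers to avoid floating-point drift;
    start/stop/step are assumed to be multiples of 1/scale."""
    lo, hi, st = round(start * scale), round(stop * scale), round(step * scale)
    return [v / scale for v in range(lo, hi + 1, st)]

def cv_accuracy(evaluate_fold, folds):
    """Mean accuracy over folds; evaluate_fold(train_idx, test_idx) -> accuracy."""
    scores = []
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        scores.append(evaluate_fold(train_idx, test_idx))
    return sum(scores) / len(scores)

def grid_search(make_evaluator, grid_a, grid_b, folds):
    """Return (best_accuracy, a, b) over all threshold combinations."""
    best = None
    for a in grid_a:
        for b in grid_b:
            acc = cv_accuracy(make_evaluator(a, b), folds)
            if best is None or acc > best[0]:
                best = (acc, a, b)
    return best

def make_toy_evaluator(a, b):
    # Hypothetical stand-in for training/testing the classifier at (a, b)
    return lambda train_idx, test_idx: 1.0 - abs(a - 0.5) - abs(b - 0.9)

folds = kfold_indices(41, 5)  # 41 samples, 5 folds
grid_a = threshold_grid(0.11, 0.80, 0.01)
grid_b = threshold_grid(0.85, 0.90, 0.01)
print(len(grid_a))  # 70
print(grid_search(make_toy_evaluator, grid_a, grid_b, folds))
```

In practice each evaluator call would refit the classifier on `train_idx` and score it on `test_idx`. The toy evaluator ignores the folds and only demonstrates the search loop.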
In the current study, we developed a meta-analysis to identify BRCA1-mutation gene signatures in breast cancer across six types of datasets: familial BRCA1, BRCA2 and BRCAx, sporadic, FCA and normal samples. We identified 810 genes significantly associated with all types of datasets. Based on protein expression, only 592 of these genes contributed to the metastatic condition when compared with primary tumors at an FDR < 0.1. Fifty-two genes are involved in different pathways associated with metastasis. We compared the gene lists of the six independent groups. This showed that 30 genes are significantly enriched for GO annotations associated with different pathways in all six types of datasets, and these genes are potential drug targets. Furthermore, this result helps identify a metastatic signature that could facilitate further research on metastasis, such as outcome prediction, drug discovery and other functional studies.
Figure 3. Gene-gene network predicted based on classifiers of breast cancer significant genes used for drug targets.
Figure 4. Top 30 gene signatures predicted based on SVM classifiers.
Table 8. Top genes significantly associated with breast cancer.
The authors would like to thank Dr. Vidya Niranjan, Professor and Head, Department of Biotechnology, RV College of Engineering; Dr. Subramanya K.N., Principal, RV College of Engineering; and the faculty at Scientific Biominds Ltd., Bengaluru.
Colombo, P.E., Milanezi, F., Weigelt, B. and Reis-Filho, J.S. (2011) Microarrays in the 2010s: The Contribution of Microarray-Based Gene Expression Profiling to Breast Cancer Classification, Prognostication and Prediction. Breast Cancer Research, 13, 212.
 Teegarden, D., Romieu, I. and Lelièvre, S.A. (2012) Redefining the Impact of Nutrition on Breast Cancer Incidence: Is Epigenetics Involved? Nutrition Research Reviews, 25, 68-95.