OJAppS Vol.11 No.9, September 2021
Analysis of the Relationship between Image and Blood Examinations in an Artificial Intelligence System for the Molecular Diagnosis of Breast Cancer
Abstract: Molecular subtype classification based on tumor genotype has recently been used for differential diagnosis of breast cancer. The shift from conventional tissue classification to molecular genetics-based classification is primarily because objective genetic information can ensure a biologically clear classification system and patient groups may be created for a given set of diagnoses and suitable treatments. Given the stressful nature of biopsy, radiomic studies are conducted to determine breast cancer subtypes using non-invasive imaging tests. Minimally invasive blood tests using microRNAs (miRNAs) contained in exosomes have been developed. We investigated the usefulness of radiomic features and miRNAs in distinguishing triple-negative breast cancer (TNBC) from other cancer types. Fat suppression T2-weighted magnetic resonance images and miRNAs of 60 cases (9 TNBC and 51 others) were retrieved from the Cancer Genome Atlas Breast Invasive Carcinoma. Six radiomic features and six miRNAs were selected by least absolute shrinkage and selection operator. Linear discriminant analysis was employed to distinguish between TNBC and others. With miRNAs, TNBC and others were completely separated, whereas with radiomic features, TNBC overlapped with other types of breast cancer. Receiver operating characteristic curve analysis results showed that the area under the curve of radiomic features and miRNAs was 0.85 and 1.0, respectively. miRNAs showed a higher discrimination performance than radiomic features. Although gene analysis is expensive and facilities for performing it are limited, miRNAs for blood tests may be useful in artificial intelligence systems for the molecular diagnosis of breast cancer.

1. Introduction

Medical treatment for cancer proceeds in the following order: detection of the lesion, differential diagnosis, and treatment. Research on computer-aided diagnosis (CAD) has led to the development of techniques that detect lesions in medical images and distinguish between benign and malignant lesions [1] [2] [3]. In contrast, radiomics analyzes the relationship between the imaging phenotype and the genotype of lesions. Radiomics differs from CAD research in that it supports the medical process after the detection of lesions. Therefore, CAD can be regarded as an artificial intelligence (AI) system that supports the first half of medical care, and radiomics as an AI system that supports the second half.

With the progress in post-genome research, the molecular and genetic backgrounds of various cancers have been clarified. This knowledge has not only facilitated molecular classification but also aided the development of molecular-targeted drugs. Molecular diagnosis of cancer using genetic information enables a clear biological classification, and the resulting molecular classification is directly linked to the selection of appropriate molecular-targeted drugs. However, for the molecular diagnosis of cancer, tumor cells need to be collected via biopsy, which imposes a significant burden on the patient. Additional constraints include the limited availability of facilities for performing gene analysis and its high cost. Therefore, it would be advantageous if the tumor genotype could be easily estimated from non-invasive imaging using radiomics.

Minimally invasive examinations based on liquid biopsy, which analyzes cell-free DNA, circulating tumor cells, and exosomes, have also been performed. In particular, it has been reported that microRNA (miRNA) contained in exosomes derived from cancer cells can be used to detect the presence of cancer with high accuracy [4] [5] [6] [7]. Further, information on the tumor genotype can also be obtained. Given the growing demand for molecular diagnosis techniques, studies to clarify the relationship between imaging and genetic examinations are considered important.

Radiomic studies on breast cancer have estimated breast cancer subtypes from various imaging tests [8] - [19] and have predicted the prognosis or recurrence [20] [21] [22]. Of the different subtypes, triple-negative breast cancer (TNBC) accounts for approximately 20% of all breast cancers. TNBC has a very high recurrence rate within 3 years and a shorter survival time after recurrence than other breast cancer types. Furthermore, because only anticancer drugs are expected to have therapeutic effects, it is important to distinguish between TNBC and other types of breast cancer [8] [9]. The main contribution of this study is to evaluate the usefulness of radiomic features and miRNAs in distinguishing TNBC from other breast cancer types, with the aim of constructing an AI system that considers the division of roles between genetic testing and imaging tests. If radiomic features and miRNAs have an inclusive relationship, AI supporting the second half of medical care can be realized using either imaging or blood tests.

2. Materials and Methods

2.1. Imaging and Clinical Data

In this study, we used the Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) database in the Cancer Imaging Archive [23]. TCGA-BRCA contains data from 139 patients with breast cancer. However, magnetic resonance (MR) images and genetic information were not available for all cases. Therefore, we selected 60 cases for which fat-suppressed contrast-enhanced T1-weighted images and miRNAs were available. The public database also includes information on whether the hormone receptor was positive or negative, whether human epidermal growth factor receptor 2 was positive or negative, and whether Ki67 was high or low. Based on this information, the 60 cases were classified into two groups: TNBC (9 cases) and others (51 cases) (Table 1). The study protocol was approved by the Ethics Review Committee.

2.2. Gene Data

Each miRNA is a short sequence of approximately 10 - 100 bases. Because miRNAs contained in exosomes are used in liquid biopsy, miRNA was adopted as the genetic information for this study. From the TCGA-BRCA database, we obtained miRNA data that had been measured in tumor cells. The miRNA read counts were used after applying a reads-per-million correction. This correction, which is applied when comparing samples, divides each read count by the total number of reads and multiplies the result by a constant. The database contained 1325 miRNAs, but most of them had zero values in some of the cases. Therefore, the 255 miRNAs with non-zero values in all 60 selected cases were used for this study.
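The correction and filtering steps described above can be sketched in Python as follows. This is a minimal illustration, not the code used in the study: the function names and the toy read counts are hypothetical, and the sketch assumes that the total is taken per sample over all miRNAs and that the constant is one million.

```python
def reads_per_million(counts):
    """Scale one sample's raw read counts so that they sum to one million."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c * 1_000_000 / total for c in counts]


def keep_nonzero_features(samples):
    """Return the indices of features that are non-zero in every sample."""
    n_features = len(samples[0])
    return [j for j in range(n_features)
            if all(sample[j] != 0 for sample in samples)]


# Hypothetical toy data: 3 samples (rows) x 4 miRNAs (columns) of raw counts.
raw = [
    [10, 0, 30, 60],
    [5, 5, 0, 90],
    [20, 10, 10, 160],
]
rpm = [reads_per_million(row) for row in raw]
kept = keep_nonzero_features(rpm)          # miRNAs with no zero in any sample
filtered = [[row[j] for j in kept] for row in rpm]
```

In this toy example only the first and last miRNAs survive the filter, in the same way that 255 of the 1325 miRNAs were retained in the study.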

2.3. Radiomic Features

The slice with the largest tumor diameter was selected from the MR images. The MR image was converted to 512 × 512 pixels using linear interpolation. As per the rules for tumor region marking, when there were multiple tumors in an MR image, the one with the largest tumor area was selected. When there were spicules and incorrect edges, they were marked as the tumor region to accurately quantify the radiomic features related to shape. An example of tumor region marking is shown in Figure 1.

Table 1. Immunohistochemistry-defined subtype classification.

To normalize the pixel values, we performed a linear density transformation on all MR images. Because the MR images contained noise with extremely high pixel values, the maximum pixel value after a direct linear density transformation would be affected by the noise. To solve this problem, we calculated the pixel value at the upper 0.05% of the density histogram, set all pixels above this value to 1023, and linearly transformed the remaining pixel values so that the minimum and maximum pixel values were 0 and 1023, respectively. The 0.05% threshold was determined empirically, on the assumption that noise occupied approximately 0.05% of the entire image.
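A minimal Python sketch of this normalization is given below, assuming the image is supplied as a flat list of pixel values. The function name, the way the 0.05% cut-off index is rounded, and the toy data are assumptions made for illustration only.

```python
def normalize_intensity(pixels, upper_fraction=0.0005, out_max=1023):
    """Clip the brightest `upper_fraction` of pixels, then rescale to 0..out_max."""
    ordered = sorted(pixels)
    n = len(ordered)
    n_noise = int(upper_fraction * n)       # number of pixels treated as noise
    ceiling = ordered[n - 1 - n_noise]      # pixel value at the upper 0.05%
    floor = ordered[0]
    if ceiling == floor:                    # flat image: nothing to rescale
        return [0] * n
    scale = out_max / (ceiling - floor)
    # Pixels above the ceiling are clipped, so they map to out_max.
    return [round((min(p, ceiling) - floor) * scale) for p in pixels]


# Hypothetical toy image: a smooth ramp plus one extremely bright noise pixel.
demo = list(range(2500)) + [60000]
normalized = normalize_intensity(demo)
```

Without the clipping step, the single noise pixel would define the maximum, and the ramp would be compressed into the lowest few percent of the output range.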

We calculated 298 radiomic features from the tumor region of the MR image after linear density transformation. The free software MaZda [24] [25] [26] was used to calculate the radiomic features. These features comprise 1 size feature, 9 histogram features, 272 texture features, and 16 resolution features. The default values of MaZda were adopted as the parameters for calculating these radiomic features. For example, the parameters used to calculate the density co-occurrence matrix for the texture features were 16 density gradations; distances of 1 - 5 between pixels; and directions of 0˚, 45˚, 90˚, and 135˚.
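To make the co-occurrence parameters concrete, the following Python sketch counts gray-level pairs for one distance and direction and derives a single texture value (contrast) from them. It is a simplified illustration and not MaZda's implementation: the function names are hypothetical, the whole image is used instead of a tumor mask, pixel values are assumed to be integers in 0 - 1023, and the counts are made symmetric by assumption.

```python
def cooccurrence_matrix(image, levels=16, max_value=1023, offset=(0, 1)):
    """Count gray-level pairs separated by `offset` = (dy, dx), symmetrically."""
    # Quantize 0..max_value down to `levels` density gradations.
    q = [[min(levels - 1, p * levels // (max_value + 1)) for p in row]
         for row in image]
    glcm = [[0] * levels for _ in range(levels)]
    dy, dx = offset
    rows, cols = len(q), len(q[0])
    for y in range(rows):
        for x in range(cols):
            yy, xx = y + dy, x + dx
            if 0 <= yy < rows and 0 <= xx < cols:
                a, b = q[y][x], q[yy][xx]
                glcm[a][b] += 1
                glcm[b][a] += 1             # count the pair in both orders
    return glcm


def contrast(glcm):
    """Weighted mean of squared gray-level differences over all counted pairs."""
    total = sum(sum(row) for row in glcm)
    return sum((i - j) ** 2 * v
               for i, row in enumerate(glcm)
               for j, v in enumerate(row)) / total


# Hypothetical 2 x 2 image: dark top row, bright bottom row.
toy = [[0, 0], [1023, 1023]]
horizontal = cooccurrence_matrix(toy, offset=(0, 1))    # 0 degrees, distance 1
vertical = cooccurrence_matrix(toy, offset=(-1, 0))     # 90 degrees, distance 1
```

For this toy image the horizontal contrast is zero and the vertical contrast is large, which shows why features are computed for several directions and distances.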

2.4. Selection of Radiomic Features and miRNAs

Figure 1. An example of manually segmented tumor region.

The numbers of radiomic features (298) and miRNAs (255) were greater than the number of cases (60). Hence, selection of useful radiomic features and miRNAs is necessary to distinguish between TNBC and other cancer types. In this study, radiomic features and miRNAs were selected using the least absolute shrinkage and selection operator (LASSO) [27], which is obtained by the following equation:

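$$\hat{\beta}^{\mathrm{lasso}} = \underset{\beta}{\arg\min} \left\{ \frac{1}{2} \sum_{i=1}^{N} \left( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j \right)^2 + \lambda \sum_{j=1}^{p} \left| \beta_j \right| \right\} \tag{1}$$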

By switching the input data between radiomic features and miRNAs, the two were selected separately. Here, yi indicates whether the ith patient has TNBC or another type, and xij is the jth radiomic feature or miRNA of the ith patient. βj are coefficients and β0 is a constant term. λ ≥ 0 is a complexity parameter that controls the degree of reduction. p represents the total number of radiomic features or miRNAs. The parameter βj can be obtained by solving the quadratic programming problem in Equation (1). Three-fold cross validation was performed to determine the value of λ that minimizes the average deviation. The values of λ obtained in the process of this calculation were then examined in order, and the value for which the number of radiomic features or miRNAs with a non-zero coefficient βj was 6 was adopted. Depending on the input data, exactly six features could not always be obtained; in such cases, five or seven features were selected.
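The selection procedure can be illustrated with the Python sketch below. It minimizes the objective of Equation (1) by cyclic coordinate descent, which is swapped in here for the quadratic programming solver mentioned in the text, and it omits the three-fold cross validation. The function names, the candidate values of λ, and the toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; small values become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize 0.5 * RSS + lam * sum(|beta_j|) by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    # Center columns and response so the intercept drops out of the updates.
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]
    col_sq = [sum(row[j] ** 2 for row in Xc) for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product of feature j with the residual, feature j added back.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new_beta
    beta0 = y_mean - sum(col_mean[j] * beta[j] for j in range(p))
    return beta0, beta


def select_features(X, y, target, lambdas):
    """Try lambdas from largest to smallest; keep the fit closest to `target`."""
    best = None
    for lam in sorted(lambdas, reverse=True):
        _, beta = lasso_coordinate_descent(X, y, lam)
        chosen = [j for j, b in enumerate(beta) if b != 0.0]
        gap = abs(len(chosen) - target)
        if best is None or gap < best[0]:
            best = (gap, lam, chosen)
    return best[1], best[2]


# Hypothetical toy data: 8 cases x 4 features; y depends on features 0 and 2.
X = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, 1, -1], [1, -1, -1, 1],
     [-1, 1, 1, 1], [-1, 1, -1, -1], [-1, -1, 1, -1], [-1, -1, -1, 1]]
y = [4, 2, 4, 2, -2, -4, -2, -4]
best_lambda, selected = select_features(X, y, target=2, lambdas=[30, 10, 4])
```

Returning the fit whose count is closest to the target mirrors the situation described above, in which five or seven features were accepted when exactly six could not be obtained.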

2.5. Visualization by Multidimensional Scaling (MDS)

Although LASSO can reduce the dimension of radiomic features or miRNAs, the selected features are still multidimensional data. Thus, it is not easy to understand the relationship between these multidimensional data and breast cancer subtypes. If the number of dimensions can be reduced to two, the relationship can be visualized as a scatter plot. Therefore, in this study, we used MDS [27] to reduce the radiomic features or miRNAs to two dimensions. MDS is also called principal coordinate analysis, and a new axis is constructed using the following procedure. First, the distance matrix dij, comprising the Euclidean distance between input i and input j, was calculated. The transformation matrix zij was then obtained, which is defined by

$$z_{ij} = -\frac{1}{2} \left( d_{ij}^2 - \frac{\sum_{i=1}^{n} d_{ij}^2}{n} - \frac{\sum_{j=1}^{n} d_{ij}^2}{n} + \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij}^2}{n^2} \right) \tag{2}$$

The transformation matrix is used to move the origin to the center of gravity for n input data. Finally, the new coordinate points were determined as the coordinate values on the axis given by the eigenvector of matrix zij. Because MDS is a linear transformation that maintains the Euclidean distance between data, it can be interpreted by reproducing the relative positional relationship of multidimensional data in a low-dimensional space.
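The procedure can be sketched in Python as follows: squared Euclidean distances are double-centered as in Equation (2), and the two leading eigenvectors are then extracted. The power iteration with deflation, the scaling of each eigenvector by the square root of its eigenvalue (the usual convention for classical MDS), the function names, and the toy points are assumptions made for this illustration.

```python
import math


def squared_distances(points):
    """Matrix of squared Euclidean distances between all pairs of points."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]


def double_center(d2):
    """Move the origin to the center of gravity of the inputs."""
    n = len(d2)
    row_mean = [sum(row) / n for row in d2]
    col_mean = [sum(d2[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    return [[-0.5 * (d2[i][j] - row_mean[i] - col_mean[j] + grand)
             for j in range(n)] for i in range(n)]


def top_eigenpair(matrix, n_iter=500):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    n = len(matrix)
    v = [1.0 / (i + 1) for i in range(n)]   # arbitrary non-degenerate start
    for _ in range(n_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    value = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(n))
                for i in range(n))
    return value, v


def classical_mds(points, n_components=2):
    """Project points to `n_components` dimensions, preserving distances."""
    z = double_center(squared_distances(points))
    n = len(z)
    coords = [[] for _ in range(n)]
    for _ in range(n_components):
        value, vec = top_eigenpair(z)
        scale = math.sqrt(max(value, 0.0))
        for i in range(n):
            coords[i].append(vec[i] * scale)
        # Deflate: remove the component just found before the next search.
        z = [[z[i][j] - value * vec[i] * vec[j] for j in range(n)]
             for i in range(n)]
    return coords


# Hypothetical toy data: five 3-D points that actually lie in a plane.
points = [[0, 0, 0], [4, 0, 0], [0, 2, 0], [4, 2, 0], [2, 1, 0]]
embedding = classical_mds(points)
```

Because the toy points are intrinsically two-dimensional, the pairwise distances in the embedding match the original distances; with six selected features, the two-dimensional plot is only an approximation of the original configuration.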

2.6. Differentiation between TNBC and Others

We employed linear discriminant analysis (LDA) [28] to distinguish between TNBC and other cancer types. Six radiomic features or six miRNAs selected by LASSO were used as input data for LDA. LDA determines the hyperplane that best discriminates the two groups of TNBC and others, assuming the variances of each group of TNBC and others are the same in the feature space. The hyperplane is defined as follows:

$$z = a_1 x_1 + a_2 x_2 + \cdots + a_i x_i + a_0 \tag{3}$$

Here, z is the discrimination score, xi are the radiomic features or miRNAs, ai are the coefficients, and a0 is a constant value. Because the relationship between the input and output of LDA is a simple linear expression, a high discrimination performance with LDA indicates that the radiomic features or miRNAs can be used as biomarkers to discriminate between TNBC and others. The leave-one-out method was used to train and test the LDA [28]. The discrimination performance was evaluated by the area under the curve (AUC) of receiver operating characteristic (ROC) curve analysis. The LABROC4 algorithm [29], developed at the University of Chicago, was used for ROC curve analysis.
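The evaluation scheme can be sketched in Python as shown below: a two-class LDA with a pooled covariance matrix is fitted on all cases but one, the held-out case is scored with Equation (3), and the AUC is computed from the collected scores. Note that the AUC here is the empirical (rank-based) estimate, which is used in place of the LABROC4 fit; the equal-prior threshold, the function names, and the toy data are likewise assumptions.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_lda(X, y):
    """Return (a, a0) of z = a . x + a0 using a pooled covariance matrix."""
    p = len(X[0])
    groups = {k: [x for x, t in zip(X, y) if t == k] for k in (0, 1)}
    means = {k: [sum(x[j] for x in g) / len(g) for j in range(p)]
             for k, g in groups.items()}
    pooled = [[0.0] * p for _ in range(p)]
    for k, g in groups.items():
        for x in g:
            d = [x[j] - means[k][j] for j in range(p)]
            for r in range(p):
                for c in range(p):
                    pooled[r][c] += d[r] * d[c]
    dof = len(X) - 2
    pooled = [[v / dof for v in row] for row in pooled]
    diff = [means[1][j] - means[0][j] for j in range(p)]
    a = solve(pooled, diff)
    # Equal priors assumed: the boundary passes midway between the class means.
    a0 = -sum(a[j] * (means[1][j] + means[0][j]) / 2 for j in range(p))
    return a, a0


def leave_one_out_scores(X, y):
    """Score each case with an LDA trained on all the other cases."""
    scores = []
    for i in range(len(X)):
        a, a0 = fit_lda(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        scores.append(sum(aj * xj for aj, xj in zip(a, X[i])) + a0)
    return scores


def empirical_auc(scores, y):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two well-separated groups with two features each.
X = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.8], [2.2, 2.1],
     [6.0, 5.0], [5.5, 6.2], [6.5, 5.8], [5.8, 6.6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
scores = leave_one_out_scores(X, y)
auc = empirical_auc(scores, y)
```

Because every held-out case is scored by a model that never saw it, the AUC is less optimistic than one computed on the training data, which matters when only 9 TNBC cases are available.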

3. Experimental Results

The six miRNAs and six radiomic features selected by LASSO are listed in Table 2. A scatter plot projecting these features into two dimensions using MDS is shown in Figure 2. When miRNAs were used, TNBC and others were completely separated. However, when radiomic features were used, TNBC overlapped with other types of breast cancer. The results of LDA when the number of radiomic features or miRNAs was varied are shown in Figure 3. In this figure, LASSO did not select a set of three radiomic features, so the number following 2 is 4. With three miRNAs, the highest AUC value was 1.0, whereas with six radiomic features, the highest AUC value was 0.881. These results indicated that miRNAs have a higher discrimination performance than radiomic features. Figure 4 plots the LDA output values obtained with three miRNAs on the horizontal axis against those obtained with six radiomic features on the vertical axis. Along the horizontal (miRNA) axis, TNBC and others could be completely separated. However, along the vertical (radiomic feature) axis, a significant overlap was observed. When the output values of miRNA and radiomic features were integrated, the discrimination boundary could be generated in the diagonal direction; thus, the separation between TNBC and others tended to be larger.

Table 2. Six microRNAs and six radiomic features selected by least absolute shrinkage and selection operator.

Figure 2. Scatter plot in two dimensions using multidimensional scaling. (a) MicroRNA; (b) Radiomic features.

Figure 3. Area under the curve of receiver operating characteristic curve analysis for discriminating between triple-negative breast cancer and others using microRNAs and radiomic features.

4. Discussion

In this study, miRNA was identified to be a more potent factor than radiomic features in distinguishing TNBC from other cancer types. Therefore, if exosomes derived from breast cancer cells are isolated and the miRNA contained in the exosomes is analyzed, TNBC may be detected with high accuracy via liquid biopsy. If data on the genetic properties of breast cancer can be obtained by a minimally invasive blood test, the superiority of radiomics, which can easily estimate the genotype of cancer by a non-invasive imaging test, would be compromised. However, it is difficult to obtain information on the anatomical location and extent of the lesion using genetic testing. Hence, in addition to genetic testing, it is important to study the radiomic features and search for measures to integrate them to improve accuracy. Herein, the integrated analysis of miRNA and radiomic features improved the discrimination performance (Figure 4).

This study aimed to discriminate between TNBC and other cancers, which are classified based on the genetic nature of breast cancer. Notably, because the classes themselves are defined genetically, genetic testing is evaluated under conditions that are more favorable than those for imaging tests. Studies have reported the prediction of pathological complete response (pCR) using radiomic features after classifying breast cancer into subtypes by genetic testing [30] [31] [32] [33] [34]. These studies established the division of roles between genetic testing and imaging. To realize an AI system that supports personalized medicine, it is necessary to clarify, one by one, the parts of medical care to which the concepts of radiomics and liquid biopsy can be applied.

The present study has certain limitations owing to the small number of cases included as the experiment was conducted using a public database. Another limitation is that we could not compare blood tests and imaging tests directly as miRNAs obtained from exosomes derived from breast cancer cells in the blood were not used. Future studies are warranted to address these concerns.

Figure 4. Relationship of linear discriminant analysis outputs between microRNA and radiomic features.

5. Conclusion

The study identified miRNA as a more potent factor than radiomic features in distinguishing TNBC from other cancers. However, because it is difficult to obtain information on the anatomical location and extent of the lesion by genetic testing, it is important to clarify the radiomic features that are complementary to the genetic data. Research in this regard is believed to be important for constructing an AI system that considers the division of roles between genetic testing and imaging tests in the near future.


A part of this study was supported by a Grant-in-Aid for Scientific Research (C) (No. 21K12707) and a grant from the Suzuken Memorial Foundation. We would like to thank Editage for English language editing.

Cite this paper: Wada, N., Nakashima, M. and Uchiyama, Y. (2021) Analysis of the Relationship between Image and Blood Examinations in an Artificial Intelligence System for the Molecular Diagnosis of Breast Cancer. Open Journal of Applied Sciences, 11, 1016-1027. doi: 10.4236/ojapps.2021.119074.

[1]   Doi, K. (2007) Computer-Aided Diagnosis in Medical Imaging: Historical Review, Current Status and Future Potential. Computerized Medical Imaging and Graphics, 31, 198-211.

[2]   Giger, M.L. (2004) Computerized Analysis of Images in the Detection and Diagnosis of Breast Cancer. Seminars in Ultrasound, CT and MRI, 25, 411-418.

[3]   Li, Q. and Nishikawa, R.M. (2015) Computer-Aided Detection and Diagnosis in Medical Imaging. CRC Press, Boca Raton.

[4]   Rabinowits, G., Gercel-Taylor, C.G., Day, J.M., Taylor, D.D. and Kloecker, G.H. (2009) Exosomal microRNA: A Diagnostic Marker for Lung Cancer. Clinical Lung Cancer, 10, 42-46.

[5]   Eichelser, C., Stückrath, I., Müller, V., Milde-Langosch, K., Wikman, H., Pantel, K. and Schwarzenbach, H. (2014) Increased Serum Levels of Circulating Exosomal microRNA-373 in Receptor-Negative Breast Cancer Patients. Oncotarget, 5, 9650-9663.

[6]   Que, R., Ding, G., Chen, J. and Cao, L. (2013) Analysis of Serum Exosomal microRNAs and Clinicopathologic Features of Patients with Pancreatic Adenocarcinoma. World Journal of Surgical Oncology, 11, 219.

[7]   Taylor, D.D. and Gercel-Taylor, C.G. (2008) MicroRNA Signatures of Tumor-Derived Exosomes as Diagnostic Biomarkers of Ovarian Cancer. Gynecologic Oncology, 110, 13-21.

[8]   Wang, J., Kato, F., Oyama-Manabe, N.O., Li, R., Cui, Y., Tha, K.K., Yamashita, H., Kudo, K. and Shirato, H. (2015) Identifying Triple-Negative Breast Cancer Using Background Parenchymal Enhancement Heterogeneity on Dynamic Contrast-Enhanced MRI: A Pilot Radiomics Study. PLoS ONE, 10, e0143308.

[9]   Feng, Q., Hu, Q., Liu, Y., Yang, T. and Yin, Z. (2020) Diagnosis of Triple Negative Breast Cancer Based on Radiomics Signatures Extracted from Preoperative Contrast-Enhanced Chest Computed Tomography. BMC Cancer, 20, 579.

[10]   Ma, W., Zhao, Y., Ji, Y., Guo, X., Jian, X., Liu, P. and Wu, S. (2019) Breast Cancer Molecular Subtype Prediction by Mammographic Radiomic Features. Academic Radiology, 26, 196-201.

[11]   Li, H., Zhu, Y., Burnside, E.S., Huang, E., Drukker, K., Hoadley, K.A., Fan, C., Conzen, S.D., Zuley, M., Net, J.M., Sutton, E., Whitman, G.J., Morris, E., Perou, C.E., Ji, Y. and Giger, M.L. (2016) Quantitative MRI Radiomics in the Prediction of Molecular Classifications of Breast Cancer Subtypes in the TCGA/TCIA Data Set. NPJ Breast Cancer, 2, 16012.

[12]   Xie, T., Wang, Z., Zhao, Q., Bai, Q., Zhou, X., Gu, Y., Peng, W. and Wang, H. (2019) Machine Learning-Based Analysis of MR Multiparametric Radiomics for the Subtype Classification of Breast Cancer. Frontiers in Oncology, 9, 505.

[13]   Leithner, D., Horvat, J.V., Marino, M.A., Bernard-Davila, B., Jochelson, M.S., Ochoa-Albiztegui, R.E., Martinez, D.F., Morris, E.A., Thakur, S. and Pinker, K. (2019) Radiomic Signatures with Contrast-Enhanced Magnetic Resonance Imaging for the Assessment of Breast Cancer Receptor Status and Molecular Subtypes: Initial Results. Breast Cancer Research, 21, 106.

[14]   Leithner, D., Bernard-Davila, B.B., Martinez, D.F., Horvat, J.V., Jochelson, M.S., Marino, M.A., Avendano, D., Ochoa-Albiztegui, R.E., Sutton, E.J., Morris, E.A., Thakur, S.B. and Pinker, K. (2020) Radiomic Signatures Derived from Diffusion-Weighted Imaging for the Assessment of Breast Cancer Receptor Status and Molecular Subtypes. Molecular Imaging and Biology, 22, 453-461.

[15]   Leithner, D., Mayerhoefer, M.E., Martinez, D.F., Jochelson, M.S., Morris, E.A., Thakur, S.B. and Pinker, K. (2020) Non-Invasive Assessment of Breast Cancer Molecular Subtypes with Multiparametric Magnetic Resonance Imaging Radiomics. Journal of Clinical Medicine, 9, 1853.

[16]   Li, W., Yu, K., Feng, C. and Zhao, D. (2019) Molecular Subtypes Recognition of Breast Cancer in Dynamic Contrast-Enhanced Breast Magnetic Resonance Imaging Phenotypes from Radiomics Data. Computational and Mathematical Methods in Medicine, 2019, Article ID: 6978650.

[17]   Ni, M., Zhou, X., Liu, J., Yu, H., Gao, Y., Zhang, X. and Li, Z. (2020) Prediction of the Clinicopathological Subtypes of Breast Cancer Using a Fisher Discriminant Analysis Model Based on Radiomic Features of Diffusion-Weighted MRI. BMC Cancer, 20, 1073.

[18]   Son, J., Lee, S.E., Kim, E.K. and Kim, S. (2020) Prediction of Breast Cancer Molecular Subtypes Using Radiomics Signatures of Synthetic Mammography from Digital Breast Tomosynthesis. Scientific Reports, 10, Article No. 21566.

[19]   Demircioglu, A., Grueneisen, J., Ingenwerth, M., Hoffmann, O., Pinker-Domenig, K., Morris, E., Haubold, J., Forsting, M., Nensa, F. and Umutlu, L. (2020) A Rapid Volume of Interest-Based Approach of Radiomics Analysis of Breast MRI for Tumor Decoding and Phenotyping of Breast Cancer. PLoS ONE, 15, e0234871.

[20]   Li, H., Zhu, Y., Burnside, E.S., Drukker, K., Hoadley, K.A., Fan, C., Conzen, S.D., Whitman, G.J., Sutton, E.J., Net, J.M., Ganott, M., Huang, E., Morris, E.A., Perou, C.M., Ji, Y. and Giger, M.L. (2016) MR Imaging Radiomics Signatures for Predicting the Risk of Breast Cancer Recurrence as Given by Research Versions of MammaPrint, Oncotype DX, and PAM50 Gene Assays. Radiology, 281, 382-391.

[21]   Koh, J., Lee, E., Han, K., Kim, S., Kim, D.K., Kwak, J.Y., Yoon, J.H. and Moon, H.J. (2020) Three-Dimensional Radiomics of Triple-Negative Breast Cancer: Prediction of Systemic Recurrence. Scientific Reports, 10, Article No. 2976.

[22]   Jiang, X., Zou, X., Sun, J., Zheng, A. and Su, C. (2020) A Nomogram Based on Radiomics with Mammography Texture Analysis for the Prognostic Prediction in Patients with Triple-Negative Breast Cancer. Contrast Media & Molecular Imaging, 2020, Article ID: 5418364.

[23]   TCIA (2021).

[24]   Technical University of Lodz (2021).

[25]   Szczypiński, P.M., Strzelecki, M., Materka, A. and Klepaczko, A. (2009) MaZda—A Software Package for Image Texture Analysis. Computer Methods and Programs in Biomedicine, 94, 66-76.

[26]   Strzelecki, M., Szczypinski, P., Materka, A. and Klepaczko, A. (2013) A Software Tool for Automatic Classification and Segmentation of 2D/3D Medical Images. Nuclear Instruments and Methods in Physics Research, 702, 137-140.

[27]   Hastie, T., Tibshirani, R. and Friedman, J. (2009) The Elements of Statistical Learning, Data Mining, Inference and Prediction. 2nd Edition, Springer, New York.

[28]   Duda, R.O., Hart, P.E. and Stork, D.G. (2001) Pattern Classification. John Wiley & Sons, New York.

[29]   Metz, C.E. (1989) Some Practical Issues of Experimental Design and Data Analysis in Radiological ROC Studies. Investigative Radiology, 24, 234-245.

[30]   Cain, E.H., Saha, A., Harowicz, M.R., Marks, J.R., Marcom, P.K. and Mazurowski, M.A. (2019) Multivariate Machine Learning Models for Prediction of Pathologic Response to Neoadjuvant Therapy in Breast Cancer Using MRI Features: A Study Using an Independent Validation Set. Breast Cancer Research and Treatment, 173, 455-463.

[31]   Drukker, K., Edwards, A., Doyle, C., Papaioannou, J., Kulkarni, K. and Giger, M.L. (2019) Breast MRI Radiomics for the Pretreatment Prediction of Response to Neoadjuvant Chemotherapy in Node-Positive Breast Cancer Patients. Journal of Medical Imaging, 6, Article ID: 034502.

[32]   Chen, X., Chen, X., Yang, J., Li, Y., Fan, W. and Yang, Z. (2020) Combining Dynamic Contrast-Enhanced Magnetic Resonance Imaging and Apparent Diffusion Coefficient Maps for a Radiomics Nomogram to Predict Pathological Complete Response to Neoadjuvant Chemotherapy in Breast Cancer Patients. Journal of Computer Assisted Tomography, 44, 275-283.

[33]   Liu, Z., Li, Z., Qu, J., Zhang, R., Zhou, X., Li, L., Sun, K., Tang, Z., Jiang, H., Li, H., Xiong, Q., Ding, Y., Zhao, X., Wang, K., Liu, Z. and Tian, J. (2019) Radiomics of Multiparametric MRI for Pretreatment Prediction of Pathologic Complete Response to Neoadjuvant Chemotherapy in Breast Cancer: A Multicenter Study. Clinical Cancer Research, 25, 3538-3547.

[34]   Li, P., Wang, X., Xu, C., Liu, C., Zheng, C., Fulham, M.J., Feng, D., Wang, L., Song, S. and Huang, G. (2020) 18F-FDG PET/CT Radiomic Predictors of Pathologic Complete Response (pCR) to Neoadjuvant Chemotherapy in Breast Cancer Patients. European Journal of Nuclear Medicine and Molecular Imaging, 47, 1116-1126.