Breast cancer is the most commonly occurring cancer in women and the second most common cancer overall. There were over 2 million new cases in 2018 . American Cancer Society screening guidelines for the early detection of breast cancer vary depending on a woman’s age and risk . Screening mammography is the primary imaging modality for the early detection of breast cancer. It has been shown to reduce breast cancer mortality by 38% - 48% among participants who were actually screened . Masses and microcalcification clusters that appear in mammographic images are an important early sign of breast cancer. Mammographic images are evaluated by human readers and the reading process is monotonous, tiring, lengthy, and costly . Moreover, due to the high variability of tumor shape, size, and the low contrast between tumor and surrounding breast tissues, manual classification yields significant classification error, in particular, false positives, thereby resulting in an unnecessarily large number of biopsies. To cope with this issue, many researchers have been working on development of computer-aided detection and diagnosis (CAD) systems for mammography  - .
CAD systems utilize image processing techniques and pattern recognition theory to detect and classify abnormalities in mammographic images, which can provide an objective view to the radiologists  . The abnormalities in mammographic images include microcalcifications, masses, architectural distortion, and asymmetry. Traditional CAD systems use handcrafted features based on prior knowledge and expert guidance. Although traditional CAD systems demonstrate superior detection of breast cancers when used in combination with radiologists, they also significantly increase the recall rate and show significant differences in false-positive rates . Therefore, accurate detection of breast cancer has remained challenging.
Recent advances in machine learning have opened up an opportunity to address the challenging issue of early detection of breast cancer using deep learning (DL) methods. DL has attracted much attention in many fields, such as image recognition and biomedical image analysis. The convolutional neural network (CNN) is one of the most popular DL algorithms; it has been successfully applied in various fields and has achieved state-of-the-art performance in image recognition and classification  . Following this success, CNNs have also been exploited in medical fields, such as image processing and CAD    , and they have reached or even surpassed human performance in image detection and classification. Many studies in medical fields have attempted to apply CNNs to the analysis of mammographic images  - . These studies mainly deal with the detection of microcalcifications   and the classification of masses/lesions as malignant or benign   . In particular, classification of breast density, which is an established risk marker for breast cancer, is a more difficult task than detection of microcalcifications.
According to the fifth edition of the American College of Radiology’s Breast Imaging Reporting and Data System (BI-RADS) lexicon , there are four categories for breast density: 1) almost entirely fatty, 2) scattered areas of fibroglandular density (or scattered density), 3) heterogeneously dense, and 4) extremely dense. Of these four categories, assessment of almost entirely fatty and extremely dense breasts is highly consistent. However, there is greater variability in distinguishing scattered density from heterogeneously dense parenchyma   .
In general, a CNN performs the image classification task directly on raw image pixels expressed in the spatial domain. In that case, however, the spectral information content of the image is not utilized in the classification. We consider that image classification performance can be further improved by incorporating spectral feature information, which enhances the invariance of image features . In this work, we aim to construct an automatic classification system for breast density using a CNN model with wavelet transform (WT) coefficients as input data. As inputs to the CNN, we used redundant wavelet coefficients of the segmented images instead of the original images. Our work focused on distinguishing between the two BI-RADS density categories that are most difficult to tell apart, namely, scattered density and heterogeneously dense. To demonstrate the effectiveness and usefulness of the proposed method, the results obtained from a conventional fine-tuned CNN model were compared with those obtained from the proposed method.
The remainder of this paper is organized as follows. Section 2 describes the image dataset used in the experiment and the algorithm of the proposed method. Section 3 presents the experimental results and a comparison with a commonly used CNN model. Section 4 discusses the results. Section 5 concludes this work.
2. Material and Methods
We utilized AlexNet , a well-known CNN model, for classification of breast density. AlexNet, which has been pre-trained on ImageNet , consists of five convolutional layers and three fully connected (FC) layers. In this work, we constructed a new CNN by reusing the earlier layers of the pre-trained AlexNet. The dataset used, the extraction of image spectral information using wavelet transforms, and the architecture of the proposed fine-tuned CNN model are described below.
2.1. Image Dataset
The image dataset consisted of X-ray mammograms in DICOM format acquired from The Cancer Imaging Archive (TCIA) . TCIA is a large archive of medical images of cancer that is accessible for public download. Thus, no ethical issues arise in this work, and the requirement to obtain informed consent was waived.
The dataset includes images with and without microcalcifications/masses. It contains benign and malignant cases with verified pathology information. In this study, 650 images each of the scattered density and heterogeneously dense categories (a total of 1300 images, with up to 2 images from the same patient) were collected. Of these, 585 images per category (a total of 1170 images) were used for re-training and 65 images per category (a total of 130 images) were used for validation/testing. The collected images were manually segmented by a certified breast specialist to remove non-breast tissue areas. Because the collected images vary in dimensions, the segmented images are not all the same size. Figure 1 shows an example of the segmented images.
2.2. Extraction of Image Spectral Information Using Wavelet Transforms
In this study, a two-dimensional (2D) WT technique was used to extract spectral information from the original images. In medical fields, the 2D WT has been applied to data compression, image enhancement, noise removal, etc. . In wavelet analysis, the original image is regarded as level 0. The image is decomposed into four components at level 1: one low frequency component and three high frequency components. A smoothed image can be obtained from the low frequency component (low-low component: LL), and the detailed images can be obtained from the three high frequency components, i.e., the low-high (LH), high-low (HL), and high-high (HH) components. Therefore, the low frequency component and the high frequency components are also referred to as the smoothed component and the detailed components, respectively. Decomposition is then repeated on the LL component. As the decomposition continues, the resolution of the image decreases accordingly. More details about the WT can be found in .
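The level-1 decomposition into one smoothed and three detailed components can be illustrated with a minimal sketch. The Haar filter is chosen here purely for brevity and is not the basis function used in this study; the function name `haar_dwt2_level1` and the sub-band naming convention (first letter for the horizontal filter, second for the vertical filter) are assumptions made for illustration.

```python
def haar_dwt2_level1(image):
    """Decimated level-1 2D Haar transform of an image with even height and width.

    Returns (LL, LH, HL, HH). Each component has half the rows and half the
    columns of the input. Naming assumption: the first letter is the filter
    applied horizontally, the second the filter applied vertically.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, rows, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, cols, 2):
            # Values of the current 2 x 2 block.
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 2.0)  # smoothed component
            lh_row.append((a + b - d - e) / 2.0)  # difference between the two rows
            hl_row.append((a - b + d - e) / 2.0)  # difference between the two columns
            hh_row.append((a - b - d + e) / 2.0)  # diagonal difference
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh
```

Applying the function again to the returned LL component would give the level-2 components, which is how the resolution decreases from level to level.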
Figure 1. Example of the segmented images. (a) Scattered density (benign); image size 1386 × 2874 pixels, (b) Scattered density (malignant); image size 1986 × 3840 pixels, (c) Heterogeneously dense (benign); image size 810 × 2292 pixels, (d) Heterogeneously dense (malignant); image size 2316 × 4392 pixels.
In general, when a decimated WT is performed on a given image of size N × N, the size of each of the four decomposed components is reduced to N/2 × N/2, i.e., one quarter of the original number of pixels. One of the shortcomings of the decimated WT is that it is not shift-invariant. As a result, the outline of the decomposed images may disappear. To overcome this issue, in this work we used a redundant discrete WT (RDWT) method. Unlike the conventional WT, the RDWT does not perform down-sampling operations. Thus, the four components at each level are the same size as the original image at level 0. The basic algorithm of the RDWT applies the transform at each point of the image, saves the detail coefficients, and uses the approximation coefficients for the next level. The size of the coefficient arrays does not diminish from level to level  . There are various wavelet basis functions, such as the Haar, Daubechies, biorthogonal spline, Coiflet, and Meyer wavelets. In this study, the Daubechies wavelet of order 2 (db2) was used. Since db2 is a compactly supported orthogonal wavelet, we consider that it can yield coefficient values capable of distinguishing the features of interest.
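A level-1 redundant decomposition with the db2 filters can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this study: the periodic boundary handling, the filter alignment, the sub-band naming (first letter for the filter along rows, second for the filter along columns), and all function names are hypothetical choices, and library implementations may differ in these details.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
# Daubechies order-2 (db2) low-pass filter coefficients.
DB2_LOW = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM,
           (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
# High-pass filter obtained from the low-pass filter by the alternating flip.
DB2_HIGH = [((-1) ** k) * DB2_LOW[len(DB2_LOW) - 1 - k]
            for k in range(len(DB2_LOW))]


def _filter_periodic(signal, taps):
    """Filter a 1D sequence circularly, keeping every sample (no down-sampling)."""
    n = len(signal)
    return [sum(t * signal[(i + k) % n] for k, t in enumerate(taps))
            for i in range(n)]


def _filter_rows(image, taps):
    return [_filter_periodic(row, taps) for row in image]


def _filter_cols(image, taps):
    transposed = [list(col) for col in zip(*image)]
    filtered = _filter_rows(transposed, taps)
    return [list(row) for row in zip(*filtered)]


def rdwt2_level1(image):
    """Return (LL, LH, HL, HH), each the same size as the input image."""
    low_rows = _filter_rows(image, DB2_LOW)
    high_rows = _filter_rows(image, DB2_HIGH)
    ll = _filter_cols(low_rows, DB2_LOW)
    lh = _filter_cols(low_rows, DB2_HIGH)
    hl = _filter_cols(high_rows, DB2_LOW)
    hh = _filter_cols(high_rows, DB2_HIGH)
    return ll, lh, hl, hh
```

Because no samples are discarded, circularly shifting the input by one pixel shifts every component by the same amount in this sketch, which is the shift-invariance property that motivates the redundant transform.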
Figure 2 shows an example of 2D level 1 redundant wavelet decomposition. Figure 2(a) shows an original image and four-component images of the redundant wavelet transform at level 1. Figure 2(b) and Figure 2(c) are the combination of LL, LH and HL component images and the combination of three identical original images, respectively.
2.3. Architecture of the Proposed Fine-Tuning Convolutional Neural Network
Figure 2. Redundant discrete wavelet decomposition and combination of input data. (a) Level 1 wavelet decomposition. (b) Combination of LL, LH and HL component images. (c) Combination of three identical original images.
CNN re-training was implemented in MATLAB on a desktop computer with an Intel(R) Xeon(R) E5 CPU (3.6 GHz) and an NVIDIA GeForce GTX 1080 Ti graphics card. The input data to the network were the wavelet coefficients obtained from the original mammographic images. AlexNet takes 227 × 227 × 3 images (3 channels) as input. Thus, the collected images, which vary in size (see Figure 1), were resized to a lower resolution of 227 × 227 using bicubic interpolation. The wavelet coefficients used as 3-channel input data in the proposed method were a combination of the LL, LH, and HL components at level 1. For comparison with the proposed method, the pixel values of three identical copies of the original mammographic image were also used as input to the same network.
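The two kinds of 3-channel input described above can be sketched as follows. The helper names `stack_channels`, `wavelet_input`, and `baseline_input` are hypothetical, and the resizing to 227 × 227 by bicubic interpolation is omitted; the sketch only shows how three equally sized arrays are combined into one multi-channel array.

```python
def stack_channels(ch1, ch2, ch3):
    """Stack three equally sized 2D arrays into one H x W x 3 nested list."""
    shape = (len(ch1), len(ch1[0]))
    for ch in (ch2, ch3):
        if (len(ch), len(ch[0])) != shape:
            raise ValueError("all channels must have the same size")
    return [[[ch1[r][c], ch2[r][c], ch3[r][c]] for c in range(shape[1])]
            for r in range(shape[0])]


def wavelet_input(ll, lh, hl):
    """Input in the style of the proposed method: LL, LH and HL as channels."""
    return stack_channels(ll, lh, hl)


def baseline_input(image):
    """Input in the style of the compared method: the same image three times."""
    return stack_channels(image, image, image)
```

Stacking is possible only because the redundant transform keeps every component at the size of the original image.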
The re-training plus fine-tuning steps employed in our proposed method are described as follows.
Step 1: Remove the last two FC layers from the pre-trained AlexNet model.
Step 2: Build two new FC layers. Apply dropout with a probability of 50% immediately before each of the two FC layers to randomly deactivate units. Because a different set of units is active in each re-training iteration, this improves generalization performance. Apply L2-norm regularization to each layer to prevent overfitting and to further improve generalization performance.
Step 3: Append the two newly built FC layers to the remaining structure of the AlexNet model, which has been pre-trained on the ImageNet database .
Step 4: Use the wavelet coefficients of the LL, LH, and HL components at level 1 as inputs to the modified model for re-training and validation/testing. As a result, a modified CNN model, our proposed model, is constructed.
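The two regularization mechanisms named in Step 2 can be sketched in isolation. The sketch assumes the common "inverted" formulation of dropout and an L2 penalty of the form λ/2 · Σw²; the value of `weight_decay` is a hypothetical parameter that is not reported in the text, and the function names are illustrative only.

```python
import random


def dropout(activations, drop_prob=0.5, training=True, rng=None):
    """Inverted dropout (assumed formulation).

    During training each unit is zeroed with probability drop_prob and the
    surviving units are rescaled by 1 / (1 - drop_prob). At test time the
    activations pass through unchanged.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def l2_penalty(weights, weight_decay):
    """L2 regularization term for one layer: weight_decay / 2 * sum(w^2)."""
    return 0.5 * weight_decay * sum(w * w for w in weights)


def regularized_loss(data_loss, layer_weights, weight_decay):
    """Add the L2 penalty of every layer to the data (e.g. cross-entropy) loss."""
    return data_loss + sum(l2_penalty(w, weight_decay) for w in layer_weights)
```

With a drop probability of 50%, roughly half of the units feeding each new FC layer are silenced in any given training pass, so the layer cannot rely on any single unit.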
We applied 10-fold cross-validation for the network re-training: all 1300 collected images were randomly divided into 10 sets with an equal number of images in each set, and each time nine sets (1170 images) were used for re-training, leaving one set (130 images) for validation/testing. The validation set was used to check the accuracy of the re-training process and to determine whether overfitting occurred. In the re-training process, the network parameters were optimized using stochastic gradient descent with a cross-entropy cost function. The mini-batch size was 30. We adjusted the weight learn rate factor and bias learn rate factor to speed up learning in the new final layers. To choose the optimal number of epochs, accuracy was validated after each iteration round. Re-training was stopped when the accuracy had not improved for ten consecutive iterations.
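The data split and the stopping rule described above can be sketched as follows. This is a plain random split under stated assumptions: the text does not say whether the folds were stratified by category or grouped by patient, so neither is done here, and the function names, the `seed` parameter, and the exact counting of non-improving rounds are hypothetical.

```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Randomly split item indices into k equal folds.

    Returns a list of (train_indices, held_out_indices) pairs, one per fold.
    """
    if n_items % k:
        raise ValueError("n_items must be divisible by k")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    fold_size = n_items // k
    folds = [indices[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    splits = []
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, folds[i]))
    return splits


def stopping_round(accuracies, patience=10):
    """Return the 1-based round at which re-training stops.

    Training stops once the accuracy has failed to improve on its best value
    for `patience` consecutive rounds. Returns None if that never happens.
    """
    best = float("-inf")
    since_best = 0
    for round_no, acc in enumerate(accuracies, start=1):
        if acc > best:
            best, since_best = acc, 0
        else:
            since_best += 1
            if since_best >= patience:
                return round_no
    return None
```

Calling `k_fold_splits(1300, k=10)` yields ten pairs with 1170 training indices and 130 held-out indices each, matching the set sizes given in the text.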
Figure 3 shows the flow chart of the proposed method. Figure 3(a) is the pre-trained AlexNet model, which was designed for a 1000-class classification task. Figure 3(b) is the basic architecture of the proposed method. The input to the proposed network, which classifies the two categories, consists of the wavelet coefficients of mammographic images.
3. Results
The confusion matrices for classifying the two categories of breast density, i.e., scattered density (BD2) and heterogeneously dense (BD3), are given in Figure 4.
Figure 3. CNN flowchart for breast density classification in mammography. (a) Pre-trained networks. (b) Proposed method.
Figure 4. Confusion matrices for the two categories of breast density, scattered density (BD2) and heterogeneously dense (BD3). (a) Results obtained using wavelet coefficients (proposed method). (b) Results obtained using the compared method.
Figure 4(a) and Figure 4(b) show the results obtained from the proposed method and from the compared method, respectively. A confusion matrix is a widely used quantitative measure for evaluating the accuracy of a classification. It shows the relation between the classification result and the ground truth. The precision and recall shown in the figure are evaluation indices derived from the confusion matrix. The values shown in the figure are the averages of the values obtained by 10-fold cross-validation. The overall accuracy was 88.3% for the proposed method, compared with 85.4% for the compared method. A Student’s t-test showed that this difference was statistically significant (P < 0.01).
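The evaluation indices derived from a confusion matrix can be sketched as follows. The function names and the row/column convention (rows for ground truth, columns for predictions) are assumptions made for illustration, and any labels passed in are toy data rather than results from this study.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count matrix with rows = ground truth and columns = predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, pred in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def recall(matrix, i):
    """Fraction of the true class-i samples that were predicted as class i."""
    total = sum(matrix[i])
    return matrix[i][i] / total if total else 0.0


def precision(matrix, i):
    """Fraction of the class-i predictions that truly belong to class i."""
    total = sum(row[i] for row in matrix)
    return matrix[i][i] / total if total else 0.0


def accuracy(matrix):
    """Fraction of all samples on the diagonal of the matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

When the two classes contain the same number of samples, as in each validation set here, the overall accuracy equals the mean of the two per-class recalls.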
Figure 5 shows the receiver operating characteristic (ROC) curves and the area under the curve (AUC) for the two breast-density categories obtained from the proposed method and the compared method. Figure 5(a) and Figure 5(b) are the results for scattered density and heterogeneously dense, respectively. As shown in the figures, the AUC for both BD2 and BD3 was 0.964 with the proposed method and 0.948 with the compared method.
Figure 5. ROC curves obtained from the proposed method and the compared method. (a) Scattered density. (b) Heterogeneously dense.
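The AUC can be computed without drawing the ROC curve by using its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following is a minimal sketch of that definition; the function name `roc_auc` and the 0/1 label encoding are assumptions for illustration.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` holds 1 for the class of interest and 0 otherwise.
    Tied scores count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

In a two-class problem where the score for one class is one minus the score for the other, this quantity is the same whichever class is treated as positive, which is consistent with a single AUC value being reported for both density categories.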
Figure 6 shows an example for verifying the effectiveness of the proposed method. Figure 6(a) is an original image. Figure 6(b) and Figure 6(c) show the 96 visualized features extracted from the first convolutional layer when wavelet coefficients (LL, LH and HL components) and the original image were used as inputs, respectively. Similarly, Figure 6(d) and Figure 6(e) are 256 visualized features extracted from the fifth convolutional layer when the wavelet coefficients and the original image were used, respectively.
4. Discussion
As shown in Figure 4, the proposed method achieved an overall accuracy of 88.3%, compared with 85.4% for the compared method. The difference was statistically significant (P < 0.01), suggesting the effectiveness of the proposed method. The recall for BD2 was 88.5% with the proposed method and 86.6% with the compared method. Similarly, the recall for BD3 was 88.2% with the proposed method and 84.3% with the compared method. As for the precision for BD2 and BD3, the proposed method achieved 88.2% and 88.4%, whereas the compared method reached 84.7% and 86.3%, respectively. Overall, the proposed method outperforms the compared method in terms of both recall and precision.
Oshima et al.  used the AlexNet model to classify four categories of mammary gland density in mammograms and achieved an accuracy of 82.3%. Koshidaka et al.  reported a method for automatic classification of mammary gland density in mammograms using a CNN model. In their report, the three categories other than category 1 (almost entirely fatty) were classified using 93 cases, and the accuracy was 86.0%. Because the image data used in the present study differ from those used in the two studies mentioned, a direct comparison is not straightforward. Nevertheless, these results suggest the potential superiority of the proposed method.
Figure 6. An example verifying the effectiveness of the proposed method. (a) Original image. (b) and (c) 96 image features extracted from the first convolutional layer when wavelet coefficients and the original images are used as input, respectively. (d) and (e) 256 image features extracted from the fifth convolutional layer when wavelet coefficients and the original images are used as input, respectively.
Figure 5 shows that the ROC curves for BD2 and BD3 obtained with the proposed method have higher true positive rates at low false positive rates than those obtained with the compared method. The corresponding average AUCs are 0.964 for the proposed method and 0.948 for the compared method. These results indicate that the proposed method outperforms the compared method. Mohamed et al.  investigated a deep learning-based approach using a CNN to classify the BD2 and BD3 categories. In their report, a total of 22,000 images were used, and the AUC was 0.9421 when the network was trained on 7000 images. Since the datasets used in their investigation and in our study are different, the results obtained by the two methods cannot be compared directly. Nevertheless, the results support the effectiveness of the proposed method.
Figure 6 gives an overview of the visualized features that the proposed network learned. It can be seen from Figure 6 that the proposed network detects more detailed features at deeper layers. As Figure 6(b) and Figure 6(c) show, with the proposed method the wavelet coefficients respond strongly to some specific kernels. In contrast, with the compared method the original image responds to almost all kernels, but the degree of activation is considerably lower. Figure 6(d) and Figure 6(e) show that more image features were detected when the proposed method was used. The same tendency was visually confirmed at the second to fourth convolutional layers. Thus, it is reasonable to say that whether effective features can ultimately be extracted depends on the information fed to the initial layer of the CNN model employed. The comparison of our results with those obtained by the methods reported in the literature    suggests the superiority of the proposed method and its potential for improving the accuracy of classification of breast density categories.
Our study has some limitations. First, the only wavelet basis function used was db2. Other basis functions may lead to better results. We plan to investigate the effect of other wavelet basis functions on classification performance and to design a new architecture to further improve it. Second, in this work we used only the pre-trained AlexNet model to classify breast density. In future work, we plan to utilize other CNN architectures, such as GoogLeNet, ResNet, and SENet, to compare classification performance. Third, the number of images used for re-training was limited to 1170. A larger number of training images is necessary in further studies.
5. Conclusion
In this work, we proposed a fine-tuning method that utilizes a pre-trained network based on the AlexNet model. We modified the pre-trained AlexNet model by removing the last two fully connected layers and appending two newly created layers to the remaining structure. Unlike common CNN-based methods, we used level 1 redundant wavelet coefficients as inputs to the network. Experimental results demonstrate that the proposed method achieves encouraging classification performance in differentiating between scattered density and heterogeneously dense breasts. We believe that our proposed method will provide a promising computerized tool to help radiologists and serve as a second pair of eyes for classifying breast density categories in breast cancer screening.
This work was supported in part by JSPS KAKENHI (Grant-in-Aid for Scientific Research) Grant Number 18K15641.
 World Cancer Research Fund International (2018) Breast Cancer Statistics.
 Broeders, M., Moss, S., Nystrom, L., et al. (2012) The Impact of Mammographic Screening on Breast Cancer Mortality in Europe: A Review of Observational Studies. Journal of Medical Screening, 19, 14-25.
 Ribli, D., Horváth, A., Unger, Z., Pollner, P. and Csabai, I. (2018) Detecting and Classifying Lesions in Mammograms with Deep Learning. Scientific Reports, 8, Article No. 4165.
 Baker, J.A., Rosen, E.L., Lo, J.Y., et al. (2003) Computer-Aided Detection (CAD) in Screening Mammography: Sensitivity of Commercial CAD Systems for Detecting Architectural Distortion. American Journal of Roentgenology, 181, 1083-1088.
 Karahaliou, A.N., Boniatis, I.S., et al. (2008) Breast Cancer Diagnosis: Analyzing Texture of Tissue Surrounding Microcalcifications. IEEE Transactions on Information Technology in Biomedicine, 12, 731-738.
 Eltonsy, N.H., Tourassi, G.D. and Elmaghraby, A.S. (2007) A Concentric Morphology Model for the Detection of Masses in Mammography. IEEE Transactions on Medical Imaging, 26, 880-889.
 Tang, J., Rangayyan, R.M., Xu, J., El Naqa, I. and Yang, Y. (2009) Computer-Aided Detection and Diagnosis of Breast Cancer with Mammography: Recent Advances. IEEE Transactions on Information Technology in Biomedicine, 13, 236-251.
 Li, Y., Chen, H., Gao, L. and Ma, J. (2016) A Survey of Computer-aided Detection of Breast Cancer with Mammography. Journal of Health & Medical Informatics, 7, Article ID: 100238.
 Lawrence, S., Giles, C.L., Tsoi, A.C. and Back, A.D. (1997) Face Recognition: A Convolutional Neural-Network Approach. IEEE Transactions on Neural Networks, 8, 98-113.
 Pan, S. and Yang, Q. (2010) A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering, 22, 1345-1359.
 Kooi, T., van Ginneken, B., Karssemeijer, N. and den Heeten, A. (2017) Discriminating Solitary Cysts from Soft Tissue Lesions in Mammography Using a Pretrained Deep Convolutional Neural Network. Medical Physics, 44, 1017-1027.
 Lekadir, K., Galimzianova, A., Betriu, A., et al. (2017) A Convolutional Neural Network for Automatic Characterization of Plaque Composition in Carotid Ultrasound. IEEE Journal of Biomedical and Health Informatics, 21, 48-55.
 Shin, H.-C., Roth, H.R., Gao, M., et al. (2016) Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning. IEEE Transactions on Medical Imaging, 35, 1285-1298.
 Kooi, T., Litjens, G., van Ginneken, B., et al. (2017) Large Scale Deep Learning for Computer Aided Detection of Mammographic Lesions. Medical Image Analysis, 35, 303-312.
 Becker, A.S., Marcon, M., Ghafoor, S., et al. (2017) Deep Learning in Mammography: Diagnostic Accuracy of a Multipurpose Image Analysis Software in the Detection of Breast Cancer. Investigative Radiology, 52, 434-440.
 Arevalo, J., González, F.A., Ramos-Pollán, R., et al. (2016) Representation Learning for Mammography Mass Lesion Classification with Convolutional Neural Networks. Computer Methods and Programs in Biomedicine, 127, 248-257.
 Jadoon, M.M., Zhang, Q., Haq, I.U., Butt, S. and Jadoon, A. (2017) Three-Class Mammogram Classification Based on Descriptive CNN Features. BioMed Research International, 2017, Article ID: 3640901.
 Samala, R.K., Chan, H.-P., Hadjiiski, L., Cha, K. and Helvie, M.A. (2016) Deep-Learning Convolution Neural Network for Computer-Aided Detection of Microcalcifications in Digital Breast Tomosynthesis. Proceedings of SPIE, 9785, 1-7.
 Samala, R.K., Chan, H.-P., Hadjiiski, L., et al. (2016) Mass Detection in Digital Breast Tomosynthesis: Deep Convolutional Neural Network with Transfer Learning from Mammography. Medical Physics, 43, 6654-6666.
 Wang, J., Yang, X., Cai, H., et al. (2016) Discrimination of Breast Cancer with Microcalcifications on Mammography by Deep Learning. Scientific Reports, 6, Article No. 27327.
 Kallenberg, M., Petersen, K., Nielsen, M., et al. (2016) Unsupervised Deep Learning Applied to Breast Density Segmentation and Mammographic Risk Scoring. IEEE Transactions on Medical Imaging, 35, 1322-1331.
 Dubrovina, A., Kisilev, P., Ginsburg, B., Hashoul, S. and Kimmel, R. (2016) Computational Mammography Using Deep Neural Networks. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 6, 243-247.
 Mohamed, A.A., Berg, W.A., Peng, H., et al. (2018) A Deep Learning Method for Classifying Mammographic Breast Density Categories. Medical Physics, 45, 314-321.
 American College of Radiology (2019) ACR BI-RADS Atlas 5th Edition.
 Berg, W.A., Campassi, C., Langenberg, P. and Sexton, M.J. (2000) Breast Imaging Reporting and Data System: Inter- and Intraobserver Variability in Feature Analysis and Final Assessment. American Journal of Roentgenology, 174, 1769-1777.
 Matsuyama, E. and Tsai, D.-Y. (2018) Automated Classification of Lung Diseases in Computed Tomography Images Using a Wavelet Based Convolutional Neural Network. Journal of Biomedical Science and Engineering, 11, 263-274.
 Deng, J., Dong, W., Socher, R., et al. (2009) ImageNet: A Large-Scale Hierarchical Image Database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, 20-25 June 2009, 2-9.
 The Cancer Imaging Archive Collections (2019) Frederick National Laboratory for Cancer Research.
 Matsuyama, E., Tsai, D.-Y., Lee, Y., et al. (2013) A Modified Undecimated Discrete Wavelet Transform Based Approach to Mammographic Image Denoising. Journal of Digital Imaging, 26, 748-758.
 Koshidaka, M., Enomoto, K., Teramoto, A., et al. (2019) Preliminary Study on the Automated Classification of Breast Density in Mammogram Using Deep Convolutional Neural Network. Medical Imaging and Information Sciences, 36, 88-92.