Fetal lung development is crucial to newborn safety. Neonatal respiratory morbidity (NRM), the primary cause of mortality and morbidity associated with prematurity, results mostly from lung immaturity. In the past decade, fetal lung maturity (FLM) has been evaluated from the size and volume of the fetal lung measured with medical imaging modalities such as magnetic resonance imaging (MRI) and ultrasound. Of these, ultrasound is considered a promising technology for evaluating lung immaturity because it is radiation-free and cost-effective.
Recently, computer-aided techniques, which are considered quantitative and objective, have been applied to fetal lung assessment. Tekesin et al. evaluated fetal lung development through quantitative ultrasonic tissue characterization, analyzing the histogram of gray values within a manually delineated region of interest (ROI). Researchers in Spain developed a semi-automatic quantitative ultrasound analysis method for FLM evaluation that analyzes the texture of the fetal lung in ultrasound images, with the fetal lung regions delineated manually. Most such studies are semi-automatic because clinicians delineate the fetal lung region in the ultrasound image by hand. This may cause inter-operator variability and bias the final evaluation. An automatic method for fetal lung segmentation could therefore reduce inter-operator variability, improve the effectiveness of ultrasonographic fetal lung assessment, and offer an objective way to evaluate fetal lung development from the morphological characteristics of the lung.
Meanwhile, with the development of deep learning, image segmentation has become one of the most active topics in the image processing community. Havaei et al. presented a fully automatic brain tumor segmentation method based on deep neural networks. Hu et al. proposed a method for computed tomography (CT) lung segmentation that combines mask region-based convolutional neural networks with supervised and unsupervised machine learning methods. However, few studies have attempted to segment the fetal lung and heart in ultrasound images using deep learning.
In view of the above, this study proposes a deep learning method based on U-Net for automated fetal lung segmentation in ultrasound images.
2. Materials and Methods
2.1. Image Acquisition and Region Labeling
Data were retrospectively collected in the Department of Medical Ultrasound, Nanjing Medical University Affiliated Suzhou Hospital, Suzhou, China. A total of 300 ultrasound images were acquired with a WS80A with Elite ultrasound system (Samsung Medison, Seoul, Korea) equipped with a curved array probe (CA1-7A). All images were acquired at the four-chamber view under the following conditions: 1) one half of the fetal lung was close to the probe; 2) the fetal spine was at either the 3 or 9 o’clock position; and 3) the fetal heart was in the diastolic phase. The data were stored in Digital Imaging and Communications in Medicine (DICOM) format for subsequent processing. An ultrasound physician with more than 6 years of experience in fetal ultrasound delineated the regions of the fetal lung and heart using a web-based image annotation tool (Labelme), as shown in Figure 1. The manually delineated fetal lung and heart regions subsequently served as the ground truth for evaluating the automated segmentation method. The study was approved by the ethics committee of Nanjing Medical University Affiliated Suzhou Hospital, Suzhou, China.
2.2. Image Pre-Processing and Data Augmentation
All ultrasound images used in the study were cropped to the image region, and only one of the three RGB channels was retained for subsequent processing. Data augmentation by rotation, flipping, and shifting, which is widely used in deep learning, was applied to the training data set to increase the number of training samples and improve the robustness of the model. This operation enlarged the training data set to 3500 images.
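As a minimal sketch of the rotation, flip, and shift augmentation described above, the following Python code applies the same geometric transform to an image and its label mask so that the two stay aligned. The function names, the 90-degree rotation, the one-pixel shifts, and the use of nested lists in place of image arrays are illustrative assumptions; the study does not report the angles, offsets, or library it used.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]  # nested lists stand in for a 2-D grayscale array


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def vflip(img: Image) -> Image:
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def rot90(img: Image) -> Image:
    """Rotate 90 degrees clockwise (swaps height and width)."""
    return [list(col) for col in zip(*img[::-1])]


def shift(img: Image, dy: int, dx: int, fill: int = 0) -> Image:
    """Translate by (dy, dx) pixels; uncovered pixels are set to `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


def augment_pair(image: Image, mask: Image) -> List[Tuple[Image, Image]]:
    """Apply identical transforms to an image and its label mask."""
    transforms: List[Callable[[Image], Image]] = [
        hflip,
        vflip,
        rot90,
        lambda a: shift(a, 1, 0),   # one pixel down (assumed offset)
        lambda a: shift(a, 0, -1),  # one pixel left (assumed offset)
    ]
    return [(t(image), t(mask)) for t in transforms]
```

Transforming the mask together with the image is the essential point: a flipped image paired with an unflipped mask would teach the network the wrong labels.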
2.3. U-Net Model
In healthcare, large annotated data sets are rarely available because both medical data and experienced clinicians to annotate the images are scarce. The model should therefore be designed carefully to avoid overfitting. In this study, the U-Net network was adopted for its good performance in medical image segmentation.
The U-Net architecture is based on the fully convolutional network (FCN). Compared with the FCN, U-Net performs up-sampling four times and uses skip connections between stages of the same resolution, instead of applying supervision and loss back-propagation directly on the high-level semantic features. In this way, the final recovered feature map incorporates both high-resolution and high-level semantic information and fuses features at multiple scales, enabling multi-scale prediction and deep supervision.
The entire U-Net model contains two processes: the down-sampling process and the up-sampling process. In this study, we used four down-sampling layers and four up-sampling layers. Each down-sampling layer contains a max-pooling layer, which halves the scale of the feature map, and two padded 3 × 3 convolution layers. Each convolution layer is followed by a ReLU function and a batch normalization layer to achieve good convergence. The U-Net architecture used in this study is shown in Figure 1.
Figure 1. U-Net architecture.
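To make the encoder-decoder layout concrete, the sketch below traces feature-map sizes through four down-sampling and four up-sampling layers. It is not the study's implementation: the 256 × 256 input, the 64 base channels doubling at each stage (taken from the original U-Net design), and the three output classes (background, lung, heart) are assumptions, as is the function name `unet_shape_trace`.

```python
from typing import List, Tuple

Stage = Tuple[str, int, int, int]  # (name, channels, height, width)


def unet_shape_trace(height: int, width: int, in_channels: int = 1,
                     base_channels: int = 64, depth: int = 4,
                     num_classes: int = 3) -> List[Stage]:
    """Trace feature-map shapes through a U-Net with `depth` down-sampling
    and `depth` up-sampling layers (all channel counts are assumptions)."""
    if height % 2 ** depth or width % 2 ** depth:
        raise ValueError("height and width must be divisible by 2**depth")

    trace: List[Stage] = [("input", in_channels, height, width)]

    # Initial block: two padded 3x3 convolutions keep height and width.
    ch, h, w = base_channels, height, width
    trace.append(("input_block", ch, h, w))
    skips = [(ch, h, w)]

    # Down-sampling path: max pooling halves height and width, then two
    # padded 3x3 convolutions are assumed to double the channel count.
    for i in range(1, depth + 1):
        ch, h, w = ch * 2, h // 2, w // 2
        trace.append((f"down{i}", ch, h, w))
        skips.append((ch, h, w))

    skips.pop()  # the deepest map is the bottleneck, not a skip connection

    # Up-sampling path: each layer doubles height and width back to the
    # size of the matching skip feature map; after concatenation with that
    # map, the convolutions are assumed to output the skip's channel count.
    for i in range(1, depth + 1):
        ch, h, w = skips.pop()
        trace.append((f"up{i}", ch, h, w))

    # Final convolution maps to one channel per class at full resolution.
    trace.append(("output", num_classes, h, w))
    return trace
```

Under these assumptions, a 256 × 256 single-channel input reaches a 16 × 16 bottleneck with 1024 channels, and the output map returns to 256 × 256.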
3. Training and Implementation
The original data set consisted of 300 ultrasound images with the fetal lung and heart annotated and was divided into a training set (250 images) and a testing set (50 images). Data augmentation by rotation, flipping, and shifting was performed on the training set, resulting in a 14-fold enlarged data set of 3500 images. Note that the testing and training samples were independent, with no overlap.
The model was trained with the Adam optimizer on a computer with a single GPU and 12 GB of memory. The initial learning rate, maximum number of epochs, and mini-batch size were set to 0.001, 100, and 2, respectively. Given the limited computing resources, the mini-batch size was set to 2 to avoid system crashes.
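The split described above can be sketched as follows. The random shuffle and the fixed seed are assumptions, since the study does not state how the 50 test images were chosen; the point illustrated is that the split is made on the original images before augmentation, so no augmented copy of a test image can enter the training set.

```python
import random
from typing import List, Tuple


def split_indices(n_total: int = 300, n_test: int = 50,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return disjoint (train, test) index lists over the original images."""
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    indices = list(range(n_total))
    rng.shuffle(indices)
    test = sorted(indices[:n_test])
    train = sorted(indices[n_test:])
    return train, test


train_idx, test_idx = split_indices()

# Augmentation is applied to the training images only; a 14-fold
# enlargement of 250 images gives the reported 3500 training samples.
AUGMENTATION_FACTOR = 14
augmented_train_size = len(train_idx) * AUGMENTATION_FACTOR
```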
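The reported hyperparameters can be collected in a small configuration object, sketched below. The class and field names are hypothetical; only the values (Adam, learning rate 0.001, 100 epochs, mini-batch size 2, 3500 training images) come from the text.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Training settings as reported in the text (names are illustrative)."""
    optimizer: str = "adam"
    initial_learning_rate: float = 1e-3
    max_epochs: int = 100
    batch_size: int = 2
    num_train_images: int = 3500

    @property
    def steps_per_epoch(self) -> int:
        # Number of mini-batches needed to see every training image once.
        return math.ceil(self.num_train_images / self.batch_size)

    @property
    def max_steps(self) -> int:
        # Upper bound on optimizer updates if training runs all epochs.
        return self.steps_per_epoch * self.max_epochs


config = TrainConfig()
```

With a mini-batch size of 2, each epoch takes 1750 optimizer steps, which illustrates the cost of the small batch chosen to fit the available memory.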
3.1. Evaluation Metrics
To evaluate the performance of the proposed method, the predicted segmentation results were compared with the manually annotated labels in terms of accuracy, recall, precision, and intersection over union (IOU). Each metric is computed from the pixel-wise counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), as defined in Table 1.
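The four metrics follow their standard definitions: accuracy = (TP + TN) / (TP + TN + FP + FN), precision = TP / (TP + FP), recall = TP / (TP + FN), and IOU = TP / (TP + FP + FN). The sketch below computes them for one class from a predicted and a ground-truth binary mask. The function names are hypothetical, the masks are nested lists of 0/1 for simplicity, and returning 0.0 when a denominator is zero is an assumed convention, not something the study specifies.

```python
from itertools import chain
from typing import Dict, List, Tuple

Mask = List[List[int]]  # 1 = pixel belongs to the class, 0 = it does not


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int, int]:
    """Count pixel-wise TP, TN, FP, FN between two binary masks."""
    p = list(chain.from_iterable(pred))
    t = list(chain.from_iterable(truth))
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for a, b in zip(p, t) if a and b)
    tn = sum(1 for a, b in zip(p, t) if not a and not b)
    fp = sum(1 for a, b in zip(p, t) if a and not b)
    fn = sum(1 for a, b in zip(p, t) if not a and b)
    return tp, tn, fp, fn


def _ratio(numerator: int, denominator: int) -> float:
    # Assumed convention: an undefined ratio is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Accuracy, precision, recall, and IOU for one class."""
    tp, tn, fp, fn = confusion_counts(pred, truth)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),
        "iou": _ratio(tp, tp + fp + fn),
    }
```

Because background pixels usually dominate an ultrasound frame, accuracy can stay high even for a poor segmentation; IOU ignores true negatives and is therefore the stricter of the four.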
3.2. Statistical Analysis
The Bland-Altman test was used to assess the agreement between the automated and manual measurements of the segmented fetal lung and heart. SPSS (version 22.0 for Windows, SPSS Inc., Chicago, IL, USA) was used for statistical analysis.
Table 1. Definition of the TP, TN, FP and FN.
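The study ran its Bland-Altman analysis in SPSS; the sketch below only illustrates the standard computation behind such an analysis, namely the mean difference (bias) between paired measurements and the 95% limits of agreement, bias ± 1.96 × SD of the differences. The function name is hypothetical, and what is being measured (for example, segmented area) is left to the caller.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(auto: Sequence[float],
                 manual: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of agreement."""
    if len(auto) != len(manual):
        raise ValueError("paired measurements must have equal length")
    if len(auto) < 2:
        raise ValueError("at least two pairs are required")
    diffs = [a - m for a, m in zip(auto, manual)]
    bias = mean(diffs)        # systematic difference between the methods
    sd = stdev(diffs)         # sample standard deviation of differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

A bias near zero with narrow limits indicates that the automated measurements can stand in for the manual ones; whether the limits are acceptably narrow is a clinical judgment rather than a statistical one.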
The performance of the U-Net model in terms of accuracy, precision, IOU and recall is listed in Table 2. Figure 2 shows three segmentation results. The Bland-Altman analyses evaluating the agreement between the automated and manual segmentations of the fetal heart and lung are shown in Table 3 and Table 4, respectively.
Figure 2. Fetal lung segmentation results: (a) the fetal lung ultrasound images; (b) the manually annotated lung and cardiac masks; (c) the lung and cardiac segmentation results predicted by the proposed method. Masks in green and red represent the lung area and cardiac area, respectively.
Table 2. Metric comparison of different methods.
Table 3. Bland-Altman test for evaluating the agreement between auto and manual segmentations for fetal heart.
Table 4. Bland-Altman test for evaluating the agreement between auto and manual segmentations for fetal lung.
Segmenting the fetal lung in ultrasound images is a challenging and crucial step in the computer-aided evaluation of FLM. In this study, we proposed an automated technique for segmenting the fetal lung in ultrasound images.
Trained on 3500 annotated ultrasound images for 50 epochs, the proposed model showed good performance in segmenting the fetal lung in terms of accuracy, precision, IOU and recall.
Figure 2 shows the segmentation results of the U-Net trained with fetal lung and cardiac region annotations. The lung area is segmented accurately when compared with the manual annotations. In addition, the Bland-Altman tests shown in Table 3 and Table 4 demonstrated agreement between the manual and automated methods for segmenting the fetal lung and heart.
Segmentation of the fetal lung would not only support computer-aided fetal lung evaluation, e.g., texture analysis for the prediction of FLM, but also assist in monitoring fetal development. In this regard, previous work has used segmentation of the fetal brain and other tissues to assess fetal development.
This study has some limitations. First, the fetal lung regions in the ultrasound images of the training and testing data sets were delineated manually, which may introduce bias. Second, the data used in this study were collected only from Chinese fetuses and from a limited number of ultrasound machines.
In future work, the proposed method will be evaluated on fetal lung images from other countries, and other feature extractors will be explored to improve the model in light of the above limitations.
This study proposed a robust method for automatic fetal lung segmentation in ultrasound images using the U-Net model. Trained on 3500 ultrasound fetal lung images, the proposed model could segment the fetal lung with good accuracy. The proposed model could potentially be applied not only to improve existing approaches to quantitative ultrasound analysis of the fetal lung, e.g., texture analysis of the fetal lung and prediction of neonatal respiratory morbidity, but also to assist clinicians in routine measurement of the fetal lung and heart.
 Teune, M.J., et al. (2011) A Systematic Review of Severe Morbidity in Infants Born Late Preterm. American Journal of Obstetrics and Gynecology, 205, 374.e1-374.e9. https://doi.org/10.1016/j.ajog.2011.07.015
 Mahieu-Caputo, D., et al. (2001) Fetal Lung Volume Measurement by Magnetic Resonance Imaging in Congenital Diaphragmatic Hernia. BJOG, 108, 863-868. https://doi.org/10.1111/j.1471-0528.2001.00184.x
 Besnard, A.E., et al. (2013) Lecithin/Sphingomyelin Ratio and Lamellar Body Count for Fetal Lung Maturity: A Meta-Analysis. European Journal of Obstetrics & Gynecology and Reproductive Biology, 169, 177-183. https://doi.org/10.1016/j.ejogrb.2013.02.013
 Hu, Q., et al. (2020) An Effective Approach for CT Lung Segmentation Using Mask Region-Based Convolutional Neural Networks. Artificial Intelligence in Medicine, 103, 101792. https://doi.org/10.1016/j.artmed.2020.101792
 Long, J., Shelhamer, E. and Darrell, T. (2015) Fully Convolutional Networks for Semantic Segmentation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, 3431-3440. https://doi.org/10.1109/CVPR.2015.7298965
 Dahl, G.E., Sainath, T.N. and Hinton, G.E. (2013) Improving Deep Neural Networks for LVCSR Using Rectified Linear Units and Dropout. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing—Proceedings, 2013, 8609-8613. https://doi.org/10.1109/ICASSP.2013.6639346
 Ioffe, S. and Szegedy, C. (2015) Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. ICML'15: Proceedings of the 32nd International Conference on Machine Learning, 2015, 448-456.
 Ebner, M., et al. (2020) An Automated Framework for Localization, Segmentation and Super-Resolution Reconstruction of Fetal Brain MRI. NeuroImage, 206, 116324. https://doi.org/10.1016/j.neuroimage.2019.116324