According to the World Health Organization, there are more than 10 million cases of tuberculosis in the world. However, the number of doctors does not match the number of patients. Less than 55,000 physicians across the United States are able to treat the disease. If patients are not treated in time, a lot of life will disappear. In areas such as India or Africa, where there are more patients and fewer doctors, this can be very serious. Therefore, we hope to build a model to improve the efficiency of the medical process. With a more accurate and effective image recognition and analysis system to help doctors, patients can get timely treatment. The morbidity of tuberculosis is also more than 30% in the past few years (see from Table 1) .
From 2005 to 2015, 92,587 cases of tuberculosis were reported in Beijing, with an average reported incidence of 44.8/100,000. See Table 1 for the reported incidence and incidence in each year
Convolutional neural network (CNN) is one of the most representative algorithm used in deep learning. In CNN, each image would be expressed as a matrix of RGB values, for better recognition. CNN has “shared-weights architecture and translation invariance” characteristics, which means, each neuron in the first layer is connected with all the neuron in the next layer. One neuron would only extract feature of the image. Only after all the neurons are connected together to constitute the matrix, the complete image could be recognized.
In the current studies, CNN model was frequently used for image recognition purpose in field such as social media or modeling system. To be more specific, CNN model is quite indispensable when applied to relevant technology including Infrared Imagery, Feature Extraction, and Three-dimensional Imagery. Despite the popularity, we rarely see this method contribute to medical purpose. Considering the issue of medical disequilibrium which occurs extensively across the world, our team thus created a model of medical image recognition.
In our research, CNN model was used to train the dataset and to realize the recognition of image. Basically, CNN model is a frame we used to build our system. And other methods such as Canny algorithm and the Mask algorithm are substances for filling this frame (accuracy improvement) The model should recognize the image of human lungs and thus calculate the possibility that the organ has been infected with certain disease, which, in this case, the tuberculosis (TB).
Table 1. Reported incidence of tuberculosis in Beijing from 2005 to 2015.
With the assistance of the system, the efficiency of the medical treatment process should be improved. This approach, originally created to assist the overwhelmed doctors with the problem of inability to diagnose excess patients, should be particularly benefit applying to area such as Southeast Asia or Africa.
2. Materials and Methods
The dataset we used is from Pulmonary Chest X-Ray Abnormalities.
It contains 635 pieces of images and includes normal lungs and abnormal lungs. All of the images are labeled with either “0” or “1”. “0” means that this lung is normal and “1” means abnormal. For each image, its size is 3000 × 3000, which is a large image. So we resize the images into 150 × 145, which saved a lot of time in training the model.
Figure 1 illustrates the different color between the normal and abnormal lungs. Therefore, our model specified different color and shape of the lungs to identify diseases.
To begin with, we randomly separated the CT images of lungs in the database into three different sets—the training set, the validation set and the test set and the division ratio is 6:2:2. The training set and validation set were taken into operation in training phase, and test set was used to check the accuracy (Figure 2). Then, we put the two sets into the CNN model we built.
Figure 1. (a) Image of the normal lung (b) image of the abnormal lung.
Figure 2. The training process of our model.
Convolutional neural networks (CNNs) are a class of deep neural networks. CNN is widely used and the de-facto standard in image recognition. Each CNN model consists of two main parts, one is feature learning and the other is classification. For the feature learning, we can [remove can’s] operate different numbers of layers, and there will be a Convolution filter, Activation function and a pooling. And we used Maxpooling, whose function is to combine small local region into single output with the largest value in the region, which could greatly avoid the massive calculation. Then in the classification part, we flattened and densed the matrix (Figure 3).
In our CNN model, we built two layers in our code. We added two Convolution2D functions. In the first layer, we had 25 filters with size (5, 5), and we used 50 filters in the second layer with the same size. After each Convolution2D, we used Maxpooling to reduce the calculation and then used the Activation function—“relu”. Finally, because our images are labeled with either “0” or “1”, we densed the output into two chann. After we ran our training program, the trained CNN model was able to distinguish whether the lungs in the CT image are healthy or not. Then we visualized the result we got Figure 4 and Figure 5.
Figure 3. The process of CNN model.
Figure 4. Result when study rate set as 0.01; validation accuracy 78%.
Figure 5. Result when study rate set as 0.001; validation accuracy 81%.
In Figure 4, we set the study rate as 0.01, and the validation accuracy we got was 78%. Then in Figure 5, we tried to modify different parameters, we improved validation accuracy to 81% with study rate of 0.001 at the end.
3. Results and Discussion
Although previous accuracy has been satisfactory, we have come up with other ways to improve accuracy. We find that there is some noise in the image. Most of them are from the Spinal cord and rib. So our aim is to remove the spinal cord and rib, which means extracting the lungs only. Then put the extracted lungs back into our model.
3.1. Using Fixed Coordinate
In this method, our idea is to cut out the lungs part, because the TB is mainly diagnosed from the lungs. We extract the lungs from the original image (Figure 6).
Unfortunately, this method does not solve our problem, because we do not remove the spinal cord and rib from the lungs. Additionally, due to different sizes of the lungs, sometimes we could only cut out part of the lungs using fixed coordinates (shown in Figure 7).
In Figure 7, the accuracy result is nearly 82% and increases a little bit due to the fact that the noise of shoulders is avoided.
3.2. Using Canny Algorithm
Secondly, we used CANNY algorithm   to extract the contour of the lungs before estimation.
Figure 6. Image of fixed coordinate method.
Figure 7. Result after using fixed coordinate method; accuracy 82% (little improved).
CANNY edge detection operator is a multi-level edge detection algorithm developed by John F. Canny in 1986. Generally, the purpose of edge detection is to significantly reduce the size of image data while preserving the original image attributes. At present, there are many algorithms for edge detection. Although CANNY algorithm has a long history, it can be said that it is a standard algorithm for edge detection, and it is still widely used in research.
CANNY algorithm contains 5 main steps as shown in the flowchart (Figure 8).
The extraction effect is shown in Figure 9.
The result of CANNY algorithm is not as good as that in the first model (Figure 10).
This time the accuracy of the results is about 80%, and still maintained rather low because the contour of the spinal cord and rib was extracted at the same time. Plenty of noise still remained. More importantly, the color feature of the
Figure 8. Process of CANNY algorithm.
Figure 9. Visualized image of CANNY algorithm.
Figure 10. Result of CANNY algorithm; accuracy about 80%.
image disappears after the CANNY algorithm and only the Image contour remains. Hence, the CNN model cannot distinguish the abnormal images without the color feature. We need to employ another method to replace based on color feature.
3.3. Using Mask Algorithm
We used our own mask algorithm    to crop out the lungs. To be more detailed, the algorithm is called mask method, and the 3 main steps are shown in the flowchart (Figure 11).
First, we make binarization of the images to ensure every Pixel Point is 0 or 255. To better divide and binarize the image, we draw the grey histogram to choose the suitable turning point of binarization (Figure 12).
We choose the Pixel 190 to be a turning point, and then we sign the position of each point. Next we change the 0 point to 255 in the original image and remain other points sustain. Finally we receive the new images only containing lungs without noise (Figure 13).
We used this new dataset to train the CNN model, and this time the accuracy jumps to 87% and the loss value keeps low condition (Figure 14). This method extracts the main features from the image by color brightness and the result is satisfactory, which implies the color brightness is an important factor during the image recognition.
4. Conclusions and Future Goals
Firstly, we trained the CNN model to check the lungs are normal or not with
Figure 11. Process of Mask algorithm.
Figure 12. Image of visualization of binarization; turning point Pixel 190.
Figure 13. Image of Mask algorithm.
Figure 14. Result of Mask algorithm; accuracy 87% (highest).
accuracy of 78%. Secondly, in order to improve the performance, we extracted the features in the image. More specifically, we tried three ways: 1) Using Canny. 2) Crop the lungs by fixed coordinate. 3) Using “mask”. In conclusion, we finished our initial goal, due to the fact that we ended up the accuracy of 87%.
In addition, we find the extraction method should be connected with color like “Mask” but not too much about frame like the “Canny”. Because from the result we predict the disease is related much to the color depth of CT image. Hence, to improve the accuracy we have to try more methods based on the color depth.