1.1. Research Background
In recent years, with the development of agricultural planting technology, the control of lemon scab, skirt rot, and pest damage has greatly improved. Through the breeding and use of disease-resistant varieties and the cultivation of disease-free seedlings, the pests and diseases encountered in lemon planting have also been greatly reduced. The main problems in lemon sales are now mildew caused by excessively long storage and green color caused by early picking. Mildewed lemons are not edible; eating them can damage health and induce diseases such as cancer. In addition, green lemons taste sour, contain less fructose, and have a high content of organic unsaturated acids, so they should not be eaten raw. However, because of this high acid content, green lemons can be used for beauty purposes, which gives them a special value compared with mature yellow lemons. Therefore, effectively and accurately identifying lemons that have been stored too long or picked too early is of practical significance.
1.2. Related Research and Problems in the Past
1.2.1. Related Research
Research on fruit and crop epidermis recognition is generally divided into three main stages: traditional digital image processing, neural networks combined with feature processing, and convolutional neural networks (CNNs).
Because it is fast, information-rich, and non-destructive, machine vision technology is widely used in fruit surface detection. The surface optical image is obtained using the optical characteristics of light reflection, projection, and diffuse projection. After being input into the computer, the image is processed by segmentation, noise removal, feature extraction, data compression, coding, and so on. For example, deep learning and SVM have been used to identify leaf diseases; convolutional neural networks have been used to identify many plant leaf diseases; a deep convolutional AlexNet model has been used to divide tomato diseases into nine classes; and a convolutional neural network has been applied to a tea sorting system.
Moreover, in machine learning, as the scale of the model grows, the number of parameters to be trained grows as well. The massive data needed to train a model with many parameters are difficult to obtain, and training from scratch requires large computational resources. Transfer learning avoids both problems, and training a model with transferred knowledge has proved to be an effective method. At present, transfer learning is used in many fields, including but not limited to text classification, image classification, and artificial intelligence planning. For image classification in agriculture, some studies classify and identify agricultural products, some identify crop diseases through transfer learning, and some identify the state of mature fruit ears through transfer learning.
1.2.2. Existing Problems
For machine vision technology, however, most research at home and abroad is carried out only under static conditions, which limits the detection of fruit rolling on an actual production line. Moreover, the misjudgment of the fruit stem and calyx as defects has not been effectively solved. Therefore, machine vision is difficult to use in actual production. In addition, transfer learning has not yet been studied for lemon defect identification.
In summary, in many studies the training time is short but the model accuracy is not high enough, while the models that do reach high accuracy take a long time to train. Moreover, there are few specialized studies on lemon defect identification. Therefore, combining deep learning and transfer learning provides a new idea for solving this problem.
1.3. Research Objectives
Compared with existing classification algorithms, this paper focuses on reducing the algorithmic complexity of machine learning and classification for lemon epidermal feature recognition, aims to reduce the burden on operators and developers, and seeks a new balance between accuracy and algorithmic complexity.
The lemon image needs to be processed first. Before using conventional processing methods such as grayscale conversion and threshold segmentation, this paper also applies image brightness compensation (IBC). Illumination introduces two interference factors: reflection points with extreme gray values and uneven regional brightness. IBC replaces the pixel values of the reflection points with the average value of the peel, builds an inverse brightness mask from the regional brightness distribution, and adds the mask to the original image to even out the regional brightness.
Furthermore, for transfer learning, the convolutional base of the VGG16 standard model built into Keras is used as the base network for feature extraction. A self-built network is then merged with the VGG16 base network to obtain a lemon defect classification model. With VGG16 as the base network, lemon defect recognition reaches high accuracy. Meanwhile, thanks to transfer learning, the number of parameters to be trained is greatly reduced, and much training time is saved. Finally, through testing, comparison, and analysis, we find that the VGG16-based CNN method achieves the highest accuracy in lemon defect classification.
2. Classification of Lemon Defects in Visual Feature Extraction and Transfer Learning
2.1. Visual Feature Extraction
2.1.1. Visual Feature Extraction Principle
It is well known that normal lemon peel is yellow. In the gray image, green appears darker than yellow and mold appears brighter than yellow, so the three kinds of lemon are clearly distinguished and the pixel value differences on the gray image are obvious. As shown in Figure 1, from left to right, lemon gray images (a), (b), and (c) correspond to green, healthy, and mildew. The areas marked by red circles are examples of green, healthy, and mold regions. The pixel values in these regions increase from left to right, and the colors extracted from the green and mildew regions serve as feature variables.
Figure 1. Examples of green, healthy and mildew grayscale images. (a) Green; (b) Normal; (c) Mildew.
2.1.2. Visual Feature Extraction Methods
Visual feature extraction includes two processes: image processing and feature extraction. In image processing, the brightness of the image is first compensated, and the influence of lemon reflection and uneven illumination is reduced by making a brightness mask. Threshold segmentation, including single-threshold and multi-threshold segmentation, is then applied, and each part of the lemon surface is extracted according to pixel value differences. Morphological operations, namely dilation and erosion, are then performed. Dilation enlarges the highlighted areas of the picture and is applied to the binary image of the lemon outline to make the resulting outline more complete. Erosion shrinks the highlighted areas of the picture and is used for speckle removal. Feature extraction covers color features and area features: the R, G, and B channel information and the lemon surface area are extracted and combined to expand the feature set.
Because camera shooting produces reflection points and uneven illumination, we first apply brightness compensation to the original picture. The steps are as follows:
1) Replace the pixel value of the reflection point with the average pixel value of the pericarp.
2) By observing a large number of pictures, we found that the lighting on the lemon surface is mostly uneven in one of two ways: from left to right or from top to bottom. By comparing the difference between the left and right pixels with the difference between the upper and lower pixels, we can generate a transverse or longitudinal gradient brightness mask, which effectively corrects the uneven brightness.
3) Add the mask to the original image to get the brightness-compensated image. We generate the mask as follows. First, the lemon boundary is obtained by edge extraction. Second, the center of the lemon is calculated from the boundary, and the lemon is divided at its center into left and right or upper and lower regions. If the difference in average pixel value between the left and right regions is larger than that between the upper and lower regions, the illumination is uneven from left to right. In this case, the average pericarp pixel value is extracted for each column and used as the pixel value of that column of the mask, forming the transverse gradient mask. Otherwise, the average pericarp pixel value is extracted for each row and used as the pixel value of that row of the mask, forming the longitudinal gradient mask. Because the mask at this point reflects the gray features of the lemon itself, its gray levels must be inverted and compressed.
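The mask construction in steps 2) and 3) can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' implementation: the whole image is treated as peel (edge extraction, center finding, and reflection-point replacement are skipped), the image is a plain list of rows with gray values from 0 to 255, and `strength` is a hypothetical factor standing in for the "compression" of the mask's gray levels.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def brightness_compensate(gray, strength=0.5):
    """Add an inverted gradient mask to a grayscale image (list of rows)."""
    h, w = len(gray), len(gray[0])
    col_means = [mean(row[x] for row in gray) for x in range(w)]
    row_means = [mean(row) for row in gray]

    # Compare left/right halves with top/bottom halves to pick the direction.
    lr_diff = abs(mean(col_means[: w // 2]) - mean(col_means[w // 2:]))
    ud_diff = abs(mean(row_means[: h // 2]) - mean(row_means[h // 2:]))
    transverse = lr_diff > ud_diff
    profile = col_means if transverse else row_means

    # Invert the profile so dark regions get the largest boost, then
    # compress it with `strength` (hypothetical parameter).
    peak = max(profile)
    boost = [strength * (peak - p) for p in profile]

    out = [
        [min(255, round(gray[y][x] + boost[x if transverse else y]))
         for x in range(w)]
        for y in range(h)
    ]
    return out, ("transverse" if transverse else "longitudinal")
```

For an image that is darker on the left, the function selects the transverse mask and raises the left columns more than the right ones, which matches the behaviour described for Figure 2.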
Figure 2(a) is illuminated from the left, and the light is uneven from left to right. Figure 2(b) is the result of adding the transverse gradient mask to the original image. In Figure 3(a) the light comes from the top, and the illumination is uneven from top to bottom. Figure 3(b) is the result of adding the longitudinal gradient mask to the original image. After brightness compensation, the dark part of the lemon gains more brightness and the bright part gains less, so the uneven illumination is improved.
We then convert the picture to grayscale and select an appropriate threshold to extract the mold and green areas by threshold segmentation. A morphological closing operation is then carried out to remove the spots. The binary maps containing mold or green are obtained, as shown in Figure 4(a) and Figure 4(b). Each binary map is then combined with the original image to obtain a color map containing only mold or green, as shown in Figure 4(c) and Figure 4(d).
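The segmentation and closing steps can be sketched in plain Python as below. This is an illustrative sketch only: the threshold range passed to `threshold` is a hypothetical value chosen by the caller, the structuring element is assumed to be a 3 × 3 square, and `apply_mask` works on a grayscale image rather than the color image used in the paper.

```python
def threshold(gray, low, high):
    """Binary map: 1 where low <= pixel <= high, otherwise 0."""
    return [[1 if low <= p <= high else 0 for p in row] for row in gray]


def _morph(binary, combine):
    """Apply `combine` (any or all) over each pixel's 3x3 neighbourhood."""
    h, w = len(binary), len(binary[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [binary[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(1 if combine(window) else 0)
        out.append(row)
    return out


def dilate(binary):
    return _morph(binary, any)   # grows the highlighted region


def erode(binary):
    return _morph(binary, all)   # shrinks the highlighted region


def close(binary):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(binary))


def apply_mask(image, binary):
    """Keep only the pixels selected by the binary map."""
    return [[p if m else 0 for p, m in zip(row, mrow)]
            for row, mrow in zip(image, binary)]
```

Closing fills small dark gaps inside a segmented region, while erosion on its own removes isolated bright specks.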
2.1.3. Selection of Feature Variables
From the pictures of lemon defects, we select the feature variables shown in Table 1.
By combining these feature variables, five new feature variables are obtained, as shown in Table 2.
The data corresponding to these five feature variables are used as the SVM and KNN data sets. Table 3 shows the values of the feature variables for some pictures.
Figure 2. Uneven illumination on left and right. (a) before treatment; (b) after treatment.
Figure 3. Uneven illumination up and down. (a) before treatment; (b) after treatment.
Figure 4. Binary and colour maps of mold and green. (a) gray scale image of mold; (b) gray scale image of green; (c) original image of mold; (d) original image of green.
Table 1. Feature variables.
Table 2. Combined feature variables.
Table 3. Feature variable values.
2.2. Transfer Learning and VGG16
2.2.1. Lemon Defects Classification Based on Transfer Learning
In the classification of lemon defects, the defect features are complex and hard to extract. We therefore need more complex and powerful models, but powerful models usually take a long time to train. Using transfer learning to apply powerful pretrained models to lemon defect classification helps us solve the problem quickly. Transfer learning transfers the parameters of a trained model to a new model to help train the new model. Since most data and tasks are related, the parameters of the pretrained model can be transferred to the new model in some way, thereby speeding up and improving the learning of the new model. There are three ways to realize transfer learning: direct transfer, feature vector extraction, and fine-tuning.
2.2.2. Build a Model Based on VGG16
Figure 5 shows our proposed model. It is built in three steps: splitting the VGG16 standard network, transferring the parameters of the first 15 layers of VGG16, and constructing a classification network adapted to the problem of this paper. First, take the first 15 layers of the Keras built-in VGG16 standard model and keep their weights unchanged; that is, they are not trained during the training process. Then, design a Flatten layer followed by a fully connected layer with 256 nodes and ReLU activation, and set the Dropout value to 0.5. Next, because lemons need to be divided into three categories, a fully connected layer with three nodes and Softmax activation is designed. Finally, combine the designed network with the first 15 layers of the Keras built-in VGG16 standard model to obtain the final lemon defect classification model.
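To make the layer split concrete, the following sketch traces layer counts, output shapes, and parameter counts through the VGG16 convolutional base for a 256 × 256 three-channel input. It rests on two assumptions that the text does not settle: that "the first 15 layers" counts the input layer (as the Keras `model.layers` list does), so the split falls after the fourth pooling layer, and that the new head attaches directly to that output. If the head instead sits on the full convolutional base, the last entry of `summary` applies.

```python
# VGG16 convolutional base: (number of 3x3 conv layers, output channels)
# per block; every block ends with a 2x2 max-pooling layer.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def conv_base_summary(input_size, in_channels=3, blocks=VGG16_BLOCKS):
    """Return (layer_count, param_count, output_shape) after each block."""
    size, channels = input_size, in_channels
    layers, params = 1, 0          # layer 1 is the input layer
    summary = []
    for n_convs, out_channels in blocks:
        for _ in range(n_convs):
            params += (3 * 3 * channels + 1) * out_channels  # weights + biases
            channels = out_channels
            layers += 1
        size //= 2                 # pooling halves height and width
        layers += 1
        summary.append((layers, params, (size, size, channels)))
    return summary


def dense_params(n_in, n_out):
    return n_in * n_out + n_out


summary = conv_base_summary(256)
layers, frozen_params, (h, w, c) = summary[3]      # after block 4
flat = h * w * c                                   # Flatten output size
head_params = dense_params(flat, 256) + dense_params(256, 3)
```

Under these assumptions the 15 frozen layers hold about 7.6 million parameters that never need updating, and only the new head is trained. Dropout adds no parameters, so it does not appear in the count.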
3. Experimental Process
3.1. Data Processing
3.1.1. Data Collection and Classification Standards
We obtained the original data set provided by Maciej Adamiak through the SOFTWAREMILL website. We selected and downloaded 1847 lemon pictures and labeled them as normal, green, or mildew: 1294 normal images, 149 green images, and 404 mildew images. Figure 6 shows one example image of each class, where (a) is normal, (b) is mildew, and (c) is green.
3.1.2. Data Augmentation
Because the amount of data obtained is small, and over-fitting tends to occur when a model is trained on little data, it is necessary to make small changes to the image data. To the computer, a changed image is a new image, so more data are obtained and over-fitting can be prevented. Data augmentation expands the image data set through small changes such as flips, translations, and rotations, thus increasing the amount of training data and adding noise to prevent over-fitting during training. It can also enhance the generalization ability and robustness of the model. Because the position of the lemon in the pictures of the data set is not fixed and the shooting angle varies, we mainly adopt flipping and translation as geometric augmentation.
Figure 5. The overall process of building a model based on VGG16. (a) Separate the standard VGG16 network. (b) Split off the first 15 layers of VGG16 and keep their parameters unchanged during training. (c) Build a network to solve the problem studied in this paper, including flatten and fully connected layers, and set the Dropout value accordingly.
Figure 6. Three types of lemon images. (a) Healthy; (b) Mildew; (c) Green.
For flipping, images are flipped horizontally and vertically, which has proved effective on data sets such as ImageNet and CIFAR-10. Translation moves the image along its horizontal or vertical axis, which can avoid position bias in the image data. When the original image is translated in one direction, the vacated space is filled with a constant value of 0, which preserves the spatial dimensions of the augmented image.
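The two geometric augmentations described above can be sketched as follows, using plain lists of rows so that the example has no dependencies. The function names and the shift amounts are illustrative choices, not taken from the paper.

```python
def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def translate(img, dx, dy, fill=0):
    """Shift right by dx and down by dy (negative values shift left/up).

    Vacated pixels are filled with the constant `fill`, so the output
    keeps the same height and width as the input.
    """
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out
```

For example, `translate(img, 1, 0)` moves every column one pixel to the right and leaves a column of zeros on the left edge.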
3.1.3. Experimental Preprocessing
In the experiment, the features in the image are extracted by visual feature extraction and then classified by KNN and SVM, respectively, to calculate the accuracy of the KNN test set and of the SVM test set and training set. For the two constructed convolutional neural networks, the image is first converted to grayscale and then resized to 256 × 256. In addition, for the VGG16-based model, the gray image must be converted into a three-channel color image. The image data set is then preprocessed with data augmentation, with image brightness compensation, and with both data augmentation and image brightness compensation, respectively, and finally the accuracy of the test set and training set is calculated.
Because KNN has no fitting process, its training time is almost zero. In multi-class problems, it also identifies rare data more accurately. KNN is therefore used to construct the first classification model. The number of neighbors K is the core parameter of KNN. If K is too small, the classification is strongly affected by errors in individual samples; if K is too large, KNN loses its power to discriminate between classes.
To partition the data set randomly, we use the split function in sklearn. With random state = 1, the training data and test data are divided in a 3:1 ratio, so that the same random split is reproduced in every test. After that, the K value is narrowed down step by step by repeated bisection. When K is tested in the range of 50 to 60, the minimum error is reached at K = 54, as shown in Table 4. Finally, the accuracy of the model is determined using 10-fold cross-validation.
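The procedure of splitting the data 3:1 and choosing K by test error can be sketched as below. This is a dependency-free stand-in for the sklearn workflow the paper uses, not a reproduction of it: `split_3_to_1` only imitates a seeded random split, each sample is assumed to be a `(features, label)` pair, and a plain scan over candidate K values replaces the bisection search.

```python
import random
from collections import Counter


def split_3_to_1(samples, seed=1):
    """Shuffle with a fixed seed and split into 75% train / 25% test."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = (3 * len(shuffled)) // 4
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, x, k):
    """Majority label among the k training samples nearest to x."""
    nearest = sorted(
        train,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def error_rate(train, test, k):
    wrong = sum(knn_predict(train, x, k) != y for x, y in test)
    return wrong / len(test)


def best_k(train, test, candidates):
    """Candidate K with the lowest test error (first one wins ties)."""
    return min(candidates, key=lambda k: error_rate(train, test, k))
```

With scikit-learn installed, the split itself would typically be written as `train_test_split(X, y, test_size=0.25, random_state=1)`; the source does not show its exact call, so this correspondence is an assumption.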
Solving the quadratic programming problem in SVM involves computing a matrix of order m, where m is the number of samples, so SVM handles large sample sets much less efficiently than comparable algorithms. Since the lemon data set is small, we use SVM to process the lemon data. The factors that most influence the model's accuracy are the kernel function, the gamma value, and the penalty parameter C. Commonly used kernel functions include linear, polynomial (poly), and RBF. Following the principle of parameter selection and after consulting the literature, we select the RBF kernel.
Table 4. Accuracy corresponding to K values.
As with KNN, the data are divided into a 75% training set and a 25% test set. Through experiment, the best accuracy is obtained with a gamma value of 0.01 and a C value of 10. The validation data and part of the training data are exchanged alternately by cross-validation, and the highest accuracy is selected.
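To show what the gamma value controls, the RBF kernel can be written out directly. The sketch below uses the standard definition K(x, y) = exp(-gamma * ||x - y||^2) with the reported gamma of 0.01 as the default; the feature vectors in any call are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma=0.01):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

A small gamma such as 0.01 makes the similarity decay slowly with distance, giving a smoother decision boundary, while a large gamma makes each sample influence only its immediate neighbourhood. In scikit-learn, the reported settings would correspond to `SVC(kernel="rbf", gamma=0.01, C=10)`, although the source does not show the exact call.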
3.2. Benchmark Model for the Model Based on VGG16
To compare the accuracy of the VGG16-based model with that of a conventional CNN model, we divide the data into a 70% training set, a 10% validation set, and a 20% testing set, and then conduct experiments. The validation set is used to select the best hyper-parameters. First, we construct a conventional CNN model. For more efficient training, the image is first converted to grayscale and then resized to 256 × 256.
Through experiments, we trained the model with the different parameters shown in Table 5. The best-performing network has the following structure:
Step 1: Two convolutional layers are set, each with 32 convolution kernels of size (3, 3) and ReLU activation, followed by a (2, 2) max-pooling layer and a dropout value of 0.25.
Step 2: Two further convolutional layers are set, each with 64 convolution kernels of size (3, 3) and ReLU activation, followed by a (2, 2) max-pooling layer and a dropout value of 0.25.
Step 3: Finally, a flatten layer is set, followed by a fully connected layer with 512 nodes and ReLU activation, a dropout value of 0.5, and a second fully connected layer whose size equals the number of classes, which is 3, with softmax activation.
For the optimizer, we select RMSprop. RMSprop is an improved adaptive gradient descent algorithm; the AdaGrad algorithm, by contrast, may have difficulty finding a useful solution in late iterations because its learning rate becomes too low.
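The difference between the two optimizers can be seen in their update rules for a single scalar weight. The sketch below follows the standard formulations; the learning rate, decay factor `rho`, and constant gradient are illustrative values, not the paper's settings.

```python
import math


def adagrad_step(w, grad, cache, lr=0.01, eps=1e-8):
    cache += grad ** 2                          # grows without bound
    w -= lr * grad / (math.sqrt(cache) + eps)
    return w, cache


def rmsprop_step(w, grad, cache, lr=0.01, rho=0.9, eps=1e-8):
    cache = rho * cache + (1 - rho) * grad ** 2  # exponential moving average
    w -= lr * grad / (math.sqrt(cache) + eps)
    return w, cache


def last_step_sizes(n_steps, grad=1.0):
    """Size of the final update of each optimizer after n_steps >= 1 steps."""
    w_a = w_r = cache_a = cache_r = 0.0
    for _ in range(n_steps):
        prev_a, prev_r = w_a, w_r
        w_a, cache_a = adagrad_step(w_a, grad, cache_a)
        w_r, cache_r = rmsprop_step(w_r, grad, cache_r)
    return prev_a - w_a, prev_r - w_r
```

With a constant gradient, the AdaGrad step shrinks in proportion to one over the square root of the step count, whereas the RMSprop step settles near the learning rate because old squared gradients are forgotten.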
A computation budget of 720 epochs is given, and the model is trained by mini-batch gradient descent with a batch size of 32. The convergence of the loss value and the accuracy of the training and validation sets are shown in Figure 7.
Table 5. Hyper-parameter space of the conventional CNN model.
Figure 7. Convergence of conventional CNN models constructed. (a) Accuracy of validation and training sets; (b) Convergence of loss function.
When we increase the number of epochs to 720, the loss of the training set converges to 0, but the model overfits. Moreover, the model begins to overfit once the number of epochs exceeds 360. Therefore, the number of epochs is set to 360 when testing the model.
3.3. Train the Model Based on VGG16
VGG16 only accepts three-channel color images with dimensions greater than 48 × 48, whereas in this paper the images are converted to grayscale and resized to 256 × 256 during preprocessing. Therefore, before the data set is put into the model for training, validation, and testing, the pictures are converted into three-channel color pictures of size 256 × 256. The class labels are converted to one-hot encoding, the last dimension of the training, validation, and testing sets is expanded, and the data are then normalized.
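The input preparation described above can be sketched as follows. The class order in `CLASSES` and the choice to normalize by dividing by 255 are assumptions made for illustration; the text states only that labels are one-hot encoded and the data normalized.

```python
CLASSES = ["normal", "green", "mildew"]   # hypothetical ordering


def one_hot(label, classes=CLASSES):
    """Encode a class label as a one-hot vector."""
    return [1.0 if c == label else 0.0 for c in classes]


def gray_to_three_channel(gray):
    """Copy each gray value into three channels and scale it to [0, 1]."""
    return [[[p / 255.0] * 3 for p in row] for row in gray]
```

Each gray pixel becomes a triple of identical values, which gives the image the three-channel shape that VGG16 expects without adding any color information.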
Moreover, as with the conventional CNN, we first divide the data into a 70% training set, a 10% validation set, and a 20% testing set, and then conduct experiments. We choose the optimal model parameters through experiments, as shown in Table 6.
Figure 8 shows that at about 360 epochs the model converges and the loss tends to 0.
Therefore, the number of epochs is set to 360 when testing the model.
Table 6. Hyper-parameter space of the model based on VGG16.
Figure 8. Convergence of the model based on VGG16. (a) Accuracy of validation and training sets; (b) Convergence of loss function.
3.4. Results and Analysis
3.4.1. Experimental Results
After visual feature extraction, the data are input to obtain the highest training-set and test-set accuracy of the SVM model and the test-set accuracy of KNN. For the CNN benchmark, a conventional CNN model is trained with data augmentation, with illumination compensation, and with both illumination compensation and data augmentation, respectively, and the highest model accuracy over multiple runs is selected. The model based on VGG16 is trained under the same conditions as the conventional CNN, and the highest model accuracy over multiple runs is again selected. The final experimental results are shown in Table 7.
In the table, CNN denotes the conventional CNN, VGG16 denotes the model based on VGG16, Aug denotes data augmentation, and IBC denotes image brightness compensation, that is, brightness compensation applied to the original color picture.
3.4.2. Results Analysis
1) Data Augmentation
Table 8 shows that data augmentation decreases the accuracy of both the traditional CNN model and the model based on VGG16, whether or not the image is preprocessed with brightness compensation. Moreover, the training-set accuracy in Table 8 decreases considerably, indicating that the model has difficulty converging within the prescribed number of iterations and that augmentation by translation and rotation makes recognition harder and reduces generalization ability. Increasing the number of iterations can address this problem.
2) Brightness Compensation
Table 8 shows that, without data augmentation, brightness compensation improves the accuracy of both the traditional CNN model and the model based on VGG16, and the models fit very well. However, when data augmentation is used, brightness compensation reduces the accuracy of the model.
Table 7. Accuracy under models and different data processing.
Table 8. Model accuracy with or without data augmentation.
3) Overall Analysis
The accuracy of the classification models built on visual feature extraction is not high. For the conventional CNN benchmark, the accuracy of the model is higher after brightness compensation: the accuracy of the test set is 98.27%, and the accuracy of the training set is 90.67%. When transfer learning is adopted, the accuracy of the model is high: the accuracy of the training set is 100%, and the accuracy of the test set is 95.44%. Hence, for the image classification problem in this paper, we need to apply brightness compensation to the images in advance and build the model based on VGG16 to achieve the best classification accuracy.
4. Summary and Future Work
Based on existing technology and our experiments, a classification model built on VGG16 and combined with brightness compensation achieves high accuracy in lemon recognition. In the future, the existing methods will be further improved, and new methods will be used to improve the accuracy and generalization ability of lemon recognition:
1) Optimize the brightness compensation method to eliminate uneven illumination and reflection points more accurately.
2) Try more data augmentation combinations to find the optimal one.
3) Upgrade the hardware and train the model for more iterations.
4) Use cross-validation to represent model performance more accurately.
5) Continue optimizing the model so that it can identify more explicit lemon features.
6) Iterate over various convolutional neural networks as a follow-up research direction to improve accuracy.
References
 Qin, F., Liu, D., Sun, B., Ruan, L., Ma, Z. and Wang, H. (2017) Image Recognition of 4 Alfalfa Leaf Diseases Based on Deep Learning and Support Vector Machine. Journal of China Agricultural University, 22, 123-133.
 Sun, J., Tan, W., Mao, H., Wu, X., Chen, Y. and Wang, L. (2017) Identification of Various Plant Leaf Diseases Based on Improved Convolutional Neural Network. Journal of Agricultural Engineering, 33, 209-215.
 Mohanty, S., Hughes, D.P. and Salathé, M. (2016) Using Deep Learning for Image-Based Plant Disease Detection. Frontiers in Plant Science, 7, Article No. 1419.
 Brahimi, M., Boukhalfa, K. and Moussaoui, A. (2017) A Deep Learning for Tomato Diseases: Classification and Symptoms Visualization. Applied Artificial Intelligence, 31, 299-315.
 Suh, H.K., IJsselmuiden, J., Willem Hofstee, J. and Van Henten, E.J. (2018) Transfer Learning for the Classification of Sugar Beet and Volunteer Potato under Field Conditions. Biosystems Engineering, 174, 50-65.
 Barbedo, J.G.A. (2018) Impact of Dataset Size and Variety on the Effectiveness of Deep Learning and Transfer Learning for Plant Disease Classification. Computers and Electronics in Agriculture, 153, 46-53.
 Hussain Hassan, N.M. and Nashat, A. (2018) New Effective Techniques for Automatic Detection and Classification of External Olive Fruits Defects Based on Image Processing Techniques. Multidimensional Systems and Signal Processing, 30, 571-589.
 Wang, J. (2019) Research on Crop Disease and Weed Image Recognition Based on Convolutional Neural Network and Migration Learning. Master’s Thesis, University of Science and Technology of China, Hefei, 55-57.