Soil total nitrogen (TN), one of the essential nutrients for crops, has a significant effect on crop growth and development. The traditional chemical determination of soil nitrogen is time-consuming and laborious. Near-ground hyperspectral techniques, which have developed rapidly in recent years and offer high spectral resolution, make rapid, real-time estimation of soil TN content possible, which is of practical significance for scientific and rational fertilization.
Scholars in China and abroad have reported a number of results on estimating soil nitrogen content. Reeves et al. estimated soil TN using near-infrared reflectance spectroscopy. Galvao et al. used regression equations to estimate soil constituents from the most highly correlated spectral bands. Dalal et al. measured the spectral reflectance of soil by near-infrared spectroscopy, found that bands in the 1700 - 2100 nm range were related to soil nitrogen content, and established a model by multiple stepwise linear regression. Zhao et al. used multiple stepwise linear regression to construct a prediction model of the TN content of loess soil samples, with a correlation coefficient of 0.94. Lu et al. used the first-order derivative of the logarithm of reflectance and a normalized spectral index to establish a prediction model of the TN content of black soil in Northeast China; the normalized spectral index based on 550 nm and 450 nm estimated soil TN content well. Zhang et al. used soil samples of five major types from central and eastern China to analyze the relationship between TN content and spectral reflectance, and constructed estimation models of soil TN content based on partial least squares, BP neural network and characteristic spectral indices. They found that, in the 500 - 900 nm and 1350 - 1490 nm ranges, models relating the first-order differential of reflectance to soil TN content had higher prediction accuracy. Based on an analysis of the physicochemical properties and spectral reflectance of soil samples, Xu et al. established a prediction model of TN content in purple soil by partial least squares regression, showing that hyperspectral prediction of TN content in purple soil is feasible.
These studies show that it is feasible to estimate the TN content of different soil types by spectral analysis. However, the applicability of hyperspectral inversion models of soil total nitrogen content differs between regions. Therefore, models for additional soil types still need to be developed and refined.
This study focused on the brown soil of apple orchards in Qixia County, Yantai City, Shandong Province. The spectral reflectance of the soil samples was measured with an ASD FieldSpec 3 under controlled indoor conditions, and the TN content of the same samples was determined. After the sample data were pretreated, the variation of TN content and its correlation with spectral reflectance were analyzed, and a hyperspectral estimation model of soil total nitrogen content was established.
2. Materials and Methods
2.1. Research Areas Overview
The soil samples were collected in Qixia County, Yantai City, Shandong Province (120˚33'E to 121˚15'E, 37˚05'N to 37˚32'N). Qixia County is located in the center of the Jiaodong Peninsula. It has a warm temperate, semi-humid monsoon climate, with a mean annual temperature of 11.4˚C and a mean annual rainfall of 640 - 846 mm. The terrain is mountainous and hilly. The orchard soil is mostly brown soil, which is thin, loose and acidic, rich in mineral elements and has good permeability.
2.2. Orchard Soil Sample Collection and Preparation
A total of 23 orchards in Qixia were sampled on October 20 - 23, 2010, giving 92 brown soil samples. We randomly selected 4 trees at each sampling point. Soil was collected to the east, west, south and north below each fruit tree, at a depth of 0 - 20 cm. After mixing the soil, we used the quartering method to obtain the final sample. The location of the sampling area is shown in Figure 1. All soil samples were air dried, and gravel and plant and animal residues were removed. After grinding, the samples were passed through a 1 mm sieve. Each soil sample was divided into two parts, one for the measurement of hyperspectral reflectance and the other for chemical determination of soil TN.
2.3. Spectral and Total Nitrogen Content of Orchard Soil
2.3.1. Orchard Soil Spectral Determination
The hyperspectral reflectance of the soil was measured with an ASD FieldSpec 3. The spectral range of the spectrometer is 350 - 2500 nm. In the 350 - 1000 nm range the sampling interval is 1.4 nm and the spectral resolution is 3 nm; in the 1000 - 2500 nm range the sampling interval is 2 nm and the spectral resolution is 10 nm. The resampling interval is 1 nm and the number of output bands is 2151. The treated soil sample was placed in a vessel with a diameter of 15 cm and a depth of 2 cm, and the soil surface was flattened after filling. All spectral measurements were carried out under the same conditions in a darkroom. The light source was a 50 W halogen lamp placed 30 cm from the center of the soil sample. The optical fiber probe was fixed on a tripod, with a field of view of 25˚, at a distance of 15 cm from the soil surface. During measurement, the vessel was rotated three times by about 90˚ each time, so that spectra were obtained in four directions. These were averaged to give the reflectance data of the soil sample.
Figure 1. Location distribution of sampling area.
2.3.2. Orchard Soil Total Nitrogen Content Determination
The soil TN content was determined by the Kjeldahl method. In the presence of a catalyst, the soil samples were digested with concentrated sulfuric acid to convert organic nitrogen into inorganic ammonium salt. The ammonium salt was converted to ammonia under alkaline conditions, steam-distilled and absorbed by excess acid. The soil TN content was calculated by the standard alkaline titration.
The calculation uses the following quantities: the concentration of the standard acid solution (0.01 mol/L); the volume of standard acid consumed in titrating the sample (mL); the volume of standard acid consumed in titrating the blank (mL); 0.014, the molar mass of nitrogen (kg/mol); the sample mass (g); and the aliquot factor, that is, the ratio of the digest volume to the volume taken for measurement.
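As a rough illustration of how these quantities combine, the sketch below computes nitrogen as a mass fraction and leaves the final unit scaling to the caller, because the reporting unit is not stated here. The function name, argument names and titration readings are hypothetical, and the arithmetic is an assumption based on the standard Kjeldahl calculation rather than the authors' exact formula.

```python
# 0.014 kg/mol from the text is numerically the same as 0.014 g/mmol.
NITROGEN_MOLAR_MASS_G_PER_MMOL = 0.014


def kjeldahl_nitrogen_fraction(v_sample_ml, v_blank_ml, acid_conc_mol_per_l,
                               sample_mass_g, aliquot_factor=1.0):
    """Return nitrogen as a mass fraction (g N per g soil)."""
    if sample_mass_g <= 0:
        raise ValueError("sample mass must be positive")
    # mL x mol/L = mmol of acid, equivalent to mmol of nitrogen titrated.
    mmol_n = (v_sample_ml - v_blank_ml) * acid_conc_mol_per_l
    grams_n = mmol_n * NITROGEN_MOLAR_MASS_G_PER_MMOL * aliquot_factor
    return grams_n / sample_mass_g


# Hypothetical readings: 5.0 mL for the sample, 0.5 mL for the blank,
# 0.01 mol/L acid, 1.0 g of soil, whole digest titrated.
fraction = kjeldahl_nitrogen_fraction(5.0, 0.5, 0.01, 1.0)
tn_g_per_kg = fraction * 1000.0   # expressed in g/kg
tn_percent = fraction * 100.0     # expressed in %
```

Subtracting the blank volume removes the contribution of nitrogen in the reagents, so only the acid consumed by the sample itself enters the result.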
2.4. Data Preprocessing
2.4.1. Hyperspectral Data Processing
The pretreatment of soil spectral data is a necessary and effective means of improving the precision of hyperspectral modeling. It can effectively reduce the influence of the soil itself (chemical form and physical form, such as particle size) and of the environment (temperature, humidity, etc.), thus highlighting the correlation between spectral reflectance and nutrient content. Savitzky-Golay (SG) smoothing can effectively remove the baseline drift and tilt noise. Multiplicative scatter correction (MSC) can eliminate the differences among near-infrared spectra of the same batch that are caused by nonuniform sample particles during diffuse reflection. It has been shown that model estimation can be further improved by combining SG smoothing and MSC treatment. Therefore, this study first applied SG smoothing. Four transformations were then carried out on the smoothed spectra: logarithm, first-order differential of the logarithm, MSC, and first-order differential of MSC.
2.4.2. Savitzky-Golay (SG) Smoothing
Select a smoothing window of width (2w + 1) centered on wavelength point a. The weighted mean $\bar{x}_a$ is calculated from the center point a and the w points on each side of it within the window, and $\bar{x}_a$ replaces the measured value at wavelength point a. The window is then moved by changing a in turn until all wavelength points have been smoothed. The data in the moving window are fitted by polynomial least squares to achieve the smoothing. The value at wavelength point a after smoothing is:
$$x_{a,\mathrm{smooth}} = \bar{x}_a = \frac{1}{H}\sum_{i=-w}^{+w} x_{a+i}\,h_i$$
In the formula, $h_i$ is the smoothing coefficient, which is fitted by the least squares method, and $H$ is the normalization factor.
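The moving-window procedure can be sketched as follows. The window width and polynomial order used in the study are not stated, so this example assumes a 5-point window (w = 2) with a quadratic fit, for which the well-known smoothing coefficients are (-3, 12, 17, 12, -3) and the normalization factor is 35. The function name and test signals are illustrative only.

```python
# Smoothing coefficients and normalization factor for a 5-point
# quadratic Savitzky-Golay smoother (w = 2); an assumed configuration.
SG5_WEIGHTS = (-3, 12, 17, 12, -3)
SG5_NORM = 35


def sg_smooth_5pt(values):
    """Smooth a sequence with the 5-point SG filter; edge points are left unchanged."""
    w = len(SG5_WEIGHTS) // 2
    smoothed = list(values)
    for a in range(w, len(values) - w):
        window = values[a - w:a + w + 1]
        smoothed[a] = sum(h * x for h, x in zip(SG5_WEIGHTS, window)) / SG5_NORM
    return smoothed


# A quadratic signal is reproduced at interior points, while an isolated spike is damped.
quadratic = [float(x * x) for x in range(8)]
spiky = [1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0]
smoothed_quadratic = sg_smooth_5pt(quadratic)
smoothed_spiky = sg_smooth_5pt(spiky)
```

Because the weights come from a local polynomial fit, smooth curvature in the spectrum is preserved better than with a plain moving average of the same width.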
2.4.3. Multiplicative Scatter Correction (MSC)
Multiplicative scatter correction (MSC) is mainly used to eliminate the scattering effects caused by uneven particle size distribution and differing particle sizes. Like standardization, the MSC algorithm operates on the spectral matrix of a group of samples. The average of all the NIR spectra is calculated first and used as the standard spectrum. The spectrum of each sample is then fitted to the standard spectrum by univariate linear regression, which gives the linear translation (regression constant) and the tilt offset (regression coefficient) of each spectrum relative to the standard spectrum. The linear translation is subtracted from the original spectrum of each sample and the result is divided by the regression coefficient, which corrects the relative baseline tilt of the spectrum.
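Calculate the average spectrum:
$$\bar{A}_j = \frac{1}{n}\sum_{i=1}^{n} A_{i,j}$$
Regress each spectrum on the mean spectrum to obtain the regression coefficients:
$$A_i = m_i\,\bar{A} + b_i$$
Use the regression coefficients to calculate the MSC-corrected spectrum:
$$A_{i(\mathrm{MSC})} = \frac{A_i - b_i}{m_i}$$
$A$ is the spectral data after SG smoothing ($n$ samples, with $A_{i,j}$ the value of sample $i$ at band $j$).
$\bar{A}$ is the mean spectrum.
$m_i$ and $b_i$ are the regression coefficients (slope and intercept) of sample $i$.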
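The three MSC steps (mean spectrum, per-sample linear regression against the mean, correction) can be sketched in plain Python as below. The function name and the toy spectra are hypothetical, and the sketch assumes every spectrum has the same number of bands and a non-zero regression slope.

```python
def msc(spectra):
    """Multiplicative scatter correction for a list of equal-length spectra."""
    n_samples = len(spectra)
    n_bands = len(spectra[0])
    # Step 1: mean spectrum over all samples, band by band.
    mean = [sum(s[j] for s in spectra) / n_samples for j in range(n_bands)]
    mean_avg = sum(mean) / n_bands
    sxx = sum((m - mean_avg) ** 2 for m in mean)
    corrected = []
    for s in spectra:
        # Step 2: least-squares fit s = slope * mean + intercept.
        s_avg = sum(s) / n_bands
        sxy = sum((m - mean_avg) * (v - s_avg) for m, v in zip(mean, s))
        slope = sxy / sxx
        intercept = s_avg - slope * mean_avg
        # Step 3: remove the offset, then rescale by the slope.
        corrected.append([(v - intercept) / slope for v in s])
    return mean, corrected


# Toy spectra that differ only by a scale factor and an offset (pure scatter).
base = [0.10, 0.20, 0.35, 0.50]
spectra = [base,
           [2.0 * b + 0.10 for b in base],
           [0.5 * b + 0.05 for b in base]]
mean_spectrum, corrected_spectra = msc(spectra)
```

In this toy case the three spectra collapse onto the mean spectrum after correction, which is the behaviour MSC is designed to produce when the only differences between samples are multiplicative and additive scatter effects.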
2.4.4. First Order Differential
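The first-order differential of the reflectance spectrum is obtained by numerical differentiation. The formula is as follows:
$$R'(\lambda_i) = \frac{R(\lambda_{i+1}) - R(\lambda_{i-1})}{2\Delta\lambda}$$
In the formula, $\lambda_i$ is the wavelength of band $i$, $R'(\lambda_i)$ is the first-order differential spectrum at $\lambda_i$, and $\Delta\lambda$ is the interval from wavelength $\lambda_{i-1}$ to $\lambda_i$.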
Logarithmic transformation of the soil reflectance enhances the spectral differences in the visible region and reduces the random effects caused by changes in illumination conditions and topography. The transformation is applied to the soil reflectance at the wavelength of each band.
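A minimal sketch of the two transformations in this subsection follows. It assumes a central-difference approximation for the first-order differential and a base-10 logarithm, neither of which is confirmed by the text; the function names and reflectance values are hypothetical.

```python
import math


def first_derivative(values, step):
    """Central-difference first derivative; returns values for interior points only."""
    return [(values[i + 1] - values[i - 1]) / (2.0 * step)
            for i in range(1, len(values) - 1)]


def log_transform(reflectance):
    """Base-10 logarithm of each reflectance value (values must be positive)."""
    return [math.log10(r) for r in reflectance]


# Hypothetical reflectance values at 1 nm spacing.
reflectance = [0.10, 0.12, 0.16, 0.22, 0.30]
d_reflectance = first_derivative(reflectance, step=1.0)
d_log = first_derivative(log_transform(reflectance), step=1.0)
```

The derivative is undefined at the first and last band with this scheme, so each differentiated spectrum is two points shorter than its input.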
2.5. Model Construction and Verification
Multiple linear stepwise regression is used to screen sensitive bands: variables are entered or removed according to a set F value, and the best model containing only a few variables is finally established. Here it was used to select the sensitive wavelengths, based on an analysis of the relationship between the pretreated spectral data and the soil TN content. The 92 soil samples were randomly divided into 69 modeling samples and 23 validation samples. The models were established using Random Forest and Support Vector Machine.
2.5.1. Random Forest
Random Forest (RF) is an algorithm based on classification trees that improves prediction accuracy without a significant increase in computation. RF can explain the effect of several independent variables on a dependent variable Y. Suppose the dependent variable Y has n observations and k independent variables associated with it. When constructing a classification tree, the random forest randomly re-selects n observations from the original data, so that some observations are selected several times and some are not selected at all. This is the Bootstrap re-sampling method. At the same time, RF randomly selects a subset of the k independent variables to determine each node of the classification tree. In this way, every tree that is built may be different. In general, RF randomly generates hundreds to thousands of classification trees and then takes the most frequent result among the trees as the final result; for regression, the predictions of the trees are averaged.
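The two sources of randomness described above, bootstrap re-sampling of observations and random selection of candidate variables at a node, can be sketched as follows. This is an illustration of the sampling steps only, not a full random forest; the function names are hypothetical, and the sizes (69 samples, 6 variables, 3 candidates per node) simply mirror figures reported elsewhere in this paper.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def candidate_features(n_features, n_try, rng):
    """Pick n_try distinct feature indices to consider at one tree node."""
    return sorted(rng.sample(range(n_features), n_try))


def forest_predict(tree_predictions):
    """A regression forest averages the per-tree predictions."""
    return sum(tree_predictions) / len(tree_predictions)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_samples, n_features, n_try = 69, 6, 3

in_bag = bootstrap_indices(n_samples, rng)
out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
features_at_node = candidate_features(n_features, n_try, rng)
```

Observations that never appear in a tree's bootstrap sample (the out-of-bag set) are not used to build that tree, which is why they are commonly used for an internal error estimate.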
2.5.2. Support Vector Machine
The support vector machine (SVM) was first proposed by Corinna Cortes and Vapnik in 1995. It is based on the VC dimension theory of statistical learning and the principle of structural risk minimization, and it seeks the best compromise between model complexity and learning ability given limited sample information, in order to obtain the best generalization ability. It has many unique advantages in solving small-sample, nonlinear and high-dimensional pattern recognition problems, and it can be applied to other machine learning problems such as function fitting. Support vector machine regression is a good way to realize the idea of structural risk minimization, and it belongs to the family of kernel methods in machine learning. Because it is based on the principle of structural risk minimization, it avoids the tendency of traditional learning methods to fall into local minima. It has strong generalization ability and uses the kernel function method to avoid increasing the computational complexity. It can also effectively overcome the curse of dimensionality, and it is widely used in statistical classification and regression analysis. Its mathematical form is simple and it is suitable for small-sample analysis.
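Two building blocks of support vector regression with an RBF kernel can be written down compactly: the kernel itself and the epsilon-insensitive loss. The sketch below is illustrative only; the parameter values (gamma, epsilon), the support vectors and the coefficients are hypothetical and are not the values fitted in this study.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel: exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def epsilon_insensitive_loss(measured, predicted, epsilon):
    """Loss is zero inside the epsilon tube and grows linearly outside it."""
    return max(0.0, abs(measured - predicted) - epsilon)


def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Prediction of a trained epsilon-SVR: weighted kernel sum plus bias."""
    return bias + sum(c * rbf_kernel(sv, x, gamma)
                      for sv, c in zip(support_vectors, dual_coefs))


# Hypothetical one-dimensional example with a single support vector.
prediction = svr_predict([0.0], [[0.0]], [2.0], bias=0.5, gamma=1.0)
```

Errors smaller than epsilon contribute nothing to the loss, so only samples on or outside the tube become support vectors, which keeps the fitted model sparse.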
2.5.3. Models Test
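The estimation performance of the model is tested by the coefficient of determination (R2), the root mean square error (RMSE) and the average relative error (RE):
$$R^2 = \frac{\left[\sum_{i=1}^{n}(y_i-\bar{y})(\hat{y}_i-\bar{\hat{y}})\right]^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2\,\sum_{i=1}^{n}(\hat{y}_i-\bar{\hat{y}})^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i-y_i)^2}$$
$$\mathrm{RE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left|\hat{y}_i-y_i\right|}{y_i}$$
$y_i$ is the measured value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the measured values, $\bar{\hat{y}}$ is the mean of the predicted values, and n is the number of samples. The larger the R2, the more stable the model. The smaller the RMSE and the RE, the higher the estimation accuracy of the prediction model.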
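The three indicators can be computed as in the sketch below. It assumes that R2 is the squared Pearson correlation between measured and predicted values (the form that uses both means listed above) and that RE is the mean of the absolute relative errors; both are assumptions, and the function names and sample numbers are hypothetical.

```python
import math


def r_squared(measured, predicted):
    """Squared Pearson correlation between measured and predicted values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_m * var_p)


def rmse(measured, predicted):
    """Root mean square error of the predictions."""
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n)


def mean_relative_error(measured, predicted):
    """Mean absolute error relative to each measured value (measured must be non-zero)."""
    n = len(measured)
    return sum(abs(p - m) / m for m, p in zip(measured, predicted)) / n


# Hypothetical measured and predicted values.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
scores = (r_squared(measured, predicted),
          rmse(measured, predicted),
          mean_relative_error(measured, predicted))
```

Note that a correlation-based R2 stays high even when predictions are systematically biased, which is one reason it is reported together with RMSE and RE.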
3. Results and Analysis
3.1. Statistical Analysis of Soil Total Nitrogen Content in Orchard
Table 1 shows the basic statistics of the total nitrogen content of the soil samples. As can be seen from Table 1, the statistical indicators of the modeling and validation samples are consistent with those of the overall sample, which reduces the influence of uneven sample distribution on the modeling results.
3.2. Spectral Characteristics of Orchard Soil
Figure 2 shows the measured spectral reflectance curves of the orchard soil. In the 350 - 1000 nm range the reflectance is low but increases rapidly with wavelength, so the curve is steep. In the 1000 - 1900 nm range the curve rises slowly and the reflectance finally reaches its highest value. In the 1900 - 2100 nm band the curve rises rapidly with wavelength, and in the 2100 - 2500 nm band the reflectance gradually decreases. There are three significant absorption valleys at 1400 nm, 1920 nm and 2200 nm, showing typical soil spectral characteristics.
Figure 2. Spectral reflectance curve of soil.
Table 1. Statistical characteristics of soil total nitrogen content of the samples.
3.3. Correlation Analysis and Sensitive Wavelength Screening
The SG-smoothed spectra were subjected to four transformations: logarithm, first-order differential of the logarithm, multiplicative scatter correction (MSC), and first-order differential of MSC. The correlation of each transformed spectrum with the total nitrogen content of the soil was then analyzed. The results are shown in Figure 3.
As shown in Figure 3(a), the correlation between the soil reflectance spectrum after logarithmic transformation and the total nitrogen content of the soil is low. The correlation coefficient between soil total nitrogen and the spectrum improved markedly after the first-order differential treatment (Figure 3(b)).
Figure 3(c) shows the correlation between the reflectance after MSC and the soil total nitrogen content. After MSC, the effect of external noise on the spectrum is effectively eliminated and the correlation between total nitrogen content and spectral reflectance increases. As shown in Figure 3(d), the correlation coefficients between soil total nitrogen and spectral reflectance were improved after first-order differential treatment.
Figure 3. Correlation coefficients of total nitrogen content and reflectance of different transformation modes with wavelength. (a) Logarithm; (b) Logarithmic first-order differential; (c) MSC; (d) MSC first-order differential.
It can be seen that differential spectra provide higher resolution and clearer spectral profiles than the original spectra. They also eliminate the influence of background interference and improve the correlation between soil total nitrogen content and spectral reflectance. Therefore, after comparing the correlations of the four transformations in Figure 3, we adopted the MSC first-order differential form. The sensitive wavelengths of soil total nitrogen were then screened by multiple linear stepwise regression based on the correlation with soil total nitrogen content.
The MSC first-order differential reflectance was taken as the independent variable and the soil TN content as the dependent variable. The significance level for entering a variable was set to 0.05 and the level for removing a variable to 0.10. The multiple linear stepwise regression analysis gave the sensitive wavelengths of total nitrogen as 956 nm, 995 nm, 1020 nm, 1410 nm, 1659 nm and 2020 nm.
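The band-by-band correlation analysis behind Figure 3 can be sketched as below: for each wavelength, the (transformed) reflectance across samples is correlated with TN content. The data are hypothetical and the function names are illustrative. Ranking bands by absolute correlation is shown only as a simple screening step; it is not the stepwise regression used to select the final wavelengths.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_by_band(spectra, tn):
    """Correlate each band (column) of the spectra with TN content."""
    n_bands = len(spectra[0])
    return [pearson([s[j] for s in spectra], tn) for j in range(n_bands)]


# Hypothetical data: 4 samples x 3 bands, with one TN value per sample.
spectra = [[0.1, 0.9, 0.3],
           [0.2, 0.8, 0.1],
           [0.3, 0.7, 0.4],
           [0.4, 0.6, 0.2]]
tn = [1.0, 2.0, 3.0, 4.0]
r = correlation_by_band(spectra, tn)
most_sensitive_band = max(range(len(r)), key=lambda j: abs(r[j]))
```

Both strongly positive and strongly negative correlations indicate sensitive bands, which is why the ranking uses the absolute value of the coefficient.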
3.4. Construction and Testing of Hyperspectral Estimation Model for Total Nitrogen Content in Orchard Soil
3.4.1. Random Forest Regression Model
The RF regression model of soil total nitrogen content was established with the screened sensitive wavelengths as the independent variables. The number of trees was set to Ntrees = 300, the training sample proportion to 50%, and the number of variables at each node to 3. The results for the modeling samples and validation samples of the random forest regression model are shown in Figure 4.
3.4.2. Support Vector Machine Model
The screened sensitive wavelengths were taken as the condition attributes of the support vector machine regression and the soil total nitrogen content as the decision attribute. After parameter optimization, regression modeling and verification, the support vector machine type was set to epsilon-SVR with an RBF kernel function. The model parameters are shown in Table 2. The fitting results for the modeling samples and the results for the validation samples of the SVM model are shown in Figure 5.
Figure 4. Random forest regression model validation results and estimation validation results.
Figure 5. Support Vector Machine model validation results and estimating sample validation results.
Table 2. Parameters of support vector machine regression model.
Table 3. Validation results of the soil total nitrogen content models.
3.4.3. Test and Analysis of the Estimation Model
The estimation models were tested with independent sample data in order to analyze the ability of the RF model and the SVM model to predict soil TN content. The test results are shown in Table 3. The coefficients of determination (R2) of the two methods were 0.7058 and 0.7754, the root mean square errors (RMSE) were 0.0251 and 0.0221, and the relative errors (RE) were 0.1698 and 0.1525, respectively. By comparison, the SVM model is the more accurate of the two in validation.
The correlation between spectral reflectance and soil TN content was improved by applying MSC and first-order differential treatment to the SG-smoothed spectral reflectance data. A series of sensitive wavelengths (956 nm, 995 nm, 1020 nm, 1410 nm, 1659 nm and 2020 nm) was screened out by multiple linear stepwise regression analysis, which provides an important basis for improving the stability and reliability of the model.
Comparing the RF and SVM models, the coefficients of determination (R2) were 0.7058 and 0.7754, the RMSE values were 0.0251 and 0.0221, and the relative errors (RE) were 0.1698 and 0.1525, respectively. The results show that the SVM regression model is more accurate in validation than the RF regression model and is therefore better suited to estimating soil total nitrogen content.
This paper was supported by the National Natural Science Foundation of China (41671346, 41271369), Funds of Shandong “Double Tops” Program (SYL2017XTTD02) and the agriculture big data project of Shandong Agricultural University (75016).