Validating prediction models built from spectral data can support the development of precision agriculture, because spectroscopy provides analytical parameters of the soil more efficiently on large datasets. Conventional laboratory analyses of soil properties are expensive, and generating the necessary data is labour-intensive and time-consuming. However, spatial characterization of soil variability at a fine scale is often necessary for sustainable management of the soil cover. Spatialization of soil properties is also important for monitoring soil moisture, soil fertility and soil acidification. Detailed soil information must therefore be obtained with alternative, lower-cost methods, which is a real challenge in developing countries where analytical equipment for soils remains largely insufficient. Furthermore, organic matter is recognized as a good indicator of soil quality in the Sahelian agrosystems of Senegal. Calibrating prediction models for these agro-pedological variables is therefore an issue of sustainable development, given that agricultural production plays a major role in food security and in the economies of sub-Saharan African countries. In addition, in the context of global warming, management strategies that store carbon in the soil and reduce carbon dioxide emissions to the atmosphere need to be promoted. Internationally, mathematical and statistical prediction methods are increasingly tested in soil analysis protocols based on spectroscopy data. Spectroscopy generates reflectance and luminance spectra in different wavelength ranges, 250 - 400 nm (ultraviolet; UV), 400 - 700 nm (visible; VIS), 700 - 2500 nm (near infrared; NIR) and 2500 - 25,000 nm (mid infrared; MIR), which allows useful information about soil components to be extracted at lower cost.
Hence, the interest of pursuing research in spectroscopy was to obtain more accurate and reproducible model estimates of soil properties. To explore these spectral data, pre-processing functions were applied to determine the most relevant spectral wavelengths for estimating soil properties. Statistical models combined with such processing functions have given good results in the analysis of soil properties by spectroscopy. Continuum removal (CR) is one example of a pre-treatment that isolates particular absorption features in diffuse reflectance spectra of soils. After isolation, the wavelengths whose continuum-removed reflectance equalled 1 were removed from the explanatory variables of the model in order to minimize prediction errors. The CR was combined with the PLSR (Partial Least Squares Regression) model to evaluate the accuracy of soil organic carbon predictions from the spectral data. The way continuum removal is applied raises some scientific questions. First, few studies have demonstrated how the CR calculation is implemented for soil spectroscopy data. Second, its actual effects on the estimation of physical, chemical and biological soil properties have not been sufficiently studied. The objective of this study was therefore to better understand the continuum removal (CR) function and to evaluate its effect on the accuracy of soil organic carbon prediction from soil spectral data.
2. Material and Method
2.1. Study Area
The study area is located in the lower delta of the Senegal River. It corresponds to the 30-ha agricultural farm of the University of Saint Louis, where a tributary of the Senegal River (the Djeuss) allows the development of farming activities. The climate is sub-Canarian to Sahelian, with a short rainy season from July to October (Table 1) and a dry season that lasts from November to June. The natural vegetation cover is a shrub steppe comprising mainly Acacia raddiana, Balanites aegyptiaca, Prosopis juliflora and Euphorbia balsamifera. The study area is an experimental site for market garden, horticultural and rainfed crops. However, promoting precision agriculture required correcting the lack of information on the spatial variability of the physical and chemical soil properties. We therefore carried out a stratified point sampling of the soil on a regular 30 m grid, using Landsat imagery and Google Earth. A total of 216 sampling points, that is 3 - 4 points per plot, were selected. The geographical coordinates of each point were recorded by GPS survey and referenced in a geographic information system (Figure 1). Soil profiles were sampled with an auger at the following depths: 0 - 10 cm; 10 - 20 cm; 20 - 40 cm; 40 - 60 cm and 60 - 80 cm. For each depth, a composite sample was created by mixing 3 primary samples. Of the 1080 soil samples collected in the study area, 432 were analysed for biochemical and chemical properties in the Africa Rice laboratory. A validated prediction model based on Vis-NIR spectroscopy data would make it possible to estimate the biochemical and chemical properties of the soil for the 648 remaining samples.
Table 1. Monthly evolution of the rainfall over the last five years (2010-2015).
Source: Agence Nationale de l’Aviation Civile et de la Météorologie (ANACIM, 2015).
Figure 1. Location of the study area and soil sampling points.
2.2. Spectral Data
The reflectance of the soil samples was measured with a spectroradiometer from the ASD Company (Analytical Spectral Devices, CO) at the Institute of Research for Development (IRD, Center of Dakar). Samples of about 10 mg of soil were put into Petri dishes. Soil spectra were recorded over wavelengths ranging from 350 nm (UV) to 2500 nm (NIR). Reflectance measurements were first calibrated against the absolute reflectance (baseline) of a white Spectralon panel. For the soil samples, measurements were repeated three times; the logarithmic values of reflectance were stored in an auto-save document (.asd file) and converted to ASCII files with InDiCo Pro software. With Excel software, the matrix of input variables for the prediction model was built from the average of the 3 measurements for each soil sample.
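The averaging of the three repeated measurements can also be sketched in Python. This is only an illustration, not the authors' Excel workflow: the function name `average_replicates` is hypothetical, and the layout (the scans of one sample stored in consecutive rows) is an assumption.

```python
import numpy as np

def average_replicates(scans, n_replicates=3):
    """Average repeated scans of each soil sample.

    scans: array of shape (n_samples * n_replicates, n_wavelengths), where the
    replicates of one sample are assumed to sit in consecutive rows.
    Returns an array of shape (n_samples, n_wavelengths).
    """
    scans = np.asarray(scans, dtype=float)
    if scans.shape[0] % n_replicates != 0:
        raise ValueError("number of rows must be a multiple of n_replicates")
    n_samples = scans.shape[0] // n_replicates
    # Group consecutive rows per sample, then average over the replicate axis.
    return scans.reshape(n_samples, n_replicates, -1).mean(axis=1)
```

Each row of the result is one line of the input matrix for the prediction model.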
2.3. The Continuum Removal
Continuum removal (CR) is a pre-treatment function used in spectroscopy to improve the estimation of soil properties. It isolates a particular absorption feature for the analysis of a spectrum; the continuum itself represents the absorption due to a different process in a specific mineral, or possibly absorption from a different mineral in a multimineralic surface. We computed the continuum removal (Equation (1)) and the continuum curve (Equation (2)) in the remote sensing software Envi® 4.7. The matrix of the full spectrum (FS) was first converted to txt format. The spectral library builder option of Envi® 4.7 was then used to calculate the reflectance values of the continuum removal (CR) from the reflectance values of the full spectrum (FS). Afterwards, the two matrices (FS and CR) were transformed into spectral bands, which allowed the reflectance values of the continuum curve to be computed with the BandMath function of Envi® 4.7.
$$CR = \frac{FS}{CC} \qquad (1)$$
$$CC = \frac{FS}{CR} \qquad (2)$$
where CR: Continuum Removal; FS: Full Spectrum; CC: Continuum Curve.
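To make the relationship between CR, FS and CC concrete, here is a minimal Python sketch. It assumes the usual definition in which the continuum curve is the upper convex hull of the spectrum and CR is the ratio of FS to CC. It is not the Envi® implementation, and the function names are hypothetical.

```python
import numpy as np

def continuum_curve(wavelengths, reflectance):
    """Upper convex hull of a spectrum, interpolated at every wavelength.

    Assumes wavelengths are strictly increasing.
    """
    w = np.asarray(wavelengths, dtype=float)
    r = np.asarray(reflectance, dtype=float)
    hull = []  # indices of the hull vertices, from left to right
    for i in range(len(w)):
        # Drop the last vertex while it lies on or below the segment
        # joining its left neighbour to the new point.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (w[b] - w[a]) * (r[i] - r[a]) - (r[b] - r[a]) * (w[i] - w[a])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(w, w[hull], r[hull])

def continuum_removal(wavelengths, reflectance):
    """Return (CR, CC), with CR = FS / CC."""
    cc = continuum_curve(wavelengths, reflectance)
    cr = np.asarray(reflectance, dtype=float) / cc
    return cr, cc
```

With this construction CR equals 1 wherever the spectrum touches its hull and drops below 1 inside absorption features, so dividing FS by CR recovers CC.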
2.4. The PLSR Model
Partial least squares regression (PLSR) was used to estimate soil organic carbon. A comparison of different data mining algorithms for predicting soil properties from spectral reflectance data showed that the best regression performance was obtained with the support vector machine (RMSE of 0.92%), followed by partial least squares regression (RMSE of 0.96%) and stochastic gradient boosting (RMSE of 1.02%). One advantage of PLSR over other chemometric methods such as principal component analysis is that the first few latent variables (LV) can be interpreted, because they show the correlations between the property values and the spectral features. PLSR makes it possible to understand and describe the often complex relationship between two types of variables, X and Y: X, often composed of several variables, contains the explanatory variables, and Y is the response variable. The PLSR model was based on a linear relationship between soil properties and spectral data (Equation (3)), the latter being characterized by the complexity and richness of the information they contain. Soil samples were taken from the lower delta of the Senegal River. The PLSR was performed in R 3.1.2 software.
With the PLS model, the database was divided into two separate sets for calibration and validation. A recursive split with the principal component analysis (PCA) method allowed us to select the 186 soil samples on which the PLSR model was tested. The PCA applied to the full spectrum was also used to select two thirds of the dataset (124 soil samples) for calibration; the remaining third (62 soil samples) was used for validation (Figure 2).
$$Y = \sum_{i=1}^{P} b_i X_i + \varepsilon_0 \qquad (3)$$
where Y: the estimated value; b_i: the coefficient of the model at wavelength i; X_i: the reflectance at wavelength i; ε_0: the residual error; P: the number of reflectance values in the spectrum.
Figure 2. Validation spectra projected as supplementary individuals on the factorial plane of the PCA.
The validation data were projected as supplementary individuals on the factorial plane of the PCA (Figure 2). The variability of the individuals was taken into account, given that the first and second axes of the PCA together accounted for more than 80% of the total inertia.
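Treating the validation spectra as supplementary individuals means that the PCA axes are computed from the calibration spectra only, and the validation spectra are then placed on those axes without influencing them. The NumPy sketch below illustrates that idea under this assumption. The function names are hypothetical and the authors' R procedure is not reproduced here.

```python
import numpy as np

def fit_pca(X_cal, n_axes=2):
    """Compute PCA axes from the calibration spectra only.

    Returns the column means, the first n_axes principal axes (as rows)
    and the share of total inertia carried by each of those axes.
    """
    X_cal = np.asarray(X_cal, dtype=float)
    mean = X_cal.mean(axis=0)
    _, s, vt = np.linalg.svd(X_cal - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return mean, vt[:n_axes], explained[:n_axes]

def project(X, mean, axes):
    """Coordinates of (possibly supplementary) spectra on the PCA axes."""
    return (np.asarray(X, dtype=float) - mean) @ axes.T
```

Summing the returned inertia shares for the first two axes gives the quantity compared with the 80% threshold mentioned above.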
The PLS model transforms the explanatory variables into latent variables (called components), which are linear combinations of the explanatory variables and are uncorrelated with one another. Leave-one-out cross-validation was used to choose the optimal number of components, namely the number giving the lowest RMSE (root mean square error). The R2 (coefficient of determination) is another index that measures the performance of the PLSR model. It refers to the share of the total variability that is explained by the model. With the CR, the ultraviolet (350 - 429 nm) and near infrared (2289 to 2500 nm) wavelengths whose reflectance values were equal to 1 were removed from the spectrum in the prediction model. The model was run both on the full-spectrum data (350 nm to 2500 nm) and on the continuum-removed data (430 - 2490 nm).
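The selection of the number of components by leave-one-out cross-validation can be sketched as follows. This is a minimal NumPy illustration of a single-response PLS (PLS1, NIPALS-style) with hypothetical function names. It is not the R code used in the study, and it assumes that the RMSE is computed from the left-out predictions.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS regression; return (coefficients, intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                  # weight vector: covariance of X with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:               # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                     # scores of the new component
        tt = t @ t
        p = Xk.T @ t / tt              # X loadings
        c = yk @ t / tt                # y loading
        Xk = Xk - np.outer(t, p)       # deflate X and y
        yk = yk - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:
        return np.zeros(X.shape[1]), y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef

def loo_rmse(X, y, n_components):
    """RMSE of leave-one-out predictions for a given number of components."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i       # leave sample i out
        coef, intercept = pls1_fit(X[keep], y[keep], n_components)
        errors[i] = y[i] - (X[i] @ coef + intercept)
    return float(np.sqrt(np.mean(errors**2)))

def select_n_components(X, y, max_components):
    """Return the number of components with the lowest leave-one-out RMSE."""
    curve = [loo_rmse(X, y, k) for k in range(1, max_components + 1)]
    return int(np.argmin(curve)) + 1, curve
```

Calling `select_n_components` once on the full-spectrum matrix and once on the continuum-removed matrix would give the two RMSE curves to compare.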
2.5. The Analytical Data
The PLSR model was run on 186 soil samples selected according to their variability on the factorial plane of the PCA. The box plot of soil organic carbon computed with R 3.1.2 software shows a range from 0.07% to 0.39%; the average organic carbon content was around 0.20% (Figure 3).
Figure 3. Box plot of the range of soil organic carbon content in the data set.
Figure 4. Changes in regression coefficients with the Full Spectrum (FS).
Figure 5. Changes in regression coefficients with the Continuum Removal (CR).
Following the application of the continuum function, the wavelengths with a reflectance value equal to 1 were removed from the matrix of explanatory variables. The absorption peaks of organic carbon (OC) were then better highlighted by the regression coefficients of the continuum removal than by those of the full spectrum (Figure 4 and Figure 5). Furthermore, the comparative analysis of the CR reflectance values of three soil samples showed a higher peak on the sample that had a high level of soil carbon. The 2c-1 sample, with a carbon content of 0.38%, showed absorption peaks of 0.6 in the CR, whereas the sample 17c-2, with a lower carbon content (0.09%), showed absorption peaks of about 0.4 (Figure 6).
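Removing the wavelengths whose continuum-removed reflectance equals 1 amounts to dropping columns from the matrix of explanatory variables. The sketch below assumes that a wavelength is dropped only when its CR value equals 1 in every sample, a criterion the text does not spell out. The function name and the tolerance are hypothetical.

```python
import numpy as np

def drop_flat_bands(wavelengths, cr_matrix, tol=1e-6):
    """Remove wavelengths whose CR value is 1 (within tol) in all samples.

    cr_matrix: array of shape (n_samples, n_wavelengths).
    Returns the retained wavelengths and the reduced matrix.
    """
    cr = np.asarray(cr_matrix, dtype=float)
    flat = np.all(np.isclose(cr, 1.0, rtol=0.0, atol=tol), axis=0)
    keep = ~flat
    return np.asarray(wavelengths)[keep], cr[:, keep]
```

The retained columns then serve as the explanatory variables for the CR version of the model.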
The average error became lower with the CR in both the calibration and the validation results of carbon estimation. In calibration with the 12-component model (Figure 7(a)), the RMSE decreased from 0.04 with the full spectrum (FS) to 0.03 after continuum removal (CR). In validation (Figure 7(b)), the 15-component model provided the most accurate result, with the RMSE decreasing from 0.04 with the full spectrum to 0.03 with the CR. At the same time, the coefficient of determination (R2) increased from 0.6 (FS) to 0.7 (CR) in calibration with the 12-component model (Figure 8(a)). In validation, the R2 increased from 0.6 (FS) to 0.7 (CR) with the 15-component model (Figure 8(b)). The average organic carbon content of the observed data is 0.21%. The value predicted with the continuum removal is also 0.21%, while the value predicted with the full spectrum (FS) is 0.20% (Table 2).
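The two indices reported here can be computed from observed and predicted carbon contents as in the sketch below. It assumes that R2 is defined as one minus the ratio of residual to total sum of squares. The source does not state which formula was used, and the function names are hypothetical.

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Applying both functions to the FS and CR predictions of the validation set gives the kind of comparison summarized above.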
Figure 6. Comparison between changes in reflectance values of the full spectrum (FS), continuum removal (CR) and continuum curve (CC) of three soil samples.
Table 2. Observed and predicted organic carbon content using the PLS model with the full spectrum (FS) and the continuum removal (CR) for the validation set (one third of the data).
Figure 7. Effect of continuum removal on the RMSE of SOC prediction.
Figure 8. Effect of continuum removal on the R2 of SOC prediction.
Continuum removal (CR) improved the estimation of soil organic carbon by spectroscopy with the PLSR model. In calibration, organic carbon was predicted with a coefficient of determination rising from 60% (RMSE of 0.04) with the full spectrum (FS) to 70% (RMSE of 0.03) with the application of the CR. These results showed that organic matter content can have a linear or curvilinear relationship with reflectance in the visible and infrared ranges. The reflectance values of these spectral regions are taken into account in the estimation of soil organic carbon. Like other results, this study emphasized the value of applying pre-processing methods to spectral library data acquired by Vis-NIR spectroscopy before predicting physical and chemical soil properties. The application of CR to the estimation of biochemical and chemical soil properties is of particular interest insofar as organic carbon is recognized as a soil quality indicator in Sahelian farming systems. Thus, for the challenge of quantifying the spatio-temporal dynamics of carbon storage at the plot, landscape and national scales, the potential contribution of Vis-NIR technology is very important. This quantification requires high spatial densities of soil samples, and Vis-NIR spectroscopy offers the possibility of analysing physical and chemical soil properties at lower cost and in less time by using accurate prediction models.
On the one hand, this study has led to a better understanding of how the continuum removal method is applied in the spectroscopy of soil samples. Indeed, when the value of the continuum removal (CR) equals 1, the full spectrum (FS) and the continuum curve (CC) have the same reflectance values. On the other hand, our result (R2 of 0.7 and RMSE ≤ 0.03) obtained with the application of CR is acceptable. However, other pre-processing methods such as multiplicative scatter correction should be tested to improve the accuracy of the prediction model of soil organic carbon with Vis-NIR spectroscopy. It is also necessary to apply a neural network model to this dataset in order to better evaluate the effect of continuum removal on the estimation of physical and chemical soil properties. This approach is a means of better evaluating the performance of different data mining models for the study of soil properties from Vis-NIR spectroscopy data.