Soil salinity is widespread in the southern part of Tunisia, from the east coast to the desert in the south. It is considered an important component of ecosystem degradation in the world’s dry lands and can lead to desertification and other forms of land degradation, such as salinization. According to  , roughly 20% of irrigated agriculture worldwide is affected by salinization. A map of the world's saline soils is provided by the FAO/UNESCO, with a ranking of regions by salinized area. Australia occupies first place with 84.7 × 10⁶ ha and Africa second with 69.5 × 10⁶ ha, followed by Latin America and the Middle East in third and fourth place with 59.4 × 10⁶ ha and 53.1 × 10⁶ ha, respectively. Salt-affected soils can be found on every continent, at elevations ranging from 5000 m (Tibetan plateau) to below sea level (Dead Sea), and over 10 percent of the total surface of dry land is salt-affected. Soil salinity in southern Tunisia has several negative impacts, such as limiting plant growth, reducing crop productivity and degrading soil quality. Monitoring and mapping salt-affected areas is required to fully describe this phenomenon. Studies combining remote sensing with statistical analysis and ground truth measurements have found this combination to be the most efficient approach. Various remote sensing data are widely used to identify and map saline soils, including aerial photographs and multispectral and hyperspectral remote sensing data.
Extensive research has been devoted to monitoring and predicting saline soils using remote sensing and statistical analysis methods. These studies have shown that remote sensing data analysed with statistical methods, such as principal component analysis (PCA) and cluster analysis (CA), offer a useful and promising way to monitor and predict salt-affected soils, especially those with high salinity.  and  also reported that PCA and CA are an effective means to recognize and extract factors for assessing the impact of saline soils on groundwater quality in agricultural fields.
Remotely sensed data combined with statistical analysis can be used efficiently as a model for soil salinity assessment, and could yield substantial cost savings relative to traditional laboratory salinity measurements. The present study aims to improve the identification and prediction of saline soils in an arid context using free and open-source remote sensing data. We first introduce and evaluate the soil sampling, statistical and remote sensing methods used in this study. Principal component analysis (PCA) and cluster analysis (CA) are then applied to a multispectral Landsat image of southern Tunisia, together with regression analysis, to provide further information on the interrelation between the multispectral data and the ground truth. These statistical tests support the interpretation of complex monitoring data matrices, which helps to better understand the interrelation between the multispectral data and the ground truth measurements and to rapidly predict salt-affected soils.
2. Materials and Methods
2.1. Investigation Area
Gabes-Ghannouch is both a Mediterranean and a Saharan region. It is located in south-eastern Tunisia, extending from the Jeffara plain to the Gulf of Gabes (see Figure 1(a)). The study area was chosen not only because of the important agricultural interests in this region, but also because of soil-related environmental problems such as salinization. Its geographic location corresponds to a latitude and longitude of about 33˚42' and 10˚30', respectively (Figure 1(b)). It has a typical Mediterranean climate in which maximum temperatures are reached between June and August (48˚C), while the coldest temperatures are measured between December and February. Due to its proximity to the sea, the climate of the study area differs slightly from the typical arid or semi-arid climate. The rainfall is irregular and ranges between 150 and 240 mm per year, with a six-month dry season (April to September) during which rainfall does not exceed 4 mm per month.
Figure 1. Location of the study area: (a) Composite MODIS image of the Mediterranean Sea from 2005; (b) Landsat image of southern Tunisia, 2010.
According to  , the investigated area has an arid climate, with an annual evaporation of ~1950 mm as measured by the Piche and Bac methods adopted by the National Meteorologic Direction. Evaporation in this region is very high because of the dry climate conditions; the salt left on the topsoil after water evaporates therefore accumulates rapidly and accelerates the soil salinization process. This leads to salt accumulation in the upper layers of the Chott sediments and to crust formation.
The study area includes wetlands and steppe plains as well as areas used for agriculture.
Several authors have demonstrated the advantage of combining remote sensing data with ground truth measurements. Given the complexity of the salinization process, the identification of salty regions, and especially of slightly and moderately affected areas (the case of our field area), remains challenging. Our approach attempts to predict salt-affected areas in southern Tunisia using several remote sensing and geostatistical techniques. Salinity in the topsoil is determined by measuring electrical conductivity (EC) and then predicted by regression analysis using reflectance bands and spectral salinity indices from the remote sensing data.
2.2.1. Soil Sampling Method
Soil samples were collected from the upper ~10 cm of the soil. The sampling campaign took place in May and June 2010, which corresponds to the acquisition date of the multispectral data. The dry season was chosen deliberately, in order to enhance the detection of the spectral characteristics of salt at the surface: during the dry season, salt rises through the soil by capillarity and accumulates at the surface. The signal of salty soil is therefore stronger at this time of the year and easier to detect with optical sensors. The sample locations were selected so as to minimize any noise that could affect the spectral signature of the soil. Thus, all samples used in this study are at least 30 m away from objects that are not soil (e.g. trees, houses, streets).
The same procedure was used to collect the soil at every sample location. Each sample analysed in this work is a mix of four soil subsamples, collected from the four corners of a 30 m × 30 m square whose centre is taken as the sample location (Figure 2). The mix of these four subsamples is the soil sample used for chemical analysis (Figure 3). These steps were applied to all samples in order to optimize how well each sample represents its pixel in the Landsat TM image. The 30 m × 30 m square was chosen to match the spatial resolution of the multispectral image.
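The geometry of this composite sampling scheme can be sketched in a few lines. The snippet below is only an illustration, assuming projected coordinates in metres (easting, northing) and a square aligned with the grid axes; the function name `corner_points` and the example coordinates are hypothetical and are not taken from the study.

```python
def corner_points(center_x, center_y, side=30.0):
    """Return the four corners of a square of the given side length (metres)
    centred on the sample location, in projected coordinates."""
    half = side / 2.0
    return [
        (center_x - half, center_y - half),  # south-west corner
        (center_x + half, center_y - half),  # south-east corner
        (center_x + half, center_y + half),  # north-east corner
        (center_x - half, center_y + half),  # north-west corner
    ]

# Hypothetical sample centre (easting, northing in metres).
corners = corner_points(600000.0, 3750000.0)
```

The four subsamples taken at these corners would then be mixed into the single composite sample assigned to the centre point.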
Figure 2. Soil sampling method.
Figure 3. Mixing of the samples from the 4 corners to represent one soil sample.
Salinity in the topsoil is determined by measuring electrical conductivity (EC). The 1:5 soil:water diluted extract is a convenient method, used in this study, to estimate soil salt content. To measure the EC of our samples, the following steps were carried out: 1) drying the samples, 2) sieving (soil particle size < 2 mm), 3) agitation, and 4) measuring the EC values. EC is usually expressed in decisiemens per metre (dS/m) at 25˚C.
2.2.2. Remote Sensing Data Used
A Landsat satellite image, acquired on 17 June 2010, was used to study the soil salinity. The multispectral Thematic Mapper (TM) provides views in 7 spectral bands from the visible to the infrared with a resolution of 30 meters. The Landsat spectral bands cover a range from 450 nm (blue) to 2350 nm in the shortwave infrared (SWIR).
To predict soil salinity from satellite images, we used the band reflectances, considered as spectral indices, and the spectral salinity indices derived from the blue, green, red, near-infrared and shortwave infrared bands. We also investigated the correlation between salinity and six spectral salinity indices used mainly in arid and semi-arid climates. The indices are described in Table 1.
After computing the spectral indices from the Landsat image at the sampling sites, correlation analyses between the EC measurements and these indices were performed, based on the Pearson correlation coefficient.
Table 1. Formulas used to generate the indices.
With R: red, B: blue, G: green and NIR: near infrared.
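For readers who wish to reproduce this kind of correlation analysis, the Pearson coefficient between EC and a spectral index can be computed as sketched below. This is a minimal standard-library illustration; the function name `pearson_r` and all numerical values are hypothetical and do not come from the study's data set.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical EC measurements (dS/m) and values of one spectral index
# extracted at the same sampling sites.
ec = [0.5, 2.1, 4.4, 7.9, 12.3]
index = [0.11, 0.18, 0.25, 0.33, 0.47]
r = pearson_r(ec, index)
```

Repeating the call for each band or index yields one row of a correlation matrix of the kind reported in the results.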
2.2.3. Statistical Data Processing
Statistical methods such as cluster analysis (CA) are used as exploratory analyses that try to identify structures within the data and highlight homogeneous groups of cases when the grouping is not previously known. CA is an assortment of techniques designed to perform classification by assigning observations to groups and distinguishing them from other groups. The purpose of using this technique is to understand the relation between the spectral indices and the electrical conductivity of the sampled soil. In clustering, the distinct groups can reveal either the interaction among the variables (R-mode) or the interrelation among the samples (Q-mode). Furthermore, we used principal component analysis (PCA) to identify a smaller number of uncorrelated variables, called “principal components”, from a large set of data. With this analysis, we create new variables (the principal components) that are linear combinations of the observed variables. The goal of PCA is to explain the maximum amount of variance with the fewest principal components.
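The idea of a principal component as the linear combination of variables that carries the most variance can be illustrated with a short sketch. The code below is not the software used in the study: it assumes a PCA on the correlation matrix and, to stay within the standard library, extracts only the first component by power iteration (a swapped-in technique chosen for brevity). All function names and data values are hypothetical.

```python
import math

def standardize(columns):
    """Scale each variable to zero mean and unit sample standard deviation."""
    scaled = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        scaled.append([(v - mean) / sd for v in col])
    return scaled

def correlation_matrix(columns):
    """Correlation matrix of the variables (one inner list per variable)."""
    z = standardize(columns)
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def first_component(matrix, iterations=500):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix,
    estimated by power iteration."""
    p = len(matrix)
    # Asymmetric start vector, to reduce the risk of starting exactly
    # orthogonal to the dominant eigenvector.
    vec = [1.0 / (i + 1) for i in range(p)]
    for _ in range(iterations):
        nxt = [sum(row[j] * vec[j] for j in range(p)) for row in matrix]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient of the (unit) vector gives the eigenvalue estimate.
    eigenvalue = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)
    )
    return eigenvalue, vec

# Hypothetical data: three variables (e.g. band reflectances) at five sites.
variables = [
    [0.21, 0.25, 0.30, 0.34, 0.41],
    [0.18, 0.24, 0.27, 0.33, 0.39],
    [0.50, 0.42, 0.47, 0.40, 0.44],
]
corr = correlation_matrix(variables)
eigenvalue, loadings = first_component(corr)
explained = eigenvalue / len(variables)  # share of total variance carried by PC1
```

Because the trace of a correlation matrix equals the number of variables, dividing an eigenvalue by that number gives the fraction of variance explained by the corresponding component. A full analysis would extract all components with a dedicated linear algebra library; note also that a constant variable would make the standardization step fail.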
2.2.4. Linear Regression Model
A linear regression was used to establish the relationship between the NIR and SWIR spectra and the reference EC data, based on the statistical analysis. The highest value of R² and the lowest value of RMSE (root mean square error) were used to determine the optimal calibrated model. The smallest RMSE indicates the most accurate prediction; the RMSE was derived according to Equation (1). The model is assessed graphically by analysing the standardized residuals versus the predicted values of EC. If plotting the residuals against the explanatory variable reveals a trend, the model is not accurate and there is autocorrelation in the residuals, which is contrary to one of the assumptions of parametric linear regression.
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[Z^{*}(x_i) - Z(x_i)\right]^{2}} \qquad (1)$$
where N is the number of points, Z*(x_i) is the estimated value at point x_i and Z(x_i) is the observed value at point x_i.
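The two model selection criteria described above can be computed as in the following sketch. It assumes the usual definitions, namely the RMSE as the square root of the mean squared difference between estimated and observed values, and R² as one minus the ratio of the residual to the total sum of squares; the function names and the sample values are hypothetical.

```python
import math

def rmse(estimated, observed):
    """Root mean square error between estimated and observed values."""
    n = len(observed)
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed)) / n)

def r_squared(estimated, observed):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for e, o in zip(estimated, observed))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and estimated EC values (dS/m).
observed_ec = [1.2, 3.5, 4.8, 9.6]
estimated_ec = [1.9, 3.1, 5.6, 8.7]
error = rmse(estimated_ec, observed_ec)
fit = r_squared(estimated_ec, observed_ec)
```

Among candidate models, the one with the lowest `error` and the highest `fit` would be retained under the criteria stated in the text.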
3. Result and Discussion
Based on the data set collected during the fieldwork, the investigation area is considered highly affected by salinity according to the criteria of the Department of Primary Industries in Australia. The study area is also dominated by gypsic soil. These areas of high and extreme soil salinity are completely degraded regions where plant growth is suppressed. Even halophyte plants are very rare, as it is very hard for them to grow given the high gypsum content.
3.1. Descriptive Analysis
The distribution of the EC values is characterized by a mean of 4.37 dS/m and a standard deviation of 2.8 dS/m. The large difference between the minimum of 0.15 dS/m (EC of healthy soils) and the maximum of 16.92 dS/m (EC of saline soils) reflects a significant spatial variability of this component.
3.2. Correlation between Spectral Indices and EC from the Ground Truth
A Pearson correlation between the electrical conductivity values and the Landsat spectral bands was computed (see Table 2) to evaluate which spectral interval reveals most about the salt-affected area. The correlation between the Landsat spectral bands and the ground truth EC is weak to moderate in the visible range. However, a good correlation (Table 2) was found between the electrical conductivity and the Landsat spectral bands located in the shortwave infrared (SWIR) region of the spectrum.
The spectral bands B5 and B7 (located in the shortwave infrared) are the bands most correlated with EC. These bands are also the most strongly associated with each other, with a correlation of 0.95. The bands located in the visible interval of the spectrum show a low correlation, varying between 0.3 and 0.6.
The empirical equation A = log(1/R) transforms reflectance (R) into absorbance (Abs), and we applied it to convert the reflectance values into absorbance values. This transformation improves the correlation by 0.03. Therefore, the absorbances of the Landsat spectral bands were considered as spectral indices and used as input in the regression and statistical analysis models (PCA and CA). The absorbance of band 7 (abs7) provides the highest correlation with EC, 76% (see Table 3), not only among the salinity indices but among all the spectral indices computed in this work. The p-value for the correlations in Table 3 is less than or equal to the significance level of 0.05. Spectral indices show good correlation with the EC, varying between 0.49 and 0.75. This good correlation with the EC is observed for both the salinity indices (calculated from the original bands of the Landsat image) and the spectral bands from the satellite image (see Table 1 and Table 2). The indices most correlated with the EC are the spectral salinity indices SI5 and SI9 presented in Table 3. These findings are similar to those of  , where SI5 and SI9 show the highest potential for discriminating salt-affected soil.
Table 2. Correlation matrix between Landsat spectral bands and EC values.
P-value ≤ 0.05.
Table 3. Correlation matrix between salinity indices and EC values.
P-value ≤ 0.05.
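The reflectance-to-absorbance transformation A = log(1/R) mentioned above is simple enough to state as code. The sketch below assumes a base-10 logarithm and reflectance expressed as a fraction between 0 and 1; the text does not specify either, so both are assumptions, and the function name is hypothetical.

```python
import math

def reflectance_to_absorbance(reflectance):
    """Convert a reflectance fraction in (0, 1] to absorbance, A = log10(1/R).

    The base-10 logarithm is an assumption; the source only writes log(1/R).
    """
    if not 0.0 < reflectance <= 1.0:
        raise ValueError("reflectance must lie in the interval (0, 1]")
    return math.log10(1.0 / reflectance)

# Hypothetical band 7 reflectances at three sampling sites.
band7_reflectance = [0.12, 0.25, 0.40]
abs7 = [reflectance_to_absorbance(r) for r in band7_reflectance]
```

Because the transform is monotonically decreasing, a band whose reflectance correlates positively with EC yields an absorbance that correlates negatively with it, and vice versa.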
3.3. Cluster Analysis
In this study, R-mode hierarchical CA was performed on the normalized data set using Ward’s linkage method, with Euclidean distance as a similarity measure, and the results were synthesized into dendrogram plots. The R-mode hierarchical cluster analysis was applied to the set of 66 samples and 17 variables. It yielded the dendrogram shown in Figure 4, grouping all 17 descriptors into three statistically significant clusters. From this dendrogram, one can find the relationship between the different variables; the dendrogram affirms the high correlation between electrical conductivity (EC) and spectral indices such as abs4, abs5, abs7 and si5. The second cluster shows correlation between all spectral indices derived from the visible bands (B1, B2, B3, si1, si2, si3 and si8). The third cluster indicates correlation between the NIR bands and si9; these variables are anti-correlated with EC and for this reason constitute a separate cluster.
Figure 4. Dendrogram showing the relations between variables.
3.4. Principal Component Analysis
Principal component analysis is essentially a dimension reduction technique that is widely used to visualize and interpret a dataset with several variables. In this study, PCA was performed using 16 variables (reflectance bands and spectral salinity indices from the remote sensing data), that is, all variables except the electrical conductivity (EC).
Based on the criterion of eigenvalues > 1, the first two factors were selected to represent the spectral indices related to soil salinity without the loss of significant information.
The PCA generated two principal components, which together account for 92.24% of the variance.
The first major axis has an eigenvalue equal to 13.18 and explains 56.03% of the total variance. The second axis has an eigenvalue equal to 1.57 and explains 36.2% of the total variance.
The projection on the PC1-PC2 plane (Figure 5) shows the existence of 4 groups. The first principal component (PC1), with a variance of 56.03%, groups in its positive part elements such as B1, B2, SI8, SI2, B3, SI3 and B4, whereas its negative part is characterized by abs b4. Thus, PC1 may be related to the processes associated with bands of the visible domain.
PC2 (36.2% of the total variance) groups in its negative part elements such as si5, abs b7 and abs b5, whereas its positive part groups B6, SI9 and B7. Therefore, PC2 is associated with all the bands and indices from the NIR and SWIR.
Figure 5. PCA graphical representation along PC1 and PC2 axes.
The electrical conductivity (EC) is highly correlated (r = −0.72) with the second principal component (PC2), whereas no clear correlation is observed between EC and the first principal component (PC1) (r = −0.28). This suggests that the second component can be used as an explanatory variable for predicting EC.
3.5. Regression Analysis Modelling
Linear regression is used to predict the spatial variability of soil salinity based on remote sensing and ground truth measurements. The prediction of the EC values from the Landsat bands and the spectral indices relies on the 2 variables shown in Equation (2). A significant coefficient of determination (R²) indicates that the predictor variables used in the model shown in Figure 6 explain 63% of the total variation of the predicted EC values. The empirical regression relationship is given by the following formula:
The best empirical linear regression relation (2) is based on PC2, which represents all the variables correlated with EC derived from the PCA, and on the absorbance of band 4 of the Landsat image. These four indices show the highest correlation with the ground truth EC.
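A regression of this form, with EC as the response and two predictors, can be fitted by ordinary least squares as sketched below. The code is an illustration only: it solves the normal equations with a small Gaussian elimination routine, the function names are hypothetical, and the input values are invented, so the resulting coefficients are not those of Equation (2).

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_two_predictors(x1, x2, y):
    """Least-squares coefficients [b0, b1, b2] of y = b0 + b1*x1 + b2*x2."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)

def predict(coeffs, x1, x2):
    """Evaluate the fitted model at the given predictor values."""
    b0, b1, b2 = coeffs
    return [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]

# Hypothetical PC2 scores, band 4 absorbances and measured EC (dS/m).
pc2 = [-1.2, -0.4, 0.3, 0.9, 1.5, -0.8]
abs_b4 = [0.52, 0.61, 0.55, 0.70, 0.66, 0.58]
ec = [9.8, 5.1, 3.9, 1.2, 0.6, 7.4]
coeffs = fit_two_predictors(pc2, abs_b4, ec)
predicted_ec = predict(coeffs, pc2, abs_b4)
```

Plotting the residuals (measured minus predicted EC) against the predicted values is then the graphical check for trends described in the methods.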
The standard error RMSE (root mean square error) of the estimation is about 1.86 dS/m. This error decreases with increasing soil salinity, which means the higher the electrical conductivity is, the closer the predicted conductivity will lie to the ground truth measurement.
The empirical relationship between measured and estimated EC values showed an overestimation of the predicted electrical conductivity values. Figure 6 shows that predicted values of electrical conductivity are often higher than the values from the ground truth measurements with a coefficient of determination about 0.62 at a significance level of 0.05.
The plot of the standardized residuals versus the predicted values of EC (Figure 7) shows no specific trend; therefore, the proposed regression model is considered valid.
Figure 6. Relationship between measured and estimated electrical conductivity values.
Figure 7. Relationship between estimated electrical conductivity values and standardized residuals.
4. Conclusion
The present study explores the use of remote sensing indicators from Landsat data for the assessment and monitoring of salt-affected soil over an arid region. The proposed remote sensing methodology provides a reliable variety of indicators to address land degradation through salinization.
The cluster analysis revealed a strong correlation of PC2 with EC and with the spectral indices derived from the SWIR. The combination of principal component analysis and cluster analysis applied in our study performed well in identifying the spectral indices most closely associated with soil salinity.
Furthermore, the correlations found are relatively high; they reveal the potential of various spectral indicators from the multispectral data to predict salt affected areas in south-eastern Tunisia. Therefore, we propose that the combination of spectral indices, statistical tests and ground truth measurements is a trustworthy approach to identify saline soils.
The multiple linear regression analysis based on the spectral indices (PC2 and abs B4) was used to predict soil salinity. It yields a substantial coefficient of determination (R² = 0.62) with a low RMSE of 1.86 dS/m. Therefore, the generated regression model is considered an efficient and rapid tool to predict soil salinity over arid regions such as southern Tunisia.
We gratefully acknowledge the financial support from the Deutscher Akademischer Austauschdienst (DAAD).
References
 Ben-Dor, E., Patkin, K., Banin, A. and Karnieli, A. (2002) Mapping of Several Soil Properties Using DAIS-7915 Hyperspectral Scanner Data: A Case Study over Soils in Israel. International Journal of Remote Sensing, 23, 1043-1062.
 Bouaziz, M., Leideg, M. and Gloaguen, R. (2011) Optimal Parameter Selection for Qualitative Regional Erosion Risk Monitoring: A Remote Sensing Study of SE Ethiopia. Geoscience Frontiers, 2, 237-245.
 Liu, X.H., Skidmore, A.K. and Oosten, H.V. (2002) Integration of Classification Methods for Improvement of Land Cover Map Accuracy. ISPRS Journal of Photogrammetry & Remote Sensing, 56, 257.
 Ghosh, G., Kumar, S. and Saha, S.K. (2012) Hyperspectral Satellite Data in Mapping Salt-Affected Soils Using Linear Spectral Unmixing Analysis. Journal of Indian Society of Remote Sensing, 40, 129.
Vincent, B. (2003) Remote Sensing for Spatial Analysis of Irrigated Areas. In: Pereira, L.S., Cai, L.G., Musy, A., Minhas, P.S., Editors, Water Savings in the Yellow River Basin: Issues and Decision Support Tools in Irrigation, China Agriculture Press, Beijing, 29-45.
 Liu, C.-W., Lin, K.-H. and Kuo, Y.-M. (2003) Application of Factor Analysis in the Assessment of Groundwater Quality in a Blackfoot Disease Area in Taiwan. The Science of the Total Environment, 313, 77-89.
Domuța, C., Scheau, V., Scheau, V., Sandor, M., Bandici, Gh., Sabau, N.C., Samuel, A., Borza, I. and Domuța, Cr. (2007) Comparison between the Peach-Tree Water Consumption and the Reference Evapotranspiration in the Conditions from Western Romania. Analele Universității din Oradea, Fascicula: Protecția Mediului, 12, 46-50.
 Hongqing, W., Hsieh, Y.P., Harwell, M.A. and Huang, W. (2006) Modeling Soil Salinity Distribution along Topographic Gradients in Tidal Salt Marshes in Atlantic and Gulf Coastal Regions. Ecological Modelling, 201, 429-439.
Shrestha, D., Margate, D.E., van der Meer, F. and Anh, H.V. (2005) Analysis and Classification of Hyperspectral Data for Mapping Land Degradation: An Application in Southern Spain. International Journal of Applied Earth Observation and Geoinformation, 7, 85-96.
 Dutkiewicz, A., Lewis, M. and Ostendorf, B. (2009) Evaluation and Comparison of Hyperspectral Imagery for Mapping Surface Symptoms of Dry Land Salinity. International Journal of Remote Sensing, 30, 693-719.
 Lu, D. and Weng, Q. (2005) Urban Classification Using Full Spectral Information of LANDSAT ETM+ Imagery in Marion County, Indiana. Photogrammetric Engineering and Remote Sensing, 71, 1275-1284.
 Manadhar, R., Odeh, I.O. and Ancev, T. (2009) Improving the Accuracy of Land Use and Land Cover Classification of Landsat Data Using Post Classification Enhancement. Remote Sensing, 1, 330-344.
 Mantero, P., Moser, G. and Serpico, S.B. (2005) Partially Supervised Classification of Remote Sensing Images through SVM-Based Probability Density Estimation. IEEE Transactions on Geoscience and Remote Sensing, 43, 559-570.
Hussain, M., Ahmad, S.M. and Abderrahman, W. (2008) Cluster Analysis and Quality Assessment of Logged Water at an Irrigation Project, Eastern Saudi Arabia. Journal of Environmental Management, 86, 297-307.
 Wu, T.N., Huang, Y.C., Lee, M.S. and Kao, C.M. (2005) Source Identification of Ground Water Pollution with the Aid of Multivariate Statistical Analysis. Water Science and Technology: Water Supply, 5, 281-288.
 Dehaan, R.L. and Taylor, G.R. (2002) Field-Derived Spectra of Salinized Soils and Vegetation as Indicators of Irrigation-Induced Soil Salinization. Remote Sensing of Environment, 80, 406-417.
 Triki, I., Trabelsi, N., Zairi, M. and Ben Dhia, H. (2013) Multivariate Statistical and Geostatistical Techniques for Assessing Groundwater Salinization in Sfax, a Coastal Region of Eastern Tunisia. Desalination and Water Treatment, 52, 1980-1989.
Gueddari, M., Monnin, C., Perret, D., Fritz, B. and Tardy, Y. (1983) Geochemistry of Brines of the Chott el Jerid in Southern Tunisia: Application of Pitzer's Equations. Chemical Geology, 39, 165-178.