The project has been financed in 2014 by Eurostat under the theme 8.4―“Pro- vide quality agriculture, fisheries and forestry statistics” with the title “Pilot studies to develop methodological improvements to agri-environmental statistics and statistics on grassland production”. The main goal has been the assessment of grassland biomass by means of free satellite images. The proposed method has built upon the correlation existing between the image values, in different spectral bands, and the grassland height, strictly depending on the biomass of the relative vegetation classes  -  .
The regression model needs to be calibrated by ground truth samples that should be collected almost simultaneously with the satellite acquisition date. So, the need of fast and accurate measurements campaigns is crucial for getting a high correlation between real grassland height and image values.
2. Study Area
During the planning phase, three areas have been identified for the collection of training and test samples, one in the North, the second in Central and the third in Southern Italy. A specific mission in the National park of Murge, in Puglia, South of Italy, has shown however the unsuitableness of the local grassland for the model, due to the presence of stones and shrubs over the ground and the low number of pastures in the small, scattered and flatten areas. The collected samples of the grass height have also demonstrated the diversity of vegetation classes and the need to build a complex regression model considering the height of each class, or each group of classes, separately.
The foreseen campaign of measures in Northern Italy has not taken place until now due to limited time and resources. Therefore, the model has only been trained with the two-year campaigns carried out in Central Italy, in a site named Pian Grande (Figure 1).
Pian Grande is an upland karstic plain in the national park of Monti Sibillini located between Umbria and Marche, in the Appennini mountains, in central It-
Figure 1. Location of the study area (Reference Cartographic System: UTM ED50 Zone 33).
aly. It is a quite suitable site for the grassland production because of the lack of trees and shrubs over the flat plan, located at 1300 meters above sea level. The plain is the bottom of an ancient mountain lake, now dried up, and has a rectangular shape with an area of around 15 Km2.
Regarding vegetation variations, it is possible to consider the existence of 4 types of homogeneous areas (Figure 2):
・ areas with agricultural crops (grass meadows, barley and lentils) located in the nearby of Castelluccio village and below the slopes of Vettore Mt., corresponding to the class 2.1.1 (Non-irrigated arable land) of CORINE Land Cover 2012 map (http://www.eea.europa.eu/publications/COR0-landcover);
・ areas with natural grassland, equivalent to the class 3.2.1;
・ areas with pastures, that occupies the central portion of the plain, corresponding to the class 2.3.1 (Pastures);
・ bare rocks (3.3.2) or sparsely vegetated areas (3.3.3), located in the nearby of Vettore Mt.
The pasture reaches its highest growth at the end of June, depending on the temperatures, after a suggestive blossom, and produces high quality bales of hay. The best period for the field surveys is between the end of June and the beginning of July, just before the cut of the grassland. Two missions have been carried out in the past two years (2015-16) during the same period of the year.
Figure 2. CORINE Land Cover 2012 map.
3. Sampling Methodology
The need to collect many samples for the grassland height in a limited time and with the highest possible accuracy, taking also into consideration the pixel dimension of the satellite images used for the regression model (see paragraph 5), has suggested to the use of the camera and to postprocess the images in order to derive the height with some image processing tools.
The assessment of the height has been executed with a white cardboard of 300 × 70 cm placed on the grass and used as a background for the photograph in order to derive just a linear section of the grass height (Figure 3). Every picture has been linked to the GPS position of the same sample point.
The high contrast of the green vegetation over the white background has helped in the post-processing of the photographs that has consisted in the extraction, rescaling to a standard width and measure of just the white area (c section in the Figure 4) in terms of number of pixels. The division of that area by the fixed width produces its mean height and, by difference with the total height of the cardboard, the mean height of the grass.
Figure 3. Survey of a height of a section of grass.
Figure 4. Postprocessing of the pictures.
The pictures have been analyzed with an open source image processing software: ImageJ   . The software is written in Java and is available for Windows, Linux and Mac OS and OS X (https://imagej.nih.gov/ij/index.html). Some of the software tools used in the software for the process are the Colour Threshold, for the selection of the white pixels, and the Colour Pixel Counter, for the determination of their number.
In the 2015 survey a simpler brown cardboard has been used, but sometimes the colour separation and the extraction of grass has been difficult, above all in case of dry hay. Another lesson learnt during the first year of the project is to avoid measures in the second half of July, when the grass is usually cut and compacted in bales.
This method has allowed a very fast survey and a high number of samples collected during each day of measurements. 88 samples have been acquired in a two-day campaign on June 2016 (Figure 5), while just 20 were the points surveyed in 2015 because of the partly mown pasture.
4. Satellite Images
After Currently two mid-resolution satellites acquire images from the earth surface with optical sensors and deliver them free of charge to all users: the Landsat 8 (Nasa) and the Sentinel-2 (Esa).
Figure 5. Location of the samples acquired on June 2016 (basemap: false color image from Sentinel-2).
The OLI sensor of Landsat 8 has replaced the ETM+ one mounted on Landsat 7 but maintains the same bands and the same ground resolution (30 meters) of the previous satellites, allowing in this way the time series analysis  .
Sentinel-2A is one of the satellites of the Copernicus Esa observation programme and delivers images at a spatial resolution of 10 meters in visible―near infrared spectra. By March 2017 the new Sentinel-2B will join the Esa constellation, increasing in this way the revisiting frequency on the same area  .
Radar images have also been considered for the correlation model: the Sentinel-1 satellites acquire polarized images in C-band with a spatial resolution of 20 meters (Interferometric Wide mode)  . From the first results of the model it seems S-1 images are not so useful for the determination of grass height (see S1_VH and S1_VV in Table 1). Nevertheless, it has been planned to test the model with the high-resolution radar images of the Cosmo-Skymed Italian mission.
The values for every image and band have been extracted with the QGIS Zonal Stats tool (http://www.qgis.org/it/site) with a 20-meters buffer centered on each samples and by extracting basic statistics, like mean and standard deviation, from the overlapping pixel values.
5. Statistical Analysis
Descriptive statistics are used to describe the basic features of the data by means of some numeric indexes: mean, standard deviation (SD), standard error of the mean (SEM), coefficient of variation (CV), skewness and kurtosis. The univariate correlation has been considered too, with the evaluation of the Pearson correlation coefficients and the relative correlation matrix.
Furthermore, a multiple linear regression model has been implemented, based on ordinary least squares method, that is one of the most frequently used statistical approaches to model the correspondence between spectral and field data  .
The regression model consists of a dependent variable, the vegetation height (Hv) measured in the in-situ survey, and the independent variables, represented by the spectral radiance levels of the bands from Landsat 8 and Sentinel 1-2 satellites.
Twenty of the acquired 88 samples have not been considered in the model because of the non-homogeneity of the pixels values in the 20-meters buffer area around each sample. A resulting number of 68 points has been used to train the model.
The adjusted coefficient of determination (Adjusted R2) and root mean square error (RMSE) were used to quantify the amount of variation explained by the developed relationships and the accuracy of the relationships  . Higher Adjusted R-squared values correspond to lower RMSE values and a higher precision and accuracy of a model for predicting the response variable.
A backward elimination approaches based on Akaike’s Information Criteria (AIC) was used to identify the model that provides the best description of the data using the smallest number of parameters  .
Descriptive statistics and correlation for Sentinel image are shown in Table 1, Table 2 and Figure 6. A positive correlation exists between the S2_NIR and S1_VV while remains negative for all the others bands.
Table 3 shows the T-test between the most significant predictors for the model (Red, Green, NIR) and how they differ from each other.
The regression model with the least AIC is:
The overall significance of the regression model has been estimated with the F-test (F-statistic: 39.9 on 3 and 64 degree of freedom, p-value: 1.165e−14) and the model shows an Adjusted R-square of 0.63.
The linearity of the function between the dependent and the independent variables has been verified with the graphic of residuals (y-axis) towards the foreseen values (x-axis), shown in Figure 7. The hypothesis of linearity seems reasonable because of the symmetric distribution of the points around the horizon-
Table 1. Descriptive statistics of Sentinel bands and vegetation height variable.
Table 2. Correlation matrix of Sentinel bands and vegetation height variable.
Figure 6. Scatter plot matrix of the variables considered in the model (Sentinel).
Table 3. Correlation matrix of Sentinel bands and vegetation height variable.
aSignif. codes: ***: 0.001, **: 0.01, *: 0.05, .: 0.1.
Figure 7. Residuals analysis of the regression model (Sentinel).
Table 4. Descriptive statistics of Landsat 8 bands and vegetation height variable.
tal line with zero intercept.
The normality distribution of residuals has been verified with the QQ-plot of standardized residuals, where it’s possible to see the points laying near the QQ line.
The multicollinearity between regressors has also been assessed by means of the Variance Inflation Factor (VIF), visible in Table 3. Its value seems exclude the redundancy between the explanatory variables.
5.2. Landsat 8
Descriptive statistics and the correlation matrix for Landsat 8 image have been evaluated as well (see Table 4, Table 5 and Figure 8). The correlation is similar to Sentinel: positive between the grass height and the NIR band while remains negative for all the other bands. Table 6 shows the coefficients obtained with the T-test.
The resulting model is:
The F-test reports similar results of the previous case (F-statistic: 29.63 on 3 and 64 degree of freedom, p-value: 3.94e−12) and the model shows an Adjusted R-square of 0.56.
Table 5. Correlation matrix of Landsat 8 bands and vegetation height variable.
Figure 8. Scatter plot matrix of the variables considered in the model (Landsat 8).
Table 6. Correlation matrix of Landsat 8 bands and vegetation height variable.
aSignif. codes: ***: 0.001, **: 0.01, *: 0.05, .: 0.1.
Table 7. Comparison between models.
Figure 9. Residuals analysis of the regression model (Landsat 8).
Figure 9 shows the residuals against fitted plot and the normal Q-Q plot of the standardized residuals for verifying the assumptions of a linear model in the same way as done in the previous paragraph.
The performances of the models are assessed using repeated k-fold cross-valida- tion in order to obtain results that are independent of a particular partitioning. In this approach, k randomly subsamples (k = 10), or folds, are derived. The model was fitted k times on the combined data of k-1 folds and tested on the data of the remaining fold by applying the fitted model to the test fold and calculating the RMSE  .
To perform the validation the cv.lm function in the DAAG package was used (https://cran.r-project.org/web/packages/DAAG/index.html).
7. Comparison between Models
The adjusted R2 coefficient obtained in the models is quite similar: 0.63 for the Sentinel-2 and 0.56 for the Landsat 8 (Table 7). Their difference seems depending above all on their spatial resolution and on the wider area covered by the
Landsat pixel. The sat acquisition dates (S2: 15/06/2016, L8: 26/06/2016) are anyway close to the 2016 survey and the grass height is almost the same.
Estimated height depends in any case on three image bands: green, red and near infrared. The proportionality is inverse with the green band while remains direct with the red and NIR.
The derived root mean square error is linked to the points distribution and seems anyway quite high (2.89 cm for Sentinel-2 model and 3.27 cm for Landsat model) considering the medium height of the grass for all the 68 points (12.53 cm).
The project results show a poor correlation (Adjusted R2 ~ 0.6) between the VNIR bands and the grass height. Radar Sentinel-1 images, even polarized, seems not influenced by different vegetation biomasses of upland pastures, perhaps due to the limited grassland height. From the statistical results seems the higher grass has a bigger component of red while green and NIR decrease.
However, the sampling method permitted fast operational activities and has given good results. The model seems to work better for mid-height grassland (10 - 20 cm), perhaps because in case of very low grass the images receive signals from other visible elements, like small stones, dry grass, and the terrain itself, while the higher grass doesn’t change the same radiometric values acquired for mid-height samples.
The relation between the grassland height and the relative biomass depends on the vegetation type. In the study area, excluding the wetland present in the southern part of the plain, the vegetation is mainly composed by Caricetum and Festucetum.
The grass development in height and width is quite related with the biomass increase    . In particular there is a linear relationship between biomass and grass layer height  . Specific analysis have assessed a relation between biomass of vegetation and the height of pasture: 10 - 12 cm of grass height correspond to 1200 to 1500 kg∙ha−1, considering dry grass.
The adjusted R2 derived for Sentinel-2 and Landsat images is not so good as expected, but is quite near to the corresponding values obtained in similar studies. The Sentinel-2 image has also been corrected for atmospheric effects by means of the sen2cor tool implemented in the ESA SNAP software, but returned similar results.
Possible improvements could be seen in the use of higher resolution optical and radar images, for the check of the homogeneity of vegetation cover and the extraction of texture indexes. The analysis of the polarization of the Sentinel-1 image has not given the fruitful results, perhaps due to the wavelength of the C-SAR instrument (~5.5 cm), quite close to the height of grass.
Even if the first attempts to use the radar images have been unfruitful, the plan is the future use of the complex data of the radar hi-res images acquired by the Italian satellite mission Cosmo-Skymed. Hopefully, in this way it will be possible to measure the soil roughness caused by the grassland (height) and the number of bales of hay obtained after the mown (biomass).
Special thanks are due to Stefano Mugnoli and Alberto Sabbi (National Institute of Statistics) for the review of the vegetational and statistical analysis, to Massimo Bianco (Institute for Environmental Protection and Research―Ispra) for the botanic information about Pian Grande vegetation classes, to Flavio Borfecchia (National Agency for New Technologies, Energy and Sustainable Economic Development―Enea) for the information on previous studies on the same site and to Luca Pietranera (e-Geos) for the support on Cosmo Skymed data.
 Grant, K., Wagner, M., Siegmund, R. and Hartmann, S. (2015) The Use of Radar Images for Detecting When Grass Is Harvested and Thereby Improve Grassland Yield Estimates. Grassland Science in Europe, Grassland and Forages in High Output Dairy Farming Systems, European Grassland Federation, Wageningen, 15-17 June 2015, 419-421.
 Grant, K., Siegmund, R., Wagner, M. and Hartmann, S. (2015) Satellite-Based Assessment of Grassland Yields. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Proceedings of 36th International Symposium on Remote Sensing of Environment, Berlin, 11-15 May 2015, 15-18.
 Guo, X., Wilmshurst, J., Mc Canny, S., Fargey, P. and Richard, P. (2004) Measuring Spatial and Vertical Heterogeneity of Grasslands Using Remote Sensing Techniques. Journal of Environmental Informatics, 3, 24-32.
 Schino, G., Borfecchia, F., De Cecco, L., Dibari, C., Iannetta, M., Martini, S. and Pedrotti, F. (2003) Satellite Estimate of Grass Biomass in a Mountainous Range in Central Italy. Agroforestry Systems, 59, 157-162.
 Anniballe, R., et al. (2015) Synergistic Use of Radar and Optical Data for Agricultural Data Products Assimilation: A Case Study in Central Italy. Proceedings of the International Geoscience and Remote Sensing Symposium, Milan, 3381-3384.
 Schindelin, J., Rueden, C.T., Hiner, M.C. and Eliceiri, K.W. (2015) The ImageJ Ecosystem: An Open Platform for Biomedical Image Analysis. Molecular Reproduction and Development.
 Kohavi, R. (1995) A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. Proceedings of the 14th International Joint Conference on Artificial Intelligence, Vol. 2, Montreal, 20-25 August 1995, 1137-1145.
 Gür, M., Altin, M. and GÖkkus, A. (2015) Determination of Grazing Time with Relationships between Grass Layer Height and Biomass Change in Natural Pastures. African Journal of Agricultural Research, 10, 3310-3318.
 Yunxiang, J., et al. (2014) Remote Sensing-Based Biomass Estimation and Its Spatio-Temporal Variations in Temperate Grassland, Northern China. Remote Sensing, 6, 1496-1513.
 Anderson, D.M. and Kothmann, M.M. (1982) A Two-Step Sampling Technique for Estimating Standing Crop of Herbaceous Vegetation. Journal of Range Management, 35, 675-677.