Normal and Bootstrap Confidence Intervals in Bitterlich Sampling

1. Introduction

Sampling in forest inventories is usually done by installing random points on the ground and selecting a group of trees around each point. Trees are generally selected using one of the two best-known forest sampling methods: fixed-area plot sampling, or Bitterlich sampling (BS), also known as horizontal point sampling.

In fixed-area plot sampling, a plot of fixed shape and size is laid out around each point (its center); the plot is the basic sampling unit, and all trees within it are measured (Kershaw Jr., Ducey, Beers, & Husch, 2016; Matis, 2004). In BS, tree j is selected if the random point i lies within a distance $c\,r_j$ of the tree, where $r_j$ is the radius of the tree's cross-section at 1.30 m above the ground (the cross-section whose area is the basal area) and c is a constant chosen to achieve the desired sampling density (Gregoire & Valentine, 2007; Roesch, Green, & Scott, 1993). With this method, the probability of selecting a tree is proportional to its basal area. The Horvitz-Thompson estimator can be used to estimate parameters such as the total volume of the forest area (Horvitz & Thompson, 1952; Schreuder, Gregoire, & Wood, 1993).
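The selection rule can be illustrated with a short Python sketch. The unit conventions (F in m²·ha⁻¹, radii and distances in metres) and the resulting constant c = 100/√F are our assumptions, derived from the fact that tree j can be selected from any point inside a circle of area g_j/F ha around it; all function names are hypothetical. The last function gives the equivalent gauge angle α, from sin(α/2) = 1/c.

```python
import math


def limiting_distance_factor(baf: float) -> float:
    """Constant c of the selection rule for a basal area factor F (assumed m^2/ha).

    Tree j can be selected from any point inside a circle of area g_j / F ha
    = 10_000 * pi * r_j**2 / F m^2, whose radius is 100 * r_j / sqrt(F).
    """
    return 100.0 / math.sqrt(baf)


def is_selected(point, tree_xy, dbh_m: float, baf: float) -> bool:
    """Bitterlich rule: tree j is 'in' if its distance from the point is <= c * r_j."""
    r_j = dbh_m / 2.0  # breast-height radius (m)
    return math.dist(point, tree_xy) <= limiting_distance_factor(baf) * r_j


def gauge_angle_deg(baf: float) -> float:
    """Equivalent gauge angle alpha, from sin(alpha / 2) = r_j / (c * r_j) = 1 / c."""
    return math.degrees(2.0 * math.asin(1.0 / limiting_distance_factor(baf)))
```

For F = 4 m²·ha⁻¹ this gives c = 50, so a tree 0.30 m in diameter is selected from any point within 7.5 m of it, and a gauge angle of about 2.29°, i.e. roughly the 2°18′ reported for the Pertouli data.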

The distribution of total estimates from sampling with probability proportional to size is unknown (Hájek, 1981); therefore, confidence intervals based on the normal distribution may not be accurate. In forestry, many designs that sample with probability proportional to size (or to prediction) use small samples, which raises the question of how accurate and consistent the resulting confidence intervals are (Magnussen, 2001). Small samples are also common in small-scale forest management, where, for economic reasons, only a few small fixed-area plots or Bitterlich sampling points are installed. A simple application of the bootstrap gave reliable variance estimates for all the regression estimators examined, as well as for the Horvitz-Thompson estimator under BS (Schreuder, Ouyang, & Williams, 1992). With small sample sizes, however, bootstrap confidence intervals did not behave well (Schreuder & Williams, 2000). For the k-nearest neighbors technique, parametric, bootstrap and jackknife variance estimators produced comparable results (McRoberts, Magnussen, Tomppo, & Chirici, 2011). Recent research (Lyons, Keith, Phinn, Mason, & Elith, 2018) showed that resampling procedures provided accurate error estimates for remote sensing classification and accuracy assessment. However, there appear to be no published evaluations of bootstrap confidence intervals under BS.

The purpose of this research is to evaluate confidence intervals constructed with the Horvitz-Thompson estimator under BS, including bootstrap intervals, for small sample sizes. The results are expected to be of practical value because the data come from a productive forest ecosystem.

In the next section, BS is described in some detail, since the method is little known outside forestry. Methods for constructing and evaluating confidence intervals are also given, and the data acquisition is described. Section 3 presents and discusses the results, and conclusions are drawn in Section 4.

2. Methods and Data

BS can be described in various ways (Eriksson, 1995). It can be applied as follows (De Vries, 1986; Overton & Stehman, 1995): a sample of n points is placed, randomly or systematically, over the forest area whose characteristic Y is to be estimated. From each sample point, the observer sights every tree at 1.30 m above the ground (breast height) through an instrument that projects an angle α (e.g. a relascope), making a complete (360°) turn around the point. Trees whose diameter at breast height appears wider than the projected angle α are included in the sample. For borderline trees, whose diameter exactly matches the projected angle, specific rules decide whether they belong to the sample (De Vries, 1986; Kershaw Jr. et al., 2016). If $y_j$ is the stem volume of the j-th tree, the total volume of all M trees in the forest area is given by

$Y = \sum_{j=1}^{M} y_j$ . (1)

The Horvitz-Thompson estimator of Y (De Vries, 1986; Schreuder et al., 1993) at the i-th sampling point is

$\hat{Y}_i = FA\sum_{j=1}^{m_i} y_{ij}/g_{ij}$ , (2)

where F is the basal area factor (BAF), which is the tree selection criterion, A is the area of the forest, $g_{ij} = (\pi/4)\,d_{ij}^2$ is the basal area (the cross-sectional area at breast height) of the j-th tree, with $d_{ij}$ its diameter at breast height, and $m_i$ is the number of trees selected at sampling point i. The selection probability, $\pi_{ij} = g_{ij}/(FA)$, is proportional to the basal area of the tree, so larger trees, which generally also have larger volumes, have a greater probability of being selected.
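Equation (2) can be sketched in Python as follows. Diameters in metres, F in m²·ha⁻¹ and A in hectares are assumed, so that each expansion factor $1/\pi_{ij} = FA/g_{ij}$ is dimensionless and $\hat{Y}_i$ is in the units of y (e.g. m³); the function names are hypothetical.

```python
import math


def basal_area(dbh_m: float) -> float:
    """g = (pi / 4) * d**2, cross-sectional area at breast height (m^2)."""
    return math.pi / 4.0 * dbh_m ** 2


def ht_point_estimate(trees, baf: float, area_ha: float) -> float:
    """Eq. (2): Y_hat_i = F * A * sum(y_ij / g_ij) over the m_i trees at point i.

    `trees` is an iterable of (dbh_m, y) pairs; each y is weighted by
    1 / pi_ij = F * A / g_ij (assumed units: m, m^2/ha, ha).
    """
    return sum(baf * area_ha * y / basal_area(d) for d, y in trees)
```

For example, a single selected tree with d = 0.40 m and y = 1.5 m³, at a point in a 10 ha forest with F = 4, contributes 40 × 1.5 / 0.1257 ≈ 477 m³ to $\hat{Y}_i$; a point with no selected trees gives $\hat{Y}_i = 0$.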

Although BS has many attractive features, the trees selected at a single sampling point form a group of neighboring trees, so their y values are correlated (Overton & Stehman, 1995). Better estimates for the forest area are therefore obtained from n independent points, and the estimate of Y is then

$\hat{Y} = \frac{1}{n}\sum_{i=1}^{n}\hat{Y}_i$ , (3)

where $\hat{Y}_i$, $i = 1, 2, \ldots, n$, is the estimate of Y at point (Bitterlich unit) i. The variance of $\hat{Y}$ is

$V(\hat{Y}) = \frac{1}{n}\left[\sum_{j=1}^{M}\frac{y_j^2}{\pi_j} + \sum_{j\neq j'}\frac{y_j\,y_{j'}\,\pi_{jj'}}{\pi_j\,\pi_{j'}} - Y^2\right]$ , (4)

where $\pi_j$ is the inclusion probability of tree j and $\pi_{jj'}$ is the probability that both trees j and j′ are included in the sample. An unbiased variance estimator (Palley & Horwitz, 1961; Schreuder et al., 1993) is

$\hat{V}(\hat{Y}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\hat{Y}_i - \hat{Y}\right)^2$ . (5)
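Equations (3) and (5) treat the n per-point estimates as a simple random sample of cluster values. A minimal Python sketch (the function name is hypothetical):

```python
import math
from statistics import fmean


def bs_estimate(point_estimates):
    """Return (Y_hat, V_hat, se_hat) from the n per-point estimates Y_hat_i.

    Y_hat is their mean, Eq. (3); V_hat = sum((Y_i - Y_hat)**2) / (n * (n - 1)),
    Eq. (5); se_hat = sqrt(V_hat).
    """
    n = len(point_estimates)
    if n < 2:
        raise ValueError("at least two sampling points are needed")
    y_hat = fmean(point_estimates)
    v_hat = sum((y - y_hat) ** 2 for y in point_estimates) / (n * (n - 1))
    return y_hat, v_hat, math.sqrt(v_hat)
```

Equivalently, $\hat{V}(\hat{Y})$ is the ordinary sample variance of the $\hat{Y}_i$ divided by n; for the values 1, 2, 3 and 4 the sketch returns $\hat{Y} = 2.5$ and $\hat{V} = 5/12$.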

Both the variance and the estimate $\hat{Y}$ can be derived easily (Schreuder et al., 1993), either by treating BS as a special case of sampling with probability proportional to size in which the number of selected trees is a random variable (Palley & Horwitz, 1961), or by treating it as simple random sampling of n out of N clusters in the population (Schreuder, 1970).

One normal and two bootstrap confidence intervals were constructed (Efron, 1982; Efron & Tibshirani, 1993). The bootstrap intervals were computed with the percentile method (Cα) and the bias-corrected and accelerated method (BCα).

Assuming that $\hat{Y}$ is normally distributed, a confidence interval for Y with coverage probability 1 − α, where α is the significance level, is given by

$(\hat{Y}_{lo}, \hat{Y}_{up}) = \left(\hat{Y} - z_{1-\alpha/2}\,\widehat{se}(\hat{Y}),\ \hat{Y} + z_{1-\alpha/2}\,\widehat{se}(\hat{Y})\right)$ , (6)

where $z_{1-\alpha/2}$ is the 100(1 − α/2)-th percentile of the standard normal distribution (about 1.96 for α = 0.05) and $\widehat{se}(\cdot)$ is the estimated standard error, i.e. the square root of the variance estimate in Equation (5).
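A minimal sketch of Equation (6) in Python, using the standard library's `statistics.NormalDist` for the normal percentile; the function name is hypothetical, and $\hat{Y}$ and its standard error would come from Equations (3) and (5).

```python
from statistics import NormalDist


def normal_ci(y_hat: float, se_hat: float, alpha: float = 0.05):
    """Normal-theory interval of Eq. (6): Y_hat -/+ z_{1 - alpha/2} * se_hat."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    return y_hat - z * se_hat, y_hat + z * se_hat
```

For example, `normal_ci(100, 10)` returns approximately (80.4, 119.6).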

The (1 − α)100% confidence interval with the percentile method, Cα, is given by

$(\hat{Y}_{lo}, \hat{Y}_{up}) = \left(\hat{Y}^{(\alpha/2)}, \hat{Y}^{(1-\alpha/2)}\right)$ , (7)

where $\hat{Y}^{(\alpha/2)}$ and $\hat{Y}^{(1-\alpha/2)}$ are the 100α/2-th and 100(1 − α/2)-th percentiles of the bootstrap distribution, respectively. The BCα method corrects the percentile interval for bias and skewness; the corresponding interval is given by

$(\hat{Y}_{lo}, \hat{Y}_{up}) = \left(\hat{Y}^{(\alpha_1)}, \hat{Y}^{(\alpha_2)}\right)$ , (8)

where

$\alpha_1 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + z_{\alpha/2}}{1 - \hat{a}\left(\hat{z}_0 + z_{\alpha/2}\right)}\right)$ , (9)

$\alpha_2 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + z_{1-\alpha/2}}{1 - \hat{a}\left(\hat{z}_0 + z_{1-\alpha/2}\right)}\right)$ . (10)

In Equations (9) and (10), Φ(·) is the standard normal cumulative distribution function, $z_p = \Phi^{-1}(p)$ is its 100p-th percentile (so that $z_{\alpha/2} < 0$), and $\hat{z}_0$ and $\hat{a}$ are the bias-correction and acceleration constants.
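The percentile and BCα intervals can be sketched in Python as below. This is a sketch under stated assumptions rather than the program used in the study: the Bitterlich points are taken as the resampling units, the statistic is the mean of Equation (3), $\hat{z}_0$ is estimated from the share of bootstrap replicates below $\hat{Y}$, and $\hat{a}$ from leave-one-point-out jackknife means, following the standard construction of Efron and Tibshirani (1993). Percentiles are interpolated linearly between order statistics, which is one of several conventions; all names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def _percentile(sorted_vals, p):
    """Empirical 100p-th percentile, interpolating linearly between order statistics."""
    k = p * (len(sorted_vals) - 1)
    lo = math.floor(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (k - lo) * (sorted_vals[hi] - sorted_vals[lo])


def bootstrap_cis(point_estimates, alpha=0.05, n_boot=1500, seed=None):
    """Return the percentile (Eq. 7) and BCa (Eqs. 8-10) intervals for the mean.

    Assumptions: the Bitterlich points are the resampling units, drawn with
    replacement, and the statistic is the mean of Eq. (3).
    """
    rng = random.Random(seed)
    y = list(point_estimates)
    n = len(y)
    theta_hat = fmean(y)
    boot = sorted(fmean(rng.choices(y, k=n)) for _ in range(n_boot))

    # Percentile interval, Eq. (7)
    perc = (_percentile(boot, alpha / 2), _percentile(boot, 1 - alpha / 2))

    nd = NormalDist()
    # Bias correction z0_hat: normal quantile of the share of replicates below theta_hat
    prop = sum(b < theta_hat for b in boot) / n_boot
    prop = min(max(prop, 0.5 / n_boot), 1 - 0.5 / n_boot)  # ad hoc guard against infinite z0
    z0 = nd.inv_cdf(prop)

    # Acceleration a_hat from jackknife (leave-one-point-out) means
    total = sum(y)
    jack = [(total - yi) / (n - 1) for yi in y]
    jbar = fmean(jack)
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6.0 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    a_hat = num / den if den > 0 else 0.0

    def adjusted_level(p):
        # Eqs. (9)-(10): z_p = Phi^{-1}(p), shifted for bias and skewness
        z = nd.inv_cdf(p)
        return nd.cdf(z0 + (z0 + z) / (1 - a_hat * (z0 + z)))

    bca = (_percentile(boot, adjusted_level(alpha / 2)),
           _percentile(boot, adjusted_level(1 - alpha / 2)))
    return perc, bca
```

With $\hat{z}_0 = \hat{a} = 0$ the adjusted levels reduce to α/2 and 1 − α/2, so the BCα interval reduces to the percentile interval; the correction matters only when the bootstrap distribution is biased or skewed.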

Finally, we calculated the percentage of confidence intervals that covered the true value Y, the percentage of intervals missing Y on each side, the average width of the confidence intervals, and the coefficient of variation of the interval widths.
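These evaluation criteria can be computed as in the following Python sketch (function name hypothetical). It assumes that a failure "from the left" means that Y lies below the lower limit, and a failure "from the right" that Y lies above the upper limit.

```python
from statistics import fmean, stdev


def evaluate_intervals(intervals, true_y):
    """Coverage and width summaries over repeated samples.

    intervals: list of (lower, upper) pairs, one per simulation run.
    Returns Lfcov, Rfcov and Tcov as percentages, the mean width and the
    coefficient of variation (%) of the widths.
    """
    k = len(intervals)
    left = sum(true_y < lo for lo, hi in intervals)   # assumed: Y below the interval
    right = sum(true_y > hi for lo, hi in intervals)  # assumed: Y above the interval
    widths = [hi - lo for lo, hi in intervals]
    mean_w = fmean(widths)
    return {
        "Lfcov": 100.0 * left / k,
        "Rfcov": 100.0 * right / k,
        "Tcov": 100.0 * (k - left - right) / k,
        "Width": mean_w,
        "CVwidth": 100.0 * stdev(widths) / mean_w,
    }
```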

The data were obtained from the University Forest of Pertouli (39°32′28″N, 21°27′57″E) in Greece (Stamatellos, 1991), which is almost entirely covered by hybrid fir (Abies × borisii-regis Mattf.). The tree selection angle at the sampling points was 2°18′, corresponding to F = 4 m²·ha⁻¹ (Matis, 2004). A set of 203 Bitterlich sampling units was treated as the population, and samples of size n = 10, 20, 30 and 40 were drawn from it without replacement. The simulation was run for 5000 iterations, and 1500 bootstrap resamples were drawn for each sample. The experiment was programmed in S-PLUS (Becker, Chambers, & Wilks, 1988; Venables & Ripley, 2000) and R (James, Witten, Hastie, & Tibshirani, 2013; Robinson & Hamann, 2010; R Core Team, 2013).
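The repeated-sampling design can be sketched in Python for the normal interval only; the bootstrap intervals would be evaluated inside the same loop. The population here is synthetic and hypothetical (203 gamma-distributed per-point estimates with skewness 2/√19 ≈ 0.46, the value reported in Section 3), and the "true" Y is taken to be the mean over all 203 units; both are assumptions for illustration. As in Equation (5), no finite population correction is applied.

```python
import math
import random
from statistics import NormalDist, fmean


def simulate_normal_coverage(population, n, n_sims=5000, alpha=0.05, seed=None):
    """Percentage of normal intervals, Eq. (6), that cover the population value.

    Assumptions: `population` holds the per-point estimates of all N units,
    the 'true' Y is their mean, and samples of size n are drawn without
    replacement.
    """
    rng = random.Random(seed)
    true_y = fmean(population)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        s = rng.sample(population, n)  # sampling without replacement
        y_hat = fmean(s)               # Eq. (3)
        se = math.sqrt(sum((v - y_hat) ** 2 for v in s) / (n * (n - 1)))  # Eq. (5)
        hits += (y_hat - z * se) <= true_y <= (y_hat + z * se)
    return 100.0 * hits / n_sims


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical population: gamma shape 19 gives skewness 2/sqrt(19), about 0.46
    pop = [rng.gammavariate(19.0, 50.0) for _ in range(203)]
    for n in (10, 20, 30, 40):
        print(n, simulate_normal_coverage(pop, n, n_sims=2000, seed=n))
```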

3. Results and Discussion

The results of the experiment are presented in Table 1. For all confidence intervals, the total coverage of the true population value increases with sample size, while the percentages of coverage failure decrease steadily.

Table 1. Percentage (%) of coverage failures of the true value Y from the left (Lfcov) and from the right (Rfcov), total coverage (Tcov, %), mean width of the confidence intervals (Width) and coefficient of variation of the widths (CVwidth), as a function of sample size (1 − α = 0.95).

*NCI: normal confidence interval; Cα: percentile method; BCα: bias-corrected and accelerated method.

The widths of the confidence intervals, as well as their variability, decrease as the sample size increases for all three intervals. This consistent pattern suggests that the numbers of simulation iterations (5000) and bootstrap resamples (1500) were sufficient to give stable estimates. At sample size 10, the overall coverage of the normal confidence interval (92.8%) is better than that of both bootstrap methods (90.95% and 91.65%), but the normal interval is also wider (1730, compared with 1653 and 1681 for the bootstrap intervals). At sample sizes 20, 30 and 40, no significant differences are observed in the overall coverage of the three intervals, although the BCα method has slightly better coverage. The same holds for the interval widths, which are now slightly smaller for the normal interval. The variability of the interval widths is approximately the same (15.40, 15.46 and 15.83). The 95% nominal coverage appears to be reached between sample sizes 30 and 40, since at size 40 all three intervals exceed it. Comparing the two bootstrap methods, BCα has slightly better coverage up to sample size 30 and, correspondingly, slightly wider intervals.

With the t approximation, coverage reached 94.5% at sample size 10, but the interval width also increased substantially (by 21.35%). The z approximation was therefore preferred, so that the widths of the three intervals remained comparable (differing by less than 5%). For the sample mean, Zhou and Dinh (2005) showed that the confidence interval based on the t approximation is adequate when $\hat{\gamma}/\sqrt{n} < 0.3$, where $\hat{\gamma}$ is the sample skewness. In this study $\hat{\gamma} = 0.46$, so $\hat{\gamma}/\sqrt{n} < 0.15$ for all sample sizes considered; this result can be verified by treating BS as simple random sampling of n out of N clusters of the population. The bootstrap confidence intervals did not behave well at sample size 10, which agrees with a related result of Schreuder and Williams (2000) for small sample sizes, although for different forest stand variables.
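The Zhou and Dinh (2005) criterion is easy to check numerically. In the Python sketch below, the moment-based skewness estimator is one common definition and is an assumption (small-sample-corrected versions also exist); the function names are hypothetical.

```python
import math
from statistics import fmean


def sample_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])
    m3 = fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


def zhou_dinh_ratio(skewness, n):
    """gamma_hat / sqrt(n); t-based intervals are regarded as adequate below 0.3."""
    return skewness / math.sqrt(n)


for n in (10, 20, 30, 40):
    print(n, round(zhou_dinh_ratio(0.46, n), 3))
```

With $\hat{\gamma} = 0.46$, the ratio is about 0.145 at n = 10 and smaller for larger n, below both 0.15 and the 0.3 threshold.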

4. Conclusion

In conclusion, all three methods of constructing confidence intervals closely approach the nominal coverage at sample size 30 and provide satisfactory coverage (>93%) at sample size 20. The normal confidence interval still has satisfactory coverage at sample size 10, whereas the bootstrap methods do not seem to perform well at that size. These results come from a particular forest ecosystem with a clustered spatial distribution of trees and continuous management. Further research in forest ecosystems with different structures is needed to evaluate these confidence intervals more fully, as well as other types of confidence intervals proposed in the literature.

Acknowledgements

This research has been financially supported by General Secretariat for Research and Technology (GSRT) and the Hellenic Foundation for Research and Innovation (HFRI) (Scholarship Code: 1319).

References

[1] Becker, R. A., Chambers, J. M., & Wilks, A. R. (1988). The New S Language: A Programming Environment for Data Analysis and Graphics. Pacific Grove, CA: Wadsworth & Brooks/Cole Advanced Books & Software.

[2] De Vries, P. G. (1986). Sampling Theory for Forest Inventory: A Teach-Yourself Course. Berlin: Springer Science & Business Media. https://doi.org/10.1007/978-3-642-71581-5

[3] Efron, B. (1982). The Jackknife, the Bootstrap, and Other Resampling Plans (Vol. 38). CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM. https://doi.org/10.1137/1.9781611970319

[4] Efron, B., & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. London: CRC Press.

[5] Eriksson, M. (1995). Design-Based Approaches to Horizontal-Point-Sampling. Forest Science, 41, 890-907.

[6] Gregoire, T. G., & Valentine, H. T. (2007). Sampling Strategies for Natural Resources and the Environment. London: Chapman and Hall/CRC. https://doi.org/10.1201/9780203498880

[7] Hájek, J. (1981). Sampling from a Finite Population (p. 247). New York: Marcel Dekker, Inc.

[8] Horvitz, D. G., & Thompson, D. J. (1952). A Generalization of Sampling without Replacement from a Finite Universe. Journal of the American Statistical Association, 47, 663-685. https://doi.org/10.1080/01621459.1952.10483446

[9] James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. New York, Heidelberg, Dordrecht, London: Springer. https://doi.org/10.1007/978-1-4614-7138-7_1

[10] Kershaw Jr., J. A., Ducey, M. J., Beers, T. W., & Husch, B. (2016). Forest Mensuration (5th ed.). New York: John Wiley & Sons. https://doi.org/10.1002/9781118902028

[11] Lyons, M. B., Keith, D. A., Phinn, S. R., Mason, T. J., & Elith, J. (2018). A Comparison of Resampling Methods for Remote Sensing Classification and Accuracy Assessment. Remote Sensing of Environment, 208, 145-153. https://doi.org/10.1016/j.rse.2018.02.026

[12] Magnussen, S. (2001). Saddlepoint Approximations for Statistical Inference of PPP Sample Estimates. Scandinavian Journal of Forest Research, 16, 180-192. https://doi.org/10.1080/028275801300088288

[13] Matis, K. (2004). Forest Biometrics: II. Dendrometry (In Greek) (Vol. 2, 2nd ed.). Thessaloniki, Greece: Pegasus.

[14] McRoberts, R. E., Magnussen, S., Tomppo, E. O., & Chirici, G. (2011). Parametric, Bootstrap, and Jackknife Variance Estimators for the k-Nearest Neighbors Technique with Illustrations Using Forest Inventory and Satellite Image Data. Remote Sensing of Environment, 115, 3165-3174. https://doi.org/10.1016/j.rse.2011.07.002

[15] Overton, W. S., & Stehman, S. V. (1995). The Horvitz-Thompson Theorem as a Unifying Perspective for Probability Sampling: With Examples from Natural Resource Sampling. The American Statistician, 49, 261-268. https://doi.org/10.1080/00031305.1995.10476160

[16] Palley, M. N., & Horwitz, L. G. (1961). Properties of Some Random and Systematic Point Sampling Estimators. Forest Science, 7, 52-65.

[17] Robinson, A. P., & Hamann, J. D. (2010). Forest Analytics with R: An Introduction. Berlin: Springer Science & Business Media. https://doi.org/10.1007/978-1-4419-7762-5_1

[18] Roesch, F. A., Green, E. J., & Scott, C. T. (1993). An Alternative View of Forest Sampling. Survey Methodology, Statistics Canada, 19, 199-204.

[19] Schreuder, H. T. (1970). Point Sampling Theory in the Framework of Equal-Probability Cluster Sampling. Forest Science, 16, 240-246.

[20] Schreuder, H. T., & Williams, M. S. (2000). Reliability of Confidence Intervals Calculated by Bootstrap and Classical Methods Using the FIA 1-HA Plot Design. General Technical Report RMRS-GTR-57. USDA Forest Service, Rocky Mountain Research Station. https://doi.org/10.2737/RMRS-GTR-57

[21] Schreuder, H. T., Gregoire, T. G., & Wood, G. B. (1993). Sampling Methods for Multiresource Forest Inventory. New York: John Wiley & Sons.

[22] Schreuder, H. T., Ouyang, Z., & Williams, M. (1992). Point-Poisson, Point-PPS, and Modified Point-PPS Sampling: Efficiency and Variance Estimation. Canadian Journal of Forest Research, 22, 1071-1078. https://doi.org/10.1139/x92-142

[23] Stamatellos, G. (1991). Research of Forest Volume Estimation Possibilities with Two-Stages Sampling Designs (In Greek, with English Summary). Doctoral Thesis, Thessaloniki: Aristotle University of Thessaloniki.

[24] R Core Team (2013). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.

[25] Venables, W., & Ripley, B. D. (2000). S Programming. Berlin: Springer Science & Business Media. https://doi.org/10.1007/978-0-387-21856-4

[26] Zhou, X. H., & Dinh, P. (2005). Nonparametric Confidence Intervals for the One- and Two-Sample Problems. Biostatistics, 6, 187-200. https://doi.org/10.1093/biostatistics/kxi002