The end of the 50-year internal armed conflict in Colombia, resulting from the peace agreement signed between the Colombian Government and the Revolutionary Armed Forces of Colombia, is expected to lead to a substantial social-economic transformation. Increased private and public investments are expected to boost the domestic economy. Timberland investments have received considerable attention due to land availability, relatively high rates of forest growth, and the strategic geographical location of Colombia (Cubbage et al., 2010; Cubbage et al., 2007; López et al., 2010; Mendell et al., 2006) . Antioquia is the Colombian state with the largest area in forest plantations with 69000 ha, and a region characterized by a high timber production potential, an increasing demand for timber, and a high economic development. These potential timberland investments require appropriate growth and yield modeling, which provides valuable inputs for performing financial analysis and risk assessment (Hall & Bailey, 2001; Louw, 1999; Moser & Hall, 1969) .
Forest growth is a function of four factors (Clutter et al., 1992) : 1) the stand age, 2) the site quality or environmental effect (Bravo-Oviedo et al., 2007; Louw & Scholes, 2006) ; 3) the intrinsic characteristics of a forest stand, such as density and tree genetics (Calegario et al., 2005) ; and 4) management practices such as mechanical site preparation, fertilization, and competing vegetation control (Allen et al., 2005) . The effect of age on forest growth is accounted by for the inclusion of the stand age in the growth and yield models. Moreover, one of the intrinsic characteristics, density, is accounted by for the planting density, survival/mortality model or stand density.
Much research has been devoted to site quality since it represents a measure of the base timber production potential of a stand. The determination of the site quality is particularly important in Colombia, where intensive forest management practices such as mid-rotation cultural treatments, sustained fertilization and vegetation control, and planting genetically improved seedlings have not been extensively applied. Site quality can be analyzed by either direct or indirect methods.
The direct method to determine forest productivity requires measuring plots in stands representing distinct ages and site qualities, and estimating a site index equation based on dominant heights (Burkhart et al., 2018; Clutter et al., 1992; Walters et al., 1989) . Nevertheless, this approach has some shortcomings: the existence of forest plantations is required to assess site productivity; estimations are locally restricted, and the actual effect of the environment on forest production and productivity is not analyzed. Moreover, the assumption of independence between planting density and dominant height cannot be satisfied (Antón-Fernández et al., 2011; Henskens et al., 2001; Land et al., 2004) , thereby providing biased estimations of forest productivity. Virtually all existing forest growth and yield (FG&Y) models for P. patula in Colombia utilize the site index as their primary input, precluding yield estimations on a regional scale.
On the other hand, the indirect method or geocentric approach (Weiskittel et al., 2011) of forest productivity is intended to estimate growth and yield models as a function of the site quality factors: climate, soil, and topographic conditions (Bravo-Oviedo et al., 2007; Clutter et al., 1992; Günter et al., 2009; Louw & Scholes, 2002, 2006; Lugo et al., 1988; Palahi et al., 2004; Vanclay, 1992; Walters et al., 1989; Wang et al., 2005; Zapata-Cuartas, 2007) .
Models based on an explicit biological explanation of FG&Y are more appropriate than empirical models (Vanclay, 1994) . The von Bertalanffy-Chapman-Richards (vBCR) model has a well-grounded biological basis with its mathematical terms implicitly expressing the underlying physiological processes occurring in an organism. The vBCR model has been extensively examined worldwide for many species (Buford & Burkhart, 1987; Fekedulegn et al., 1999; Moser & Hall, 1969; Pienaar & Turnbull, 1973; Pienaar, 1979; Zeide, 1993) , as well as in Colombia for Acacia mangium (Torres & del Valle, 2007) , Eucalyptus sp. (Zapata-Cuartas, 2007) , Pinus patula (Restrepo et al., 2012) , and Tectona grandis (Restrepo & Orrego, 2015; Restrepo et al., 2012; Torres et al., 2012) . Likewise, the vBCR parameters have been expressed as a function of covariates (Bravo-Oviedo et al., 2007; Hall & Bailey, 2001; Louw & Scholes, 2006) .
P. patula was chosen to characterize the timberland investment potential because this species is the most widely planted in Colombia (Endo, 1994) , as well as in Antioquia where the planted area is 20000 ha (DANE, 2004) . The natural range of P. patula is the south-central region of Mexico (Webb et al., 1984) . However, P. patula has been planted worldwide in a wider climatic range than its natural habitat (van Zonneveld et al., 2009) . Biophysical covariates associated with the three forest growth factors have been utilized to explain P. patula FG&Y variation (Evans, 1974; Grey, 1979; Louw & Scholes, 2006) .
This research had three main objectives. First, to explore a large set of covariates, and to select a subset that potentially explain P. patula growth and yield variation in Antioquia, Colombia. Second, to estimate and validate a P. patula FG&Y model as a function of stand age, stand density, and selected biophysical variables using the vBCR model. Our hypothesis is that the stand age and density, along with biophysical covariates representing site quality factors will explain P. patula growth and yield in Antioquia, Colombia. Third, to spatially represent P. patula FG&Y in Antioquia, Colombia. The resulting map will be useful to provide insight into potential areas for timberland investment in the region.
The database of forest measurements comprises 1119 1/40 ha-circular-geo-referenced temporary plots measured in unthinned, unmanaged, and genetically unimproved P. patula plantations in eight locations in Antioquia, Colombia (Table 1). Data correspond to forest inventories conducted by the Department of Forest Sciences at National University of Colombia, Medellin, in forest plantations of Empresas Públicas de Medellin, and by students conducting a Forest Engineering thesis in forest plantations of Cipreses de Colombia (Lopera & Gutiérrez, 2000) . Raster maps for climatic variables and a digital elevation model were obtained from WorldClim, version 1.4 (Hijmans et al., 2005) , and the Shuttle Radar Topography Mission (Farr et al., 2007) , respectively. Soil data, including the pH level, were obtained in vector format from national public databases in Colombia (IGAC, 2007) (Table 1). Additional covariates were calculated with the collected dataset. The slope of the terrain (S, degrees) was calculated from the digital elevation model using the d-eight algorithm as implemented in the R function terrain (Hijmans et al., 2016) using a 30 m grid size DEM. The temperature to precipitation ratio (TP, 100˚C∙mm−1) was calculated as:
where Amt is the annual mean temperature (˚C), and Amp is the annual mean precipitation (mm), and the value of 100 is a scale factor.
2.2. Model Description
The integrated form of the vBCR model can be written as:
where y is the yield volume (m3∙ha−1), f is the asymptote (m3∙ha−1), A is the stand age (years), β is the intrinsic growth rate, and γ is the allometric constant. The parameters f and β represent the potential environmental effects on growth and yield since they are related to metabolic outcomes of resource availability (Brown & Lugo, 1982; González, 1994; Lugo et al., 1988; Pienaar, 1979) . The parameter γ represents, theoretically, a characteristic growth attribute for a species (González, 1994; Pienaar & Turnbull, 1973) and, therefore, it is not related to environmental variables. The parameter γ, generally found in the range [0,2], adds flexibility to the vBCR model (Richards, 1959) . Thus, a value of γ greater than 1 corresponds to a logistic shape for the yield curve; when γ is less than 1, it corresponds to the Mitscherlich curve; when γ is equal to 1, the model takes the name of Gompertz function (Pienaar & Turnbull, 1973; Zapata-Cuartas, 2007) . Additionally, the γ parameter is related to the curve’s inflection point and to the age at which the maximum current annual increment is attained, thereby influencing both the biological and economical rotation ages.
2.3. Model Estimation and Validation
Equation (2) was estimated using generalized nonlinear least-squares with the R function gnls with the variance function argument as varPower (Pinheiro et al., 2017) . Different parameter linear specifications as a function of age, stand
Table 1. Summary statistics of the 1119 temporary plots of P. patula and environmental covariates in Antioquia, Colombia.
density and biophysical covariates were evaluated. The first estimated model expressed the volume as a function of the parameters f, β, γ, and age. Out of this formulation, nested models were specified by expressing each parameter as a function of an intercept, the stand density, and one or more biophysical covariates. The same biophysical covariate was not considered in more than one parameter to avoid potential multicollinearity.
The statistical contribution of one additional covariate in the nested models was evaluated using the likelihood ratio (LR) test (Wackerly et al., 2008) :
where θ represents the vector of parameters, i.e. f, β, γ; and w and Ω represent the parameter space of the models with p − 1 and p parameters, respectively. The estimated model containing p parameters was compared to the previous model containing p − 1 parameters. For a large number of observations, −2 log(LR) has approximately a distribution. The rejection region is defined as , with for a level of significance .
The validation was performed using an independent 133-observation dataset of forest plantations (ICA, 2015) with similar biophysical attributes and stand characteristics of the measured plots. Specifically, the hypothesis tested was:
where µo is the mean of observed volume and µf is the mean of fitted values, estimated by their corresponding sample means , and , respectively. The test statistics is:
where and are the sample variances of the observed volumes and fitted values, respectively. Hence, the null hypothesis of equal means is rejected if , with t(0.975, 264) = 1.969 for a level of significance .
2.4. Forest Growth, Yield Curves and Yield Map
Parameters f and β were estimated for each of the 1119 plots based on the covariate values associated with each location. The mean and the 95% prediction interval (2.5% and 97.5% percentiles of the distribution) for these two parameters were estimated to depict the mean and the 95% prediction interval of the growth and yield. The parameter γ was fixed in all these calculations because it does not vary across the study area or as a function of the stand density. Similarly, parameters f and β were represented as a function of the mean of covariates and were used to generate a map of forest yield at year 20. Natural protected areas, which include lands above 2800 meters above sea level (m.a.s.l.), were not considered in the analyses and maps (Gómez-Ossa, 2011) , given the environmental restrictions by which forest plantations are not allowed as land use on protected areas.
3.1. Biophysical Covariates Description and Variable Selection
The measured plots in Antioquia represent climatic conditions of Amt, Amp, and TP at (mean ± SD) 17.5˚C ± 1.0˚C, 2508 ± 681.2 mm, 0.74 ± 0.18 100˚C∙mm−1, respectively. Soil and topography were characterized by the mean terrain slope at 14.7 ± 6.6 degrees, and soil pH mainly between 5.1 and 6. Most biophysical covariates exhibited a bell-shape relationship with parameters f and β, providing empirical evidence of optimal environmental conditions for P. patula in Antioquia. For instance, parameters f and β exhibited their maximum values for Amt and Amp at 17.5˚C and 2200 - 2500 mm∙yr−1, respectively (Figure 1 and Figure 2). Similar bell-shape curves, showing the maximum values for f and β, were exhibited for other biophysical covariates, e.g. moister quarter’s precipitation (Mqp), coolest month’s temperature (Cmt), and moister month’s precipitation (Mmp) (Figure 1 and Figure 2).
Points in Figure 1 and Figure 2 are assumed to represent the average contribution of a covariate to the parameters f and β within one of the eight locations. Similarly, the lines show the general trend (a loess smoother) of the relationship of the covariable and parameters f and β (§8.2.2 Using covariates with nlme in Pinheiro & Bates, (2000) ). Thus, a line with a noteworthy trend and relative high gradient (how steep the line is) makes the covariate a good candidate to be selected as potential explanatory variable. Both the soil pH (difference between pH levels) and terrain’s slope (steeper than 15˚) contribute with more than 100 m3∙ha−1 to the asymptote (f) (Figure 1). TP has a noticeable trend with high gradient, suggesting that the higher the TP the higher the intrinsic growth rate (b) (Figure 2).
Overall, pH, S, and TP capture the effect of both f and β, and for that reason they were chosen over the remaining covariates. These variables were plotted in the first row of Figure 1 and Figure 2, along with stand density (N), to help the reader to identify described relationships with the parameters. Moreover, the selected biophysical covariates can be easily determined by practitioners and researchers from field measurements or publicly available data without much need of additional external inputs or further complex calculations.
3.2. Yield Model
In the best estimated model, the parameter f was a function of an intercept, pH, S, and N; the parameter β was a function of an intercept, and TP; and the parameter γ was a function of just an intercept (Table 2, model 20). This model had the maximum value of the log likelihood (Table 2), and its parameter estimates were highly statistical significant (p-value < 0.001) (Table 3). The parameter estimates can be written as (Table 3):
where represents an indicator function that takes the value of one if the soil pH is in the range 5.1 - 6, or zero otherwise (in this case the remaining option is soil pH between 4.1 and 5).
The asymptote f is expressed as a function of an intercept, equal to 236 m3∙ha−1, and the effect of two covariates, pH and S. On sites where pH was in the range 5.1 - 6, the asymptote was lower in 112.4 m3∙ha−1 compared to sites where pH was in the range 4.1 - 5. Moreover, each additional degree of slope added 2.2
Figure 1. Relationship between the parameter f with environmental covariates and stand density: Amp, annual mean precipitation (mm∙yr−1); Amt, annual mean temperature (˚C); TP, temperature to precipitation ratio (100˚C∙mm−1); pH, soil pH class; S, slope of the terrain (degrees); N, stand density (trees per ha); Dqp, driest quarter’s precipitation (mm∙Qtr−1); Mqp, moister quarter’s precipitation (mm∙Qtr−1); Hmt, hottest month’s temperature (˚C); Cmt, coolest month’s temperature (˚C); Hqt, hottest quarter’s temperature (˚C); Cqt, coolest quarter’s temperature (˚C); D, soil drainage class (W: well; E: excessive); Dp, soil depth class (M: moderate, <80 cm, D: deep, >80 cm); F, soil fertility class (V: very low; L: low); E, elevation (meters above sea level, m.a.s.l.); Dmp, driest month’s precipitation (mm∙mo−1); and Mmp, moister month’s precipitation (mm∙mo−1).
m3∙ha-1 to the asymptote. Regarding density, for each 100 standing TPH, the asymptote increases in 18 m3∙ha−1. The parameter β was expressed as a function of an intercept, equal to −0.123, and TP. As described, the higher the TP, the higher the value of . Evidence was not found to reject the null hypothesis that the mean of observed and estimated volumes of the validation dataset were different (p-value > 0.05).
The average growth and yield curves of P. patula by soil pH are shown in b. Figure 3(a) depicts the average yield with parameter estimates of = 432.578 and = 0.266; and = 312.624 and = 0.292 for locations where soil pH is in the range 4.1 - 5 and 5.1 - 6, respectively. The parameter estimate was constant for both soil pH factor levels. The estimated yield curves are flat over
Figure 2. Relationship between the parameter β with environmental covariates and stand density: Amp, annual mean precipitation (mm∙yr−1); Amt, annual mean temperature (˚C); TP, temperature to precipitation ratio (100˚C∙mm−1); pH, soil pH class; S, slope of the terrain (degrees); N, stand density (trees per ha); Dqp, driest quarter’s precipitation (mm∙Qtr−1); Mqp, moister quarter’s precipitation (mm∙Qtr−1); Hmt, hottest month’s temperature (˚C); Cmt, coolest month’s temperature (˚C); Hqt, hottest quarter’s temperature (˚C); Cqt, coolest quarter’s temperature (˚C); D, soil drainage class (W: well; E: excessive); Dp, soil depth class (M: moderate, <80 cm, D: deep, >80 cm); F, soil fertility class (V: very low; L: low); E, elevation (meters above sea level, m.a.s.l.); Dmp, driest month’s precipitation (mm∙mo−1); and Mmp, moister month’s precipitation (mm∙mo−1). Values of the vertical axis in 100.000 units.
the first four years, increasing steadily until year 25. The average mean annual increment (MAI) reaches a maximum of 22.7 and 18.0 m3∙ha−1∙year−1 at years 15 and 13 for pH 4.1 - 5 and 5.1 - 6, respectively (Figure 3(b)). The shading region in each panel of Figure 3 represents the 95% prediction interval (PI) by the two pH factor levels, describing the growth and yield variability associated with the environmental covariates considered. For example, the MAI’s 95% PI was (11.21, 43.27), and (5.28, 38.84) m3∙ha−1∙year−1 for soil pH 4.1 - 5, and 5.1 - 6, respectively. Overall, the 95% PI of biological rotation age was in the range of 9 - 30 years (Figure 3(b)).
The spatial representation of the parameters f and β, and the yield is presented in Figure 4. The map of the parameter f indicates that the most productive areas
Table 2. Estimated yield models of P. patula in Antioquia, Colombia.
int: intercept; LR: likelihood ratio; soil pH (pH, dummy variable with value of one for pH in the range 5.1 - 6, and value of zero for pH in the range 4.1 - 5), slope (S, degrees), temperature to precipitation ratio (TP, 100˚C∙mm−1), and stand density (N, trees per hectare (TPH)).
Table 3. Parameter estimates for the yield model of P. patula.
int: intercept; soil pH (pH, dummy variable with value of one for pH in the range 5.1 - 6, and value of zero for pH in the range 4.1 - 5), slope (S, degrees), temperature to precipitation ratio (TP, 100˚C∙mm−1), and stand density (N, trees per hectare(TPH)).
Figure 3. Growth and yield of Pinus patula in Antioquia, Colombia. (a) volume yield (m3∙ha−1) as a function of age by soil pH; (b) mean annual increment (m3∙ha−1∙yr−1) as a function of age soil pH. The solid and dashed lines correspond to the average curve for pH 4.1 - 5 and 5.1 - 6, respectively, and the shading region represents their 95% prediction intervals.
are located in the northeastern region of Antioquia (Figure 4(a)). High values of the parameter β are observed in the central region of Antioquia (Figure 4(b)). At age 20 years, the spatial representation suggests that the yield is high in some areas of the northeastern region of Antioquia (Figure 4(c)). The total suitable land for planting P. patula in Antioquia is 10,600 km2.
Biophysical conditions that contribute with the optimal growth of P. patula in Antioquia are similar to those found in the natural range of the species: annual mean temperature in the range of 12˚C - 18˚C, mean maximum temperature of the hottest month in the range of 20˚C - 29˚C, mean minimum temperature of the coldest month in the range of 6˚C - 12˚C, and annual mean precipitation in the range of 750 - 2000 mm∙year−1 (Gillespie, 1992; Webb et al., 1984) . We explored the effect of a wide array of biophysical covariates on P. patula growth and yield. Forest yield models expressing stand volume of P. patula as a function of environmental covariates were not found in previous studies. Virtually all existing FG&Y models that include the effect of biophysical covariates utilize dominant height or site index as dependent variables. Site index has been expressed as a function of the relative distance between the summit and the plantation site, elevation, slope, soil quality (Evans, 1974) , parent material, soil pH, effective soil depth (Grey, 1979) , nitrogen mineralization, topographic position and soil classification (Louw & Scholes, 2006) . Moreover, the main variables of a climate envelope model for P. patula were, in order, annual mean temperature,
Figure 4. Spatial representation of the estimated parameters of the yield model of P. patula in the altitudinal range of 1800 - 2800 m.a.s.l. and after excluding protected areas: (a) Asymptote (m3∙ha−1) or parameter f in the estimated model; (b) Intrinsic growth rate or parameter β in the estimated model; (c) Estimated yield (m3∙ha−1) at year 20.
the maximum temperature in the warmest month, annual precipitation, precipitation in the driest month, temperature seasonality, and annual temperature range (Leibing et al., 2013; van Zonneveld et al., 2009) . The driest quarter’s precipitation has been found as a relevant variable to explain P. patula growth as well (Louw & Scholes, 2006) .
Data used to estimate the FG&Y model corresponded to unthinned, unmanaged and genetically unimproved forest plantations, and characterized P. patula plantations in Antioquia and other regions of Colombia as well. The vBCR model can be estimated using data from temporary plots representing contrasting environmental conditions (Pienaar & Turnbull, 1973) , like the data used in this research. The asymptote (f) and intrinsic growth rate (β) parameters in the vBCR have a well-known relationship with environmental covariates (Louw, 1999; Louw & Scholes, 2002, 2006) .
Out of the biophysical covariates explored, we selected soil pH, terrain slope, the temperature to precipitation ratio because they exhibited a distinct relationship with parameters f and β. We considered TP as a reasonable explanatory variable for P. patula growth, as studies on tropical forest growth suggest that total biomass is explained by TP (Brown & Lugo, 1982) . This index reflects the water availability to plants due to potential evapotranspiration that is proportional to air temperature (Lugo et al., 1988) . Likewise, since TP can easily be calculated using Amt and Amp, it represents a good index to be used by practitioners and investors compared to other measures like the water deficit index. Regarding soil pH and slope as soil and topographic factors, respectively, they are found to influence forest production and their corresponding asymptote in yield models (Evans, 1974; Grey, 1979; Louw & Scholes, 2006) . These biophysical covariates can be easily obtained from field measurements, facilitating the forest productivity evaluation. The finding that no covariate was related to the parameter γ is supported by the theory that the allometric constant is directly related to the way in which a specific species can grow and survive (Pienaar & Turnbull, 1973) .
Density is usually assumed to be more related to the intrinsic growth rate than to the asymptote (Pienaar & Turnbull, 1973) . However, stand density was found to be an explanatory variable related to the asymptote. In this case, the relationship between stand density and the asymptote may be the result of the effect of contrasting environmental conditions on the stand survival rates.
The estimated asymptotes in this study are in the range of maximum volume values previously reported for P. patula in Latin America: 360 m3∙ha−1 at year 48 in Colombia (Ramírez et al., 2014) , and 180 - 350 m3∙ha−1 at year 40 in Mexico (Santiago-García et al., 2014) ; but less than the maximum volume values reported in Africa: 550 m3∙ha−1 at year 31 in Ethiopia (Lemenih et al., 2004) ; and 900 ± 50 m3∙ha−1 at year 41 in Angola (Delgado-Matas & Pukkala, 2012) . However, these maximum values of volume correspond to average potential timber production and seem to be related to one location or one small area. That is the case of a volume value of 843 m3∙ha−1 at year 25 contained in our dataset (Table 1).
The 95% prediction intervals of the forest yield and MAI suggest a high variability in timber production for P. patula plantations in Antioquia, Colombia. The range of the maximum MAI and its associated biological rotation are wider than previously reported (Table 4). The observed variability of the growth and yield of P. patula leads to considerable uncertainty, thereby having significant implications for financial profitability and risk of timberland investments (Restrepo et al., 2012) .
The estimated model can be considered as a baseline for analyzing timber production in Antioquia, Colombia, by using data from unthinned, unmanaged, and genetically unimproved P. patula plantations. Hence, the potential timber
Table 4. Mean annual increment and biological optimal rotation for P. patula.
aMaximum MAI and biological rotation, b95% prediction interval; cIt is a conversion of mean annual biomass increment of 14 ton∙ha−1∙yr−1 to mean annual increment using a value of density of 400 kg∙m−3.
production is likely to be higher under intensive forest management practices. The estimated model can be used to estimate and predict the yield of P. patula in other Colombian regions with environmental conditions similar to those represented in this research. However, it is strongly recommended to use forest inventory data and independent forest yield models to validate those estimations.
A broad range of environmental conditions was used to estimate the vBCR model type for FG&Y. The modeling approach explicitly includes biophysical covariates, and contributes to the identification of the main drivers of P. patula growth in Antioquia, Colombia. The asymptote (f) was estimated as a function of soil pH, slope of the terrain and stand density. The intrinsic growth rate (β) was estimated as a function of the temperature to precipitation ratio. The latter is a very interesting and promising, ecological-theoretical index to be used in FG&Y studies regardless of the species. Moreover, since the model includes climate variables, it can be used to examine potential effects of future climatic conditions on FG&Y. Although more biophysical covariates representing site quality factors were examined, they were not included in the model because they may explain the same variability as selected covariates. Moreover, selected biophysical covariates can be obtained easily from field measurements taken by practitioners, when they conduct technical feasibility studies of timberland investments. The allometric constant (γ) was not related to environmental covariates because it reflects the species’ autoecology of growth. Lower and upper bounds of the 95% prediction intervals for growth and yield were determined as a function of the parameters f and β. The results suggest that FG&Y of P. patula is highly variable, which has significant implications for financial profitability and risk evaluation in forest plantations. The estimated parameters f and β, and yield of P. patula were represented in maps to identify the most suitable areas where P. patula can be planted in Antioquia. The estimated model, growth and yield curves, and forest yield maps provide valuable information for timber investment analysis. Future research should address other important drivers of FG&Y such as genetics and silvicultural treatments to determine volume gains associated with intensive forest management practices.
This research was funded by the Project “Determinantes Espacialmente Explícitos de Transiciones en Coberturas Terrestres con Significativo Impacto para la Provisión de Servicios Ecosistémicos”, which is part of the research program “Gestión del Riesgo por Cambio Ambiental en Cuencas Hidrográficas”, funded by Colciencias. We sincerely thank to Cesar Urrego, Álvaro Lema, Rodolfo Parra, Victor Gutiérrez, Gabriel Lopera, Ingrid Mazo, Juan David Arango, and Astrid Gil for their invaluable contribution to this research. We also would like to thank the foresters, technicians and people who were directly responsible for field measurements which were used to consolidate the dataset for this research.