Seminal studies conducted at national and regional levels in sub-Saharan Africa, using nutrient balance approach, have indicated declining arable land quality with severe net nutrient losses of the order of 10 kg Nitrogen, 4 kg phosphates and 10 kg potash per hectare annually  with Kenya being one of the countries with net nutrient losses  . Empirical roots of nutrient balance studies are widely acknowledged  . However, opinions are divided on the extent and intensity of nutrient mining and variability; whether farmers’ achievements contradict nutrient depletion scenarios  ; whether levels of nutrient mining differ by agroecological zones and land use systems; whether underlying factors exist to explain direction and magnitude of nutrient balances  ; and how nutrient balances can be scaled-up.
One of the reasons why consensual accounts on nutrient balances remain intractable and illusive and at times anecdotal  is the limited use of rigorous statistical techniques to i) handle dependency in data and to reduce biases associated with variance estimates and inflation of Type I error   , ii) to quantify between study variability  and iii) to handle multiple outcomes/effect sizes simultaneously  . Nutrient balance studies are inherently associated with myriad challenges: inadequate systematic replication in space or in time  , dependencies in multiple outcomes, multicollinearity in independent variables, non-homogeneity in data, and missing values, and inadequate application of statistical techniques that can deal with nested or clustered data associated with such studies that use complex survey designs   . This study explores application of multivariate multilevel models in a meta-analysis of nutrient balances and thereby contributes to addressing the above challenges and controversies. Application of statistical procedures for meta-analysis was previously a domain of the health Sector but has recently been adopted in other disciplines  . Meta-analysis statistical techniques have a potential to address challenges and controversies by pooling and analyzing multiple studies together thereby improving statistical power and reducing the likelihood of type II error (failure to determine a difference that truly exist); increasing precision of estimates  ; relating outcome heterogeneity to explanatory covariates and factors, and identifying large scale-patterns even when obscured by local factors, thereby minimizing the danger of over-extrapolation from single context-based studies  .
Approaches to model estimation in meta-analysis vary widely. Descriptive analysis and paired t-test have been used in meta-analysis of nutrient balances drawn from 57 studies in Africa and concluded that there were positive soil nitrogen and potassium balances in some spots in Africa  while there was nutrient mining in others. Descriptive analysis has been used to identify drivers of tropical deforestation from 152 previous studies  , Ordinary Least Squares have been used in analysis of returns to agricultural research and development from 289 studies  , a binomial test has been used in meta-analysis of the differences in environmental impacts between organic and conventional farming based on 59 previous studies  and vote counting has been used in meta-analysis of agroforestry adoption based on 32 studies from 32 countries  . Classical meta-analysis that estimates model parameters in addition to within- and-between study variability (random effects model)   has also been used in a meta-analysis of the effects of woody and herbaceous legumes on maize yield in sub-Saharan Africa based on 94 studies from West, East and Southern Africa and concluded that inorganic fertilisers gave better maize grain yield response than legume trees and green manures, natural fallows and unfertilized maize in that order; and that “global maize yield response to legumes was significantly positive and higher than unfertilized maize and natural vegetation fallows”  .
Current methods of meta-analysis, however, have several limitations   : Inadequacies in modeling multiple outcomes simultaneously, in addressing dependencies in multiple outcomes (use incorrect standard errors), in dealing with non-linear correlations and non-homogeneity in data and in handling nested or clustered data  , yet these challenges characterise nutrient balance studies where response variables (outcomes) are often multivariate and have dependencies. Furthermore, methods such as vote counting and sign tests have been deplored due to their low power and the fact that they ignore sample size and effect magnitude  while descriptive statistics do not provide a framework to explore the effects of multiple covariates and factors on the dependent variables.
Possible approaches to modeling multiple outcomes of nutrient balances, taking into account the above challenges, include: multivariate fixed and random effects models, structural equation models, and multilevel models for modeling primary data among others  . Although the fore-mentioned methods offer a potential in meta-analysis, they have not been applied to nutrient balance studies. Further, the application of multivariate fixed and random effects models are constrained by limited availability of within-study correlation and variances for estimating the variance-covariance matrix required in the model when summary statistics are used in meta-analysis and when there are no individual participant data to estimate required variance-covariance matrix  . A “work- around” that has been proposed to estimate “missing” correlations in such situations include the use of estimates from similar published work, conducting sensitivity analyses for possible ranges of correlations and the use of Bayesian hierarchical models with vague priors in a Markov Chain Monte Carlo (MCMC) framework among others  .
In this study we demonstrate that multivariate multilevel models can be used in meta-analysis of farm nutrient balance data arising from complex surveys that involve multi-stage sampling, stratification and unequal sampling probabilities  . A previous meta-analysis of 54 studies using multilevel models, and through application of simulation studies for comparison, has shown that dependencies and heterogeneity at different model hierarchy can be effectively accounted for and that multiple outcomes can be modeled simultaneously using multivariate multilevel models  . Multilevel models also have a potential to estimate variances and standard errors correctly for nested data, model relationships between information at different levels of model hierarchies; and has ability to improve estimation and predictions and to analyse repeated measures data among others  .
This study uses individual participant data from multiple related cross-sectional surveys on nutrient balances from five different agro-ecological zones of Kenya to investigate the quality of arable land by estimating the magnitude and variability of Nitrogen, Phosphorus and Potassium (NPK) nutrient balances, assessing whether agro-climatic zones differ with respect to NPK nutrient balances and determining the effects of household resource endowments on NPK nutrient balances. To meet the research objectives, the study fitted a two-level multilevel model (multivariate multilevel model) with random intercept to farm nutrient balance data in a meta-analysis that used Iterative Generalised Least Squares (IGLS), an equivalent maximum likelihood method   , to estimate model parameters as described in the R-package R2MLwiN.
NUTrient MONitoring (NUTMON) data is used in this study. NUTMON is part of on-going research to investigate land quality and sustainability of smallholder farming systems in the tropics. The data used comprise 14 separate studies, from 5 research initiatives that used NUTMON methodology in different agro-climatic zones of Kenya. A single research initiative working in “n” agro-climatic zones was considered to have “n” separate studies (where n = number of studies). Studies which did not use multi-stage sampling to identify study participants were excluded from the analysis (on-farm and on-station experiments excluded).
The data comprised 349 observations (individual smallholder farm-households). About 42% and 25% of the smallholder farm-households in the dataset were from semi-humid to semi-arid (ACZ4), and semi-arid areas (ACZ5) of Kenya respectively. Farm households from humid (ACZ1), sub-humid (ACZ2) and semi-humid (ACZ3) areas accounted for 12%, 15% and 7% of total households in the dataset respectively. The arid (ACZ 6) and very arid (ACZ 7), with very low potential for plant production, were not represented in the dataset (Table 1).
The 349 observations have 3 dependent variables: N full balance (kg ha−1); P full balance (kg ha−1); and K full balance (kg ha−1) and 18 selected independent variables (factors/covariates). The latter were measured at two levels: 1) at level of individual farmers (household resource endowments); and 2) at agro-climatic zone level (Table 2).
2.2. General Analysis Methods
The study used the following general analysis methods:
1) Determined whether a two-level multi-level model with multiple outcome variables (multivariate multi-level model) is required for the NUTMON dataset.
Table 1. Study areas in Kenya and number of observations (farm-households).
Table 2. Factors and covariates used as independent variables in this study.
2) Based on (1) above, applied a two-level multi-level model (multivariate multi-level model) to:
Estimate an aggregate magnitude and variability of nutrient balances across agro-climatic zones that cover arable lands of Kenya;
Determine whether agro-climatic zones differ from each other in terms of NPK nutrient balances; and to
Identify the effects of household resource endowments on NPK nutrient balances.
2.3. Determining Necessity of a Two-Level Multi-Level Model (Multivariate Multilevel Model)
The study fitted a two-level multilevel model (multivariate multilevel model) without predictors (variance component model) to NUTMON dataset to determine whether multilevel modeling was needed at all for this dataset. Intra-class correlation Coefficient (ICC) and Design effect were calculated to aid in model output interpretation.
The multilevel equations for the variance component model were specified as follows:
Written in (mixed model) form by substitution of the level-2 equation into the level-1 equation, the model is:
= Individual response variable for ith farmer (level-1) in jth agro-climatic zone;
= Random intercept for jth agro-climatic zone (mean of all individual farmers in jth agroclimatic zone)
= Random intercept for all j agro-climatic zones (grand mean of all js)
= Residual effect (variation) for ith farmer around the mean of jth agroclimatic zone (random effect)
= Residual effect (variation) for jth agro-climatic zone around the grand mean (of all agro-climatic zones ie across all js)
; is the variance at individual farmer (level-1)
; is the variance at agro-climatic zone (level-2)
The study used Iterative Generalised Least Squares (IGLS) estimation algorithms in R2MLwiN package, to return estimates for random coefficients and their standard errors, estimates for deviance statistics and for variances and covariances for single and two level models (Table 3).
The study calculated Study Design Effect1 for the two level model as follows:
Table 3. Two-level multilevel variance component model compared with a single level model (fixed part of the model).
1Quantifies the effects of violating the assumption of independence on standard error estimates; Multiplier to be applied to standard errors to correct for negative bias that results from nested data.
= Average number of farmers per study; In this case (349/14) = 28.1
ICC = Intraclass correlation coefficient (at level 2); an estimate of proportion of variance at level-2
ICC at level-2 was estimated separately for Nitrogen, Phosphorus and Potassium farm nutrient balances using:
= Residual variance at level-1
= residual variance at level-2
= Total variance at level-2
The design effects for each nutrient balance were greater than 2.0 (Table 4). Previous analysis has shown that a design effect greater than 2.0 indicates the need for multilevel modeling  . Thus, the preliminary analysis indicates that multilevel modeling is appropriate for this nutrient balance dataset and a two-level multilevel model (multivariate multilevel) is suitable for this purpose; and is therefore applied in subsequent analyses in line with the objectives of this study.
2.4. Estimating Magnitude and Variability of Nutrient Balances
To estimate an aggregate magnitude and variability of nutrient balances across agro-climatic zones of Kenya, the study used Equation (1), describing a variance component model. Iterative Generalised Least Squares (IGLS) in R2MLwiN package used to quantify the parameters of the model, returned parameter estimates shown in (Table 3), Section 2.3.
Similarly, variability of nutrient balances (heterogeneity) at level-1 and at level- 2 model hierarchy were estimated using Variance Partitioning Coeffcient (VPC)/Intra-class correlation coeffcient (ICC) using Equation (2) (see explanation in Section 2.3):
; is the variance at individual farmer (level-1)
; is the variance at agro-climatic zone (level-2)
Table 4. Design effect for NPK nutrient balances.
= Average no. of farmers per study; ICC = Intraclass correlation Coeffcient.
2.5. Determining Whether Agro-Climatic Zones Differ from Each Other with Respect to NPK Nutrient Balances
To determine whether agro-climatic zones differ from each other with respect to nutrient balances, Equation (1) describing a variance component model was used. Parameter estimates were obtained in a similar way as in Section 2.3. The parameter estimates are presented in Table 3.
Further, in assessing whether agro-climatic zones differ from each other, the study determined whether the variance ( ) of the random component of the intercept in Equation (1), , was different from zero. A 95% confidence interval for the variance of was used to aid model output interpretation. Also, a likelihood ratio test was conducted by comparing the deviances of a model with and one without to assess whether (variance at level-2: agroclimatic zone) is significant. The null hypothesis for the latter was , so we do not need in the model (Ho: no agro-climatic zone variation or cluster effect exists and restricted or single model is “the true model”).
Natural log used in Likelihood ratio test:
The p-value associated with the Likelihood ratio (LR) test statistic was determined from Chi Square distribution (with 1 degree of freedom).
2.6. Effects of Household Resource Endowments on Nutrient Balances
The study fitted a two-level multilevel model (multivariate multilevel model) to NUTMON dataset to determine the effects of household resource endowments on NPK nutrient balances. The household resource endowments in Table 2 were added to the model as explanatory variables (in the fixed part of the model) and the intercepts at level-1 and level-2 model hierachy allowed to vary resulting in arandom intercept model. Slopes for the explanatory variables were not allowed to vary since they were fitted to the fixed part of the model only.
The multilevel equations for this model was specified as follows:
Level 1: , (for ; ; )
Level 2: ;
Written in mixed model form by substitution of the level-2 equations into the level-1 equation, the model is:
where at level-1:
= Individual response variable for ith farmer (level-1) in jth agro-climatic zone;
= Random intercept for jth agro-climatic zone (mean of all individual farmers in jth agroclimatic zone); each agro-climatic zone is assumed to have a different intercept coeffcient,
= A vector of k predictor variables for the ith farmer in jth agro-climatic zone
= A vector of k regression coeffcients associated with the predictor variables in jth agro-climatic zone:
= Residual effect (variation) for ith farmer in jth agro-climatic zone
where at level-2:
= Random intercept for all five agro-climatic zones (grand mean of all j groups); The ( )s’ are considered to vary randomly around a grand mean of all j groups ( ) at level-2
= A vector of n predictor variables measured at agro-climatic zone level (level-2 or j-level)
= A vector of n regression coefficients associated with the predictor variables at agro-climatic zone level (non-random coefficients):
= Residual effect (variation) for jth agro-climatic zone; ie the deviation of the intercept of jth agro-climatic zone from overall intercept of all agro-climatic zones (all js)
= A vector of k (fixed) regression coefficients indicating that the coefficients of Level-1 predictors ( ) do not vary across agro-climatic zone level (non-random slopes at level-2):
: all j values of are fixed (do not vary across agro-climatic zone) and are estimated as a single coefficient at level-2, for ;
3. Results and Discussion
3.1. Magnitude and Variability of Nutrient Balances
3.1.1. Magnitude and Direction of Nutrient Balances
The two-level multilevel model (multivariate multilevel model) without predictors (variance component model) fitted to the dataset returned the mean NPK nutrient balances, see (Table 3). The mean nitrogen nutrient balance of −11.9 kg ha−1 (with 95% confidence interval: −47.0, 23.2) tended to corroborate results of aggregate seminal studies that have reported negative (direction) nitrogen balances at national level  . This further confirms that arable land quality in Kenya is being degraded through declining farm nitrogen, though the observed figure was not statistically significant (confidence interval includes zero). The mean aggregate phosphorus (9.8 kg ha−1; p < 0.01; 95% Confidence Interval of 0.9, 18.8) and potassium balances (5.5 kg ha−1; 95% Confidence Interval of −6.9, 17.8) were however positive contrary to seminal aggregate studies that reported negative nutrient balances at national level  .
3.1.2. Variability of Nutrient Balances
The Variance Partitioning coefficients (adjusted Intra-class correlation coeffeicients) for NPK nutrient balances at different levels of model hierarchy are summarised in Table 5 while absolute values of variances and covariances are shown in Table 6.
For farm nitrogen nutrient balance, 48% of the variation lies between agro- climatic zones (between agro-climatic zone variability) while 52% of variation lie between farms. For each of the nutrient balances studied, a high proportion of total variation was from between-farm variability, 52%, 79% and 89% for nitrogen, phosphorus and potassium respectively.
Based on residual variances of each nutrient balance and covariances at level 1 (farm level; Table 3), the study observed high positive correlations between Nitrogen and phosphorus nutrient balances (r = 0.8), Nitrogen and potassium nutrient balances (r = 0.82) and moderate correlations between phosphorus and potassium nutrient balances (r = 0.68). These results imply a high dependence between effect sizes at level 1 of the study, dependence that cannot be ignored during analysis. Similarly at level 2, the study observed high dependence between variables as measured by correlations: Nitrogen-phosphorus (r = 0.75),
Table 5. Variance partitioning coefficient/Intra-class correlation coeffients for NPK nutrient balances.
Table 6. Two-level multilevel variance component model compared with a single level model (random part of the model).
var = Variance; Cov = Covariance
Nitrogen-potassium (r = 0.87) and Phosphorus-potassium (r = 0.92).
3.2. Agro-Climatic Zones and Nutrient Balances
The study assessed whether agro-climatic zones differ from each other, on average, with respect to farm nitrogen, phosphorus and potassium balances. This was explored preliminarily by looking at variance partitioning coefficient (VPC) and two tests i) assessing whether the variance of the random components of the intercept differ from zero and by ii) conducting likelihood ratio test.
Variance partitioning coefficient (VPC) measures the proportion of total variance which lies at the Agro-climatic zone level (level-2). Interpreted as VPC, 48%, 21%, and 11% of variation in nitrogen, phosphorus and potassium farm nutrient balances lie between agro-climatic zones respectively (Table 5; Table 6). This indicates that agro-climatic zones substantially differed with respect to NPK farm nutrient balances.
Suppose Agro-climatic zones were to differ only slightly or not at all, then the agro-climatic zone j values of (Equation (1)) should differ little from each other and or exhibit low-to-no variance. However, a 95% confidence levels for random part of the model (Table 6) indicated that variances for farm nitrogen (95% CI: 1005.33, 7919.74) and phosphorus (95% CI: 69.8, 2281.36) were significantly different from zero (Table 6). This indicates that there were significant differences between agro-climatic zones with respect to nutrient balances.
The study further used likelihood ratio test to tringulate the observation above as variances are known to have positively skewed sampling distributions while 95% confidence intervals assume assymptotic normal sampling distribution and may not be reliable. A likelihood ratio test done by comparing a model with agroclimatic zone effects (with ) and one without , to assess whether (variance at level-2: agroclimatic zone) is significant returned:
The p-value associated with the Likelihood (LR) test statistic (Chi Square value of 196.2) with 1 degree of freedom is 0.0001. Since the p-value is very small, we reject the null hypothesis (Ho: no agro-climatic zone variation or cluster effect exists and restricted or single model is “the true model”) and conclude that a gro-climatic zone variation exists and is significant . This further confirms that there were significant differences between agro-climatic zones with respect to farm nutrient balances.
3.3. Household Resource Endowments and Nutrient Balances
A two-level multilevel model (multivariate multilevel model) fitted to the dataset (see Section 4) to test the hypothesis: All household resource endowments do not have an effect on the magnitude of full N, P and K nutrient balances returned results shown in Table 7.
The observation that more than one household resource endowment variable has an effect on NPK nutrient balances provides a strong evidence against the null hypothesis (e.g. Value of livestock (0.0005 kg N ha−1, p < 0.001); cropping family labour (−0.05101kg K ha−1, p < 0.01), Table 7. We thus reject the null hypothesis and conclude that at least one household resource endowment has an effect on the magnitude of full N, P and K farm nutrient balances (Table 7).
A negative relationship between family labour for cropping and NPK nutrient balances was observed (Table 7) with cropping family labour having a signify-
Table 7. Effects of household resource endowments on NPK nutrient balances.
****p < 0.0001; ***p < 0.001; **p < 0.01; *p < 0.05; *”p < 0.1
cant effect (negative) on potassium balance. A unit change in cropping family labour lowering potassium nutrient balance by 0.05101 kg K ha−1 (p < 0.01). Although smallholders in Kenya rely heavily on family labour to manage their farms  this labour input may be for multiple purposes and not necessarily for strategic farm nutrient management alone.
Contrary to Cropping family labour, hired labour for redistribution units (LABHIRUC) had a positive effect on the direction of NPK nutrient balances. It significantly predicted phosphorus (P) balances with a unit change in LABHIRUC resulting in a change of 0.023 (p < 0.01) units in phosphorus balances.
Average slope of land, though a biophysical factor, was considered a proxy to land endowment resource quality. Farmers’ management practices and prices farmers are willing to offer for a given piece of land tend to differ depending on slope percentage, perceived degradation and ease of management attributed to slope effect. The study observed a negative correlation between average slope (%) of land and NPK nutrient balances and that average slope was significantly and negatively correlated with nitrogen balances.
The study observed mixed results with regards to effects of household resource capital on nutrient balances. While the effects of “total capital owned” significantly lowered NPK nutrient balances, livestock-related capital (value of livestock) had significant positive effect (Table 7). Thus, the study has indicated that it is the type of capital (e.g. livestock) owned and not the volume and total value of household capital that may be important in determining nutrient balances, though previous studies indicate that resource-rich farmers have a high chance of returning positive nutrient balances in their farms should they employ nutrient adding, recycling and conserving technologies and practices  .
4. Conclusions and Recommendations
Based on a two-level multilevel (multivariate multilevel) model fitted to the nutrient balance dataset, this study has shown that farm nitrogen mining is taking place and is putting the quality of arable land in Kenya at stake. However, and contrary to on-going narratives on blanket existence of widespread nutrient mining in Kenya, evidences from this study indicate that farm phosphorus and potassium balances are not always negative.
Agro-climatic zones are characterised by different biophysical potentials that may influence farm nutrient balances to different degrees. The study draws the conclusion that farm nitrogen, phosphorus and potassium balances do differ between agro-climatic zones classified as arable land in Kenya. For example, variances for farm nitrogen and phosphorus were significantly different from zero across agro-climatic zones. The same was corroborated by likelihood ratio test. This serves to indicate the necessity of designating agro-climatic zone specific nutrient management interventions to address declining quality of arable land rather than the use of blanket intervention approaches.
Household resource endowments and resource flow and allocation patterns have a potential to influence farm nutrient balances. This study explored the effects of household resource endowments on nutrient balances in arable land. The study concludes that livestock household resource endowments is an important determinant of nutrient balances at smallholder farm level, thus recommends improvement of livestock practices at farm level not only to improve on farm nutrient balances but also to increase farm-profitability. However, it is further noted that ownership of large volumes of capital (total value of capital) and family labour resources do not automatically translate into positive effects on farm nutrient balances, but rather it is the type of capital owned (e.g. livestock) and what use it has been put to that matters.
The study generated interesting results and demonstrated that multivariate multilevel models can be used to conduct meta-analysis of farm nutrient balances and to explore arable land quality despite the small sample size. Future studies with large sample sizes and a large pool of relevant factors and covariates are, however, required to further give higher order insights beyond this study. This can be reinforced by meta-analysis that focuses on summary statistics and the use of simulation modeling to summarise inferences by random numbers rather than by point estimates and standard errors.
Sincere thanks to Dr. Edgar Otumba of Maseno University for his encouragement and to smallholder farmers who participated in research activities leading to generation of data used in this study.