Global tourism has experienced continued growth, accompanied with deepening diversification, to become one of the fastest growing economic sectors in the world, contributing 10.2% of global GDP in 2016. In Kenya, it contributed 9.8% of Kenya’s GDP in 2016, accounting for 9.2% of total employment in Kenya  . Tourism in Kenya is mainly wildlife-based, with the big five large mammals (elephant, rhino, buffalo, lion and leopard) being the main source of tourism satisfaction and revenue  . However, tourist’s preference on the big five has led to the management policies being skewed towards their conservation, leading to under-appreciation of biodiversity. This tendency has led to heightened efforts to conserve the big-five, minimizing efforts to conserve other species  . Among the species under threat is the zebra species, of which Grevy’s and Plains zebra are the focus of this study.
Grevy’s zebra is a zebra species found in Kenya and Ethiopia. It has been assessed as endangered under criterion A2acd  . Grevy’s zebra population was approximately 2350 in Kenya in the year 2016  . Factors identified that have led to declining population include habitat degradation, competition from livestock and local hunting  . Plains Zebra is listed as Near Threatened and it is close to qualifying for vulnerable under criterion A2a + 3c + 4ac  . Plains zebra population is estimated at 98.820 in 2016  . Given the status of both zebra species, it is the focus of this study to see how similar they are based on their ecological niche.
Attempts to model both Plains and Grevy’s zebra niche have been few. Studies have majorly focused on enumeration of zebra population counts  . Kigen et al. modeled the potential current and future (2080) distribution of Grevy’s zebra in northern Kenya using a climate envelope model  . Kebede et al. modeled the seasonal habitat distribution of Grevy’s zebra in Alledeghi Wildlife Reserve (Ethiopia) with a view of developing conservation strategies  . However, inter-species niche comparison studies based on occupied geographic space have been few. One notable study focused on niche overlap of mountain hare subspecies in Europe vis a vis the invasion by the European hare  .
In order to foster knowledge regarding the conservation of Grevy’s and Plains zebras and to provide insights that can be used for developing conservation efforts, this study has formulated two objectives: 1) To model and evaluate the ecological niche of both Grevy’s and Plains zebras and 2) To assess the level of niche similarity based on niche similarity indices.
2. Data and Methods
2.1. Study Area
Laikipia County is one of the 47 counties in the Republic of Kenya (Figure 1). It lies between latitudes 0˚18'' and 0˚51'' North and between longitude 36˚11'' and 37˚24' East. It covers an area of 9462 km2. The county mainly consists of a plateau bordered by the Great Rift Valley to the West, the Aberdare to the South and Mt. Kenya massifs to the South East. The annual average rainfall varies between 400 mm and 750 mm though higher annual rainfall totals are observed on the areas bordering the slopes of Mt. Kenya and the Aberdare Ranges. The county has a gazetted forest area of 580 Km2 comprising of both the indigenous and plantation forests. The county population is estimated at 479,072 as of 2017  . Laikipia county was chosen as the study area as it is where majority of the Grevy’s zebra are located in Kenya.
Figure 1. Location of Laikipia county and neighbouring counties. Inset map denotes Kenya and the county boundaries, with Laikipia County highlighted in brown.
2.2. Data Collection
Data used in this study was classified into 3 categories: Environmental data (rainfall & temperature), Satellite Imagery (i.e. Landsat images to derive land use/land cover maps) & field survey data (zebra & cattle occurrence data). A summary of the data products used is shown in Table 1.
Rainfall and Temperature were adopted from Worldclim database, based on Kigen et al. (2003) work  , where they modelled Grevy’s zebra in Northern Kenya using variables from Worldclim database. NDVI, LULC, cattle and population data acted as surrogate datasets, which represented the major causes that have affected the distribution and population of both zebra species  . NDVI represented the vegetation health hence areas where zebras could graze; LULC delineated the areas zebras are likely to be found based on known land cover and zebra occurrences; cattle represented the major competitor for zebras as they are both herbivores; and population represented the areas of human settlement.
Since the datasets had different spatial resolutions, all the variables were standardized to a resolution of 1 km using different resampling strategies. For
Table 1. Data used and respective data sources.
NDVI, an average window was used; for LULC, a majority window was used to depict the dominant land cover; and for population, a summation window was used to aggregate the total population within a 1 km grid.
2.3. Methodology Summary
A summarized workflow is shown in Figure 2. The first step is the interpolation of cattle data into cattle raster map. Predictor variables are then extracted at the zebra occurrence locations for each individual specie. The data is then split into a training and test dataset.
Testing or validation is required to assess the predictive performance of the model. Ideally an independent data set should be used for testing the model performance. However, in many cases this will not be available, a situation particularly prevalent in threatened and endangered species. Therefore, the most commonly used approach is to partition the data randomly into ‘training’ and “test” sets, thus creating quasi-independent data for model testing  .
Variable selection is first done using multi-collinearity analysis on bioclimatic variables, and then jack-knife tests on the remaining variables. This was done so as to reduce model over-parameterization. Habitat distribution modeling was then undertaken based on the selected variables. What followed was an accuracy assessment that was done using the Receiver Operating Characteristics. If the result is acceptable, then niche similarity indices are computed. Detailed explanations of the analysis done are given in the subsequent sections.
2.4. Cattle Raster Interpolation
Cattle occurrence data was first interpolated using Inverse Distance Weighting (IDW) in order to develop a cattle occurrence raster map. IDW uses the measured values surrounding the prediction location to predict a value for the unmeasured location. IDW assumes that each measured point has a local influence that diminishes with distance. IDW makes the assumption that the value at the
Figure 2. Flowchart methodology.
unsampled location is the weighted average of known values within the neighborhood  . IDW formulas are outlines as
where means the unknown cattle data; means the cattle estimate at known locations; N means the amount of known cattle locations; means the weighting of each cattle location; means the distance from each known cattle location to the unknown site; α means the power, and is also a control parameter, generally assumed as two  . Due to the difference in spatial resolution, all datasets were resampled to a resolution of 1 km so as to maintain spatial homogeneity.
2.5. Multi-Collinearity Analysis
Multi-collinearity refers to the pair-wise correlation among predictor variables based on the Pearson correlation coefficients. It affects the approximations of regression coefficients and induces bias responses between outputs and predictor variables  .
Pearson correlation coefficients measure the strength and direction of the linear relationship between two variables, describing the direction and degree to which one variable is linearly related to another. It assumes that the variables are well approximated by normal distribution, and their joint distribution is bivariate normal  . It is denoted as follows
where is the value of the measured inhibitory activity for compound i (i = 1, 2, ×××, 67) is the average of the measured inhibitory activity, is the value of the estimated inhibitory activity for compound i, and is the average of the estimated inhibitory activity.
The Pearson correlation coefficient can take values from −1 to +1. A value of +1 shows that the variables are perfectly linear related by an increasing relationship; a value of −1 shows that the variables are perfectly linear related by a decreasing relationship; and a value of 0 shows that the variables are not linearly related by each other. There is a strong correlation between variables if the correlation coefficient is greater than 0.8 and a weak correlation if the correlation coefficient is less than 0.5  .
Given multiple variables where the relationships between pairs of variables are many, a correlation coefficient matrix or a correlation ellipse can be developed. A correlation coefficient matrix is a matrix denoting the numerical pairwise correlation coefficient of each variable, organized in a matrix. A correlation ellipse plot is a visual plot that encodes the correlation coefficient into an ellipse, each having a different color code based on the strength of the correlation. The direction of the correlation (i.e. whether direct or inverse relationship) is also shown based on the direction of the semi-major axis  . Using correlation ellipses instead of correlation matrices allows for quicker understanding of underlying trends in the datasets.
2.6. Jack-Knife Tests
Jack-knife is an approach that excludes one variable at a time when running the model. In so doing, it provides information on the performance of each variable in the model in terms of how important each variable is at explaining the species distribution and how much unique information each variable provides. This can point out highly correlated variables, thereby allowing the user to determine if percent contribution values are likely to be skewed due to these correlations  .
A model is created each time a variable is omitted from the model run in turn. Another model is also created using each variable alone. At the same time a model with all the variables in that Species Distribution Model (SDM) is also created. The Area under the Curve (AUC) of each model is recorded and all the values plotted together in the Jackknife. The Jack-knife tests thus shows the output of the AUC of the model with the following: All the variables; without one variable; and with only one individual variable in isolation. Comparing the 3 values gives an indication of the importance of each variable in predicting the species  .
2.7. Habitat Distribution Modeling
Species Distribution Models (SDMs) have been employed widely in generating habitat distribution maps. Maxent is one of the Species Distribution Models that has been applied widely in habitat distribution modelling. Maxent was chosen for this study because it is freely available, and has been proven to be one of the best performing modelling even with a relatively small number of samples   .
Maxent uses the principle of maximum entropy. Entropy is the measure of uncertainty associated with a random variable. The greater the entropy, the greater the uncertainty. Adhering to these concepts, Maxent utilizes presence-only points of occurrence, avoiding absence data and evading assumptions on the range of a given species  . It estimates the probability of occurrence of the species of interest using presence only species data and a group of environmental variables  . To obtain a solution, Maxent maximizes the gain function, a penalized maximum likelihood function. Exponentiating the gain function gives the likelihood ratio of an average presence to an average background point, hence maximizing the gain corresponds to finding a model that can best differentiate presences from background locations  .
where the first term denotes sum of predicted values at presence locations, second term denotes sum of predicted values at background locations, and the third term an overfitting penalty, β is a regularization coefficient, and is the variance of feature j at presence locations.
Maxent gives output data in the following formats: raw; cumulative; and logistic. The logistic format is recommended given that it provides estimates of the probability of occurrence as predicted by the included environmental variables  .
2.8. Accuracy Assessment
The Area under Curve (AUC) curve is a widely used statistical technique for assessing accuracy of predictive models. An AUC plot is obtained by plotting the fraction of correctly classified cases on the y axis (sensitivity or true positive rate) against the fraction of wrongly classified cases (1-specificity or false positive rate) for all possible thresholds on the x axis at different threshold  . A random prediction will result in an AUC value of 0.5 whereas a perfect prediction assumes the maximum possible AUC of 1.0  ; AUC > 0.75 are considered as suitable for conservation planning  . AUC has been increasingly used in the evaluation of models of species distributions and is regarded as one of the best methods of assessments, with the primary advantage of providing a single measure of model performance that is independent of any particular choice of threshold  . The true and false positive rates are assessed at the species occurrence locations for each individual species.
It is important to note that Maxent computes a variation of AUC, based on the problem of classifying presence vs. background points (which may not be true absences)  . However, since both zebra species are threatened to varying degrees and have dwindling numbers, the background points could be considered as pseudo-absences of species occurrence.
2.9. Niche Similarity
Conservation of species, both flora and fauna, has effects in both the ecological and evolutionary aspects. This can be assessed using SDMs. However, intra and inter-species interactions in ecological and geographical space is not as straight-forward, and methodologies for conducting niche similarity are not as developed as SDMs. Niche similarity is based on a test which asks whether SDMs from sister species predict one another’s known occurrence better than expected under the null hypothesis (that they provide absolutely no information about another’s range)  . Warren et al. (2008) proposed several niche similarity metrics that may be used to test niche similarity scenarios quantitatively.
Schoener (1968) statistic for niche overlap, ranges from 0 (niche models have no overlap) to 1 (niche models identical)  .
This metric is simple, has a long history of use, and permits direct comparison to traditional measures of niche similarity that focus on microhabitat and/or diet  .
Another statistic used is based on Hellinger distances  , defined as
Hellinger distances lie between 0 and 2. The similarity metrics have previously been used in ecological studies, primarily in comparing community composition across sites  . To compare Hellinger-based results to more conventional ecological measures of niche overlap, Warren et al. (2008) proposed a similarity statistic
which ranges from 0 (no overlap) to 1 (niche models identical).
3. Results and Discussions
3.1. Correlation of Bioclimatic Variables
In order to reduce the potential of model over-parameterization, multi-collinearity analysis was conducted on the 19 bioclimatic variables. Variables with an intercorrelation higher than 0.8 (r > 0.8) were eliminated. To enable identify bioclimatic variables that have an inter-correlation higher than the set threshold, a correlation ellipse plot was developed, with the red dots identifying variables with a high level of pairwise correlation i.e. r > 0.8, hence one of the variables was eliminated. 10 of the 19 bioclimatic variables were found to be least correlated (below the set threshold). The correlation ellipse plot was developed using R programming for all the bioclimatic variables under consideration (See Figure 3).
3.2. Jack-Knife Tests
Jack knife tests were conducted after correlation, using 10 of the 19 bioclimatic variables (which were least correlated) together with the following data sets: land use/land cover; population; NDVI; and cattle occurrence raster. When the test is run, variables contributing least to the model fit (<2%) were removed. The remaining variables were then used for habitat distribution modelling for both Grevy’s and Plains zebra.
Figure 4 shows the contribution of each variable to the overall model for the Plains and Grevy’s zebra respectively. From the graphs, population, cattle, Bio19 and NDVI explain 77.5% of Plains zebra distribution. For Grevy’s zebra, cattle, Bio4, Bio1 and population explain 80% of Grevy’s zebra distribution. Bio3, Bio14, Bio1, Bio4, Bio13, Bio12 and Bio18 were thus eliminated for Plain’s zebra
Figure 3. Correlation ellipse plot (red dots indicate r > 0.8).
Figure 4. Variable importance for Plains and Grevy’s zebra respectively.
distribution modelling, while NDVI, Bio18, Bio14, Bio3, LULC, and Bio13 were eliminated for Grevy’s distribution modelling.
3.3. Habitat Distribution Maps
Figure 5 shows the predictive probability occurrence maps of Plains and Grevy zebras based on the species occurrence and environmental variables. The logistic output was used so as to produce probability predictive maps, based on the maximized gain function.
The map shows that the core presence of both species is mainly in the middle section of the county, stretching from North to South. It indicates both species are located in the open woodlands and away from human habitats, though they face competition from cattle grazing. The concentration on the north is common to both species as these areas have a large concentration of wildlife conservancies and game reserves, hence human activity is regulated.
3.4. Accuracy Assessment
The habitat prediction maps were assessed based on area under curve (AUC) from a Receiver Operating Curve (ROC) plot. AUC > 0.75 was the threshold set for acceptance, as set out by Lobo et al.  . The ROC plots are a by-product of the Maxent software once it produces the habitat distribution maps (See Figure 6).
For Plains zebra, the AUC for training and test data was 0.82 and 0.79 respectively, while for Grevy’s zebra was 0.9 and 0.77 respectively. The jagged curves for Grevy’s zebra are as a result of few points available (66 compared to 594 for Plains zebra).
3.5. Niche Similarity
Schoener’s D statistic and Warren’s I statistic were both computed. Both showed significant levels of niche overlap as both ,
Figure 5. Habitat distribution maps for Grevy’s and Plains zebra respectively.
Figure 6. AUC curves for Plains and Grevy’s zebra respectively (red curves are derived from training data, blue curves are derived from the test data).
indicating that the sister zebra species are identical based on their occupied niche environments. This suggests that similar conversation strategies can be adopted for both species.
This study conducts habitat distribution modelling and niche overlap, which is key for identifying if conservation efforts used for one species can be largely adopted in the conservation of another, especially endangered species. By considering bioclimatic variables, cattle data, land use and population metrics; a habitat distribution map of both Plains and Grevy’s zebra was developed; indicating concentration of both species around the centre of the county; running along a north-south direction to varying extents. This was validated using AUC plots, which indicated a training accuracy of 0.82 and 0.9 for Plains and Grevy’s zebra respectively. Niche similarity tests were then undertaken and found significant niche similarity based on the two metrics for the two species (i.e. ). This indicates that Grevy’s zebra largely share the same ecological environment with the Plains zebra, which is significant in an effort to boost the endangered species numbers.
The author wishes to thank Kenya Wildlife Service for their assistance and cooperation while executing this study.
 Parker, G., Sundaresan, S. and Chege, G. (2011) Using Sample Aerial Surveys to Estimate the Abundance of the Endangered Grevy’s Zebra in Northern Kenya. African Journal of Ecology, 49, 56-61.
 Kebede, F., Bekele, A., Moehlman, P.D. and Evangelista, P.H. (2012) Endangered Grevy’s Zebra in the Alledeghi Wildlife Reserve, Ethiopia: Species Distribution Modeling for the Determination of Optimum Habitat. Endangered Species Reserach, 17, 237-244.
 Caravaggi, A., et al. (2017) Niche Overlap of Mountain Hare Subspecies and the Vulnerability of Their Ranges to Invasion by the European Hare; the (Bad) Luck of the Irish. Biological Invasions, 19, 655-674.
 Kumar, S. and Stohlgren, T.J. (2009) Maxent Modeling for Predicting Suitable Habitat for Threatened and Endangered Tree Canacomyrica monticola in New Caledonia. Journal of Ecology and Natural Environment, 1, 94-98.
 Chen, F.-W. and Liu, C.-W. (2012) Estimation of the Spatial Rainfall Distribution Using Inverse Distance Weighting (IDW) in the Middle of Taiwan. Paddy and Water Environment, 10, 209-222.
 Dormann, C.F., et al. (2013) Collinearity: A Review of Methods to Deal with It and a Simulation Study Evaluating Their Performance. Ecography, 36, 27-46.
 Bolboaca, S.-D. and Jantschi, L. (2006) Pearson versus Spearman, Kendall’s Tau Correlation Analysis on Structure-Activity Relationships of Biologic Active Compounds. Leonardo Journal of Science, No. 9, 179-200.
 Navarro-Cerrillo, R.M., Hernández-Bermejo, J.E. and Hernández-Clemente, R. (2011) Evaluating Models to Assess the Distribution of Buxus balearica in Southern Spain. Applied Vegetation Science, 14, 256-267.
 Morales, N.S., Fernández, I.C., Carrasco, B. and Orchard, C. (2015) Combining Niche Modelling, Land-Use Change, and Genetic Information to Assess the Conservation Status of Pouteria splendens Populations in Central Chile. International Journal of Ecology, 2015, Article ID: 612194.
 Phillips, S.J., Anderson, R.P. and Schapire, R.E. (2006) Maximum Entropy Modeling of Species Geographic Distributions. Ecological Modelling, 190, 231-259.
 Merow, C., Smith, M.J. and Silander, J.A. (2013) A Practical Guide to MaxEnt for Modeling Species’ Distributions: What It Does, and Why Inputs and Settings Matter. Ecography, 36, 1058-1069.
 Fielding, A.H. and Bell, J.F. (1997) A Review of Methods for the Assessment of Prediction Errors in Conservation Presence/Absence Models. Environmental Conservation, 24, 38-49.
 Lobo, J.M., Jiménez-Valverde, A. and Real, R. (2008) AUC: A Misleading Measure of the Performance of Predictive Distribution Models. Global Ecology and Biogeography, 17, 145-151.
 Warren, D.L., Glor, R.E. and Turelli, M. (2008) Environmental Niche Equivalency versus Conservatism: Quantitative Approaches to Niche Evolution. Evolution, 62, 2868-2883.