Estimation of Bounded Populations and Carrying Capacity with the Logistic Model


1. Introduction

Sample surveys are widely used as a cost-effective means of collecting data and of making valid inferences about population parameters. Government bureaus and other organizations use such methods to obtain current information. The foremost aim of a statistician in a sample survey is to obtain information about the population by deriving reliable estimates of unknown population parameters.

This study uses the logistic model to estimate a bounded population and its carrying capacity. Unlike the local polynomial regression estimator, this technique does not require a choice of step size, nor does it restrict the data to a fixed behavior; instead, the data are allowed to reveal their nature. The logistic model is used for data fitting. The logistic equation was introduced (around 1840) by the Belgian mathematician and demographer P.F. Verhulst as a possible model for human population growth [1].

Under the simple random sampling (SRS) without replacement design, [2] proposed an exactly unbiased estimator for $\theta_{yx}$. The proposed estimator is given by

$\hat{\theta}_{HR} = \bar{r}_s + \frac{n(N-1)}{N(n-1)\,\bar{x}_u}\left(\bar{y}_s - \bar{r}_s\,\bar{x}_s\right)$ (1.1)

where $\bar{y}_s = \sum_{i\in s} y_i/n$, $\bar{r}_s = \sum_{i\in s} r_i/n$, $r_i = y_i/x_i$, $\bar{x}_s = \sum_{i\in s} x_i/n$, $\bar{x}_u = t_x/N$, and $\theta_{yx} = t_y/t_x$ is the population ratio. Here $t_y = \sum_{i\in U} y_i$ and $t_x = \sum_{i\in U} x_i$ are the population totals of the variables $Y$ and $X$, and $U$ is a finite population of $N$ units indexed by the set $\{1, 2, \ldots, N\}$. This estimator can be rewritten under a general sampling design $p(\cdot)$; in that case it is no longer unbiased, but its bias remains negligible [3].
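To make (1.1) concrete, the following Python sketch evaluates the Hartley-Ross estimator from a simple random sample. The function name `hartley_ross` and the toy data are illustrative assumptions; the population size $N$ and the population mean $\bar{x}_u$ of the auxiliary variable are assumed known, as (1.1) requires.

```python
from statistics import fmean

def hartley_ross(y, x, N, xbar_u):
    """Hartley-Ross estimate of the ratio theta_yx = t_y / t_x, Equation (1.1).

    y, x   : paired sample values (length n >= 2) of the study and auxiliary variables
    N      : population size
    xbar_u : known population mean of the auxiliary variable
    """
    n = len(y)
    if n != len(x) or n < 2:
        raise ValueError("need paired samples with n >= 2")
    r = [yi / xi for yi, xi in zip(y, x)]          # unit ratios r_i = y_i / x_i
    rbar_s, ybar_s, xbar_s = fmean(r), fmean(y), fmean(x)
    # Bias-correction term that makes the estimator exactly unbiased under SRSWOR.
    correction = n * (N - 1) / (N * (n - 1) * xbar_u) * (ybar_s - rbar_s * xbar_s)
    return rbar_s + correction
```

If every sampled unit has the same ratio $y_i/x_i = c$, the correction term vanishes and the estimate is exactly $c$; for instance, `hartley_ross([2, 4, 6], [1, 2, 3], N=10, xbar_u=2.5)` returns `2.0`.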

Under a general sampling design, [4] proposed an estimator of the population ratio $\theta_{yx}$. This estimator has negligible relative bias, especially for small sample sizes:

$\hat{\theta}_{JM} = \bar{r}_s + \frac{1}{\bar{x}_s}\left(\bar{y}_s - \bar{r}_s\,\bar{x}_s\right)$ (1.2)
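As printed, (1.2) can be simplified algebraically: the $\bar{r}_s$ terms cancel and leave the ratio of the sample means. The short derivation below only restates (1.2) as given here and makes no claim about the derivation in [4].

```latex
\hat{\theta}_{JM}
  = \bar{r}_s + \frac{\bar{y}_s - \bar{r}_s\,\bar{x}_s}{\bar{x}_s}
  = \bar{r}_s + \frac{\bar{y}_s}{\bar{x}_s} - \bar{r}_s
  = \frac{\bar{y}_s}{\bar{x}_s}
```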

Define $\pi_i$, the first-order inclusion probability, by

$\pi_i = \Pr(i\text{th element} \in s) = \sum_{s \ni i} P(s)$ (1.3)

For $i \ne j$, the second-order inclusion probability is defined by

$\pi_{ij} = \Pr(i\text{th and } j\text{th elements} \in s) = \sum_{s \ni i,\,j} P(s)$ (1.4)
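Definitions (1.3) and (1.4) can be checked by enumerating a toy design. The Python sketch below uses exact fractions and sums $P(s)$ over the samples that contain a unit (or a pair of units). The function name and the choice of simple random sampling without replacement with $N = 4$, $n = 2$ are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations

def inclusion_probabilities(design):
    """First- and second-order inclusion probabilities, Equations (1.3)-(1.4).

    design : dict mapping each possible sample (a frozenset of unit labels) to P(s).
    """
    pi, pi2 = {}, {}
    for s, p in design.items():
        for i in s:
            pi[i] = pi.get(i, 0) + p                # pi_i: sum of P(s) over s containing i
        for i, j in combinations(sorted(s), 2):
            pi2[(i, j)] = pi2.get((i, j), 0) + p    # pi_ij: sum of P(s) over s containing i and j
    return pi, pi2

# SRS without replacement: each of the C(N, n) subsets has probability 1 / C(N, n).
N, n = 4, 2
samples = [frozenset(c) for c in combinations(range(N), n)]
srswor = {s: Fraction(1, len(samples)) for s in samples}
pi, pi2 = inclusion_probabilities(srswor)
print(pi[0], pi2[(0, 1)])   # n/N = 1/2 and n(n-1)/(N(N-1)) = 1/6
```

Under a fixed-size design the first-order probabilities always sum to $n$, which is a quick sanity check on any hand-built design.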

The Horvitz-Thompson [5] estimator of the population total $t_y = \sum_{i\in U} y_i$ is defined by

$\hat{t}_{y\pi} = \sum_{i\in U} y_i \frac{I_{\{i\in s\}}}{\pi_i}$ (1.5)

where $I_{\{i\in s\}}$ is one if $i \in s$ and zero otherwise. Further,

$\bar{y}_s = \frac{1}{N}\hat{t}_{y\pi}$ (1.6)

can be used to estimate the population mean $\bar{y}_u = \frac{1}{N} t_y$. Note that $\hat{t}_{y\pi}$ and $\bar{y}_s$ are unbiased estimators of $t_y$ and $\bar{y}_u$, respectively. However, $\hat{t}_{y\pi}$ and $\bar{y}_s$ do not make use of the auxiliary variables available in the study. Similarly,

$\bar{x}_s = \frac{1}{N}\hat{t}_{x\pi}$ and $\bar{r}_s = \frac{1}{N}\hat{t}_{r\pi}$ (1.7)

are unbiased estimators of $\bar{x}_u$ and $\bar{r}_u$, respectively. Here $\bar{x}_s$ is the inclusion-probability-weighted (Horvitz-Thompson) sample mean of the auxiliary variable $X$. Under SRS without replacement, $\pi_i = n/N$, and (1.6) and (1.7) reduce to the ordinary sample means used in (1.1).
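Unbiasedness of (1.5) and (1.6) holds for any design with all $\pi_i > 0$, not only SRS. The Python sketch below checks it exactly on a small unequal-probability design; the design probabilities, the values of $y$, and the function name are all assumed for illustration.

```python
from fractions import Fraction

def horvitz_thompson_total(sample, y, pi):
    """Horvitz-Thompson estimate of t_y, Equation (1.5): sum of y_i / pi_i over i in s."""
    return sum(Fraction(y[i]) / pi[i] for i in sample)

# Toy unequal-probability design on U = {0, 1, 2} with fixed size n = 2 (assumed values).
design = {
    frozenset({0, 1}): Fraction(1, 2),
    frozenset({0, 2}): Fraction(3, 10),
    frozenset({1, 2}): Fraction(1, 5),
}
# First-order inclusion probabilities: pi_i = sum of P(s) over samples containing i.
pi = {i: sum(p for s, p in design.items() if i in s) for i in range(3)}

y = [3, 5, 10]                       # study variable; true total t_y = 18
N = len(y)
expected_total = sum(p * horvitz_thompson_total(s, y, pi) for s, p in design.items())
expected_mean = expected_total / N   # E(ybar_s) with ybar_s defined as in (1.6)
print(expected_total, expected_mean) # equals t_y and ybar_u exactly
```

Averaging over all possible samples, each unit contributes $\pi_i \cdot y_i/\pi_i = y_i$, which is why the expectation recovers $t_y$ exactly.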

Estimators that use more than one auxiliary variable to estimate the finite population total $t_y$ or the finite population mean $\bar{y}_u$ have been studied in the literature.

Under SRS, Olkin [6] was the first to deal with the problem of estimating the population mean using more than one auxiliary variable. His estimator is given by

$\hat{\bar{y}}_u = \sum_{i=1}^{p} w_i\,\bar{x}_{iu}\,\hat{\theta}_{yx_i}$ (1.8)

where $p$ is the number of auxiliary variables, $\hat{\theta}_{yx_i} = \bar{y}_s/\bar{x}_{is}$, $w_i$ is the weight of the $i$th auxiliary variable such that $\sum_{i=1}^{p} w_i = 1$, $\bar{y}_s$ is the sample mean of $Y$, and $\bar{x}_{iu}$ and $\bar{x}_{is}$ are the population mean and the sample mean of $X_i$, respectively, for $i = 1, \ldots, p$.

[7] proposed the following estimator

$\hat{\bar{y}}_u = \bar{y}_s\left(w_1\frac{\bar{x}_{1u}}{\bar{x}_{1s}} + w_2\frac{\bar{x}_{2u}}{\bar{x}_{2s}}\right)$ (1.9)

for estimating the population mean $\bar{y}_u$, where $w_1 + w_2 = 1$.
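A minimal Python sketch of (1.9), assuming the population means of both auxiliary variables are known. The function name, the toy population, and the chosen sample are hypothetical. In the toy population $Y$ is exactly proportional to both $X_1$ and $X_2$, so the estimator recovers $\bar{y}_u$ for any admissible weight.

```python
from statistics import fmean

def two_aux_ratio_estimator(y, x1, x2, x1bar_u, x2bar_u, w1):
    """Two-auxiliary-variable ratio estimator of the population mean, Equation (1.9).

    y, x1, x2        : sample values of Y, X1 and X2 (same units, same order)
    x1bar_u, x2bar_u : known population means of X1 and X2
    w1               : weight on X1; the weight on X2 is w2 = 1 - w1
    """
    w2 = 1.0 - w1                      # enforces w1 + w2 = 1
    ybar_s = fmean(y)
    return ybar_s * (w1 * x1bar_u / fmean(x1) + w2 * x2bar_u / fmean(x2))

# Toy population (assumed): Y = 6 * X1 = 3 * X2 exactly.
X1 = [1, 2, 3, 4, 5, 6]
Y = [6 * v for v in X1]
X2 = [2 * v for v in X1]
sample = [0, 2, 5]                     # indices of a sample drawn without replacement
est = two_aux_ratio_estimator(
    [Y[i] for i in sample], [X1[i] for i in sample], [X2[i] for i in sample],
    x1bar_u=fmean(X1), x2bar_u=fmean(X2), w1=0.3,
)
print(est, fmean(Y))                   # both equal 21 up to floating-point rounding
```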

[8] studied the general form of (1.9). They proposed two classes of estimators that use two auxiliary variables to estimate the population mean of the variable of interest $Y$.

[9] suggested a new multivariate ratio estimator that uses the regression estimator in place of the $\bar{y}_s$ used in (1.9). Their estimator is given by

$\bar{y}_{pr} = \sum_{i=1}^{2} w_i\,\frac{\bar{y}_s + b_i(\bar{x}_{iu} - \bar{x}_{is})}{\bar{x}_{is}}\,\bar{x}_{iu}$ (1.20)

where $b_i$, $i = 1, 2$, are the regression coefficients. Based on the mean squared error (MSE), they found that their estimator is more efficient than (1.9) when

$\mathrm{MSE}(\bar{y}_{pr}) < \mathrm{MSE}(\hat{\bar{y}}_u)$,

where $\mathrm{MSE}(\bar{y}_{pr})$ and $\mathrm{MSE}(\hat{\bar{y}}_u)$ are defined by Equations (2.4) and (1.2) of Kadilar and Cingi [9], respectively.
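The estimator (1.20) can be sketched in Python as below. How the $b_i$ are obtained is an assumption here: by default the sketch uses the sample least-squares slope of $Y$ on $X_i$, and the function names are hypothetical. Setting all $b_i = 0$ reduces it to the ratio estimator (1.9), which serves as a sanity check.

```python
from statistics import fmean

def ols_slope(y, x):
    """Sample least-squares slope of y on x (an assumed choice for b_i)."""
    xbar, ybar = fmean(x), fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

def kadilar_cingi(y, xs, xbars_u, weights, bs=None):
    """Regression-ratio estimator of the population mean, Equation (1.20).

    xs      : samples of the auxiliary variables, e.g. [x1, x2]
    xbars_u : known population means of those variables
    weights : w_i, assumed to sum to 1
    bs      : regression coefficients b_i; sample OLS slopes if None
    """
    if bs is None:
        bs = [ols_slope(y, x) for x in xs]
    ybar_s = fmean(y)
    estimate = 0.0
    for w, x, xbar_u, b in zip(weights, xs, xbars_u, bs):
        xbar_s = fmean(x)
        reg_mean = ybar_s + b * (xbar_u - xbar_s)    # regression estimate replacing ybar_s
        estimate += w * reg_mean / xbar_s * xbar_u   # ratio adjustment by X_i
    return estimate
```

For example, `kadilar_cingi(y, [x1, x2], [x1bar_u, x2bar_u], [w1, 1 - w1])` uses the default OLS slopes, while passing `bs=[0, 0]` gives (1.9).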

We first introduce a general population model that accommodates birth and death rates that are not necessarily constant. Subsection 2.1 then derives the proposed logistic model, subsection 2.2 discusses its asymptotic properties, Section 3 presents the empirical study, and Section 4 draws conclusions. Throughout, the population $P(t)$ is a continuous approximation to the actual population, which of course changes only by integer increments, that is, by one birth or death at a time.

Suppose that the population changes only through births and deaths, with no immigration into or emigration from the country or environment under consideration. It is customary to track the growth or decline of a population in terms of its birth rate and death rate functions, defined as follows:

$B(t)$ is the number of births per unit of population per unit of time at time $t$;

$D(t)$ is the number of deaths per unit of population per unit of time at time $t$.

Then the numbers of births and deaths that occur during the time interval $[t, t+\Delta t]$ are given (approximately) by:

Births: $B\left(t\right)\cdot P\left(t\right)\cdot \Delta t$ , Deaths: $D\left(t\right)\cdot P\left(t\right)\cdot \Delta t$

Hence the change $\Delta P$ in the population during the time interval $\left[t,t+\Delta t\right]$ of length $\Delta t$ is

$\Delta P=\left\{\text{births}\right\}-\left\{\text{deaths}\right\}\approx B\left(t\right)\cdot P\left(t\right)\cdot \Delta t-D\left(t\right)\cdot P\left(t\right)\cdot \Delta t$ (1.21)

So

$\frac{\Delta P}{\Delta t}\approx \left[B\left(t\right)-D\left(t\right)\right]P\left(t\right)$ (1.22)

The error in this approximation should approach zero as $\Delta t \to 0$, so, taking the limit, we obtain the differential equation

$\frac{\text{d}P}{\text{d}t}=\left(B-D\right)P$ (1.23)

in which we write $B = B(t)$, $D = D(t)$, and $P = P(t)$ for brevity. Equation (1.23) is the general population equation. If $B$ and $D$ are constants, Equation (1.23) reduces to the natural growth equation $\frac{dP}{dt} = KP$ with $K = B - D$. But it also includes the possibility that $B$ and $D$ are variable functions of $t$. The birth and death rates need not be known in advance; they may well depend on the unknown function $P(t)$.
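The limiting argument can be seen numerically. The Python sketch below (the function name and the parameter values are illustrative assumptions) iterates the discrete balance (1.21) with constant $B$ and $D$ and compares the result with $P_0 e^{(B-D)t}$, the exact solution of (1.23) in that case. The gap shrinks roughly in proportion to $\Delta t$.

```python
import math

def simulate_balance(p0, birth_rate, death_rate, t_end, dt):
    """Iterate Delta P = B*P*dt - D*P*dt (Equation (1.21)) with constant B and D."""
    steps = round(t_end / dt)
    p = p0
    for _ in range(steps):
        p += (birth_rate - death_rate) * p * dt   # births minus deaths in [t, t + dt]
    return p

p0, B, D, T = 1000.0, 0.04, 0.01, 10.0        # assumed illustrative values
exact = p0 * math.exp((B - D) * T)            # solution of dP/dt = (B - D) P
for dt in (1.0, 0.1, 0.01):
    approx = simulate_balance(p0, B, D, T, dt)
    print(dt, approx, abs(approx - exact))    # the error decreases as dt -> 0
```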

2. Estimation of Bounded Population and Carrying Capacity

This section considers the logistic model as an estimator of a bounded population and its carrying capacity.

2.1. Proposed Logistic Model

Suppose the birth rate $B$ is a linearly decreasing function of the population size $P$, so that $B = B_0 - B_1 P$, where $B_0$ and $B_1$ are positive constants. If the death rate $D = D_0$ remains constant, then Equation (1.23) takes the form

$\frac{\text{d}P}{\text{d}t}=\left({B}_{0}-{B}_{1}P-{D}_{0}\right)P$ (1.24)

That is,

$\frac{\text{d}P}{\text{d}t}=aP-b{P}^{2}$ (1.25)

where $a = B_0 - D_0$ and $b = B_1$.

If the coefficients $a$ and $b$ are both positive, then Equation (1.25) is called the logistic equation. For the purpose of relating the behavior of the population $P(t)$ to the values of the parameters in the equation, it is useful to rewrite the logistic equation in the form

$\frac{\text{d}P}{\text{d}t}=KP\left(M-P\right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}P\left(0\right)={P}_{0}$ (1.26)

where $K = b$ and $M = \frac{a}{b}$ are constants. Solving Equation (1.26) gives

$P\left(t\right)=\frac{M{P}_{0}}{{P}_{0}+\left(M-{P}_{0}\right){\text{e}}^{-KMt}}$ (1.27)
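The step from (1.26) to (1.27) is separation of variables followed by partial fractions. A worked version, assuming $P_0 \ne M$ (the case $P_0 = M$ gives the constant solution), is:

```latex
\begin{aligned}
\frac{dP}{P(M-P)} &= K\,dt,
  \qquad \frac{1}{P(M-P)} = \frac{1}{M}\left(\frac{1}{P} + \frac{1}{M-P}\right),\\
\frac{1}{M}\ln\left|\frac{P}{M-P}\right| &= Kt + C
  \quad\Longrightarrow\quad \frac{P}{M-P} = \frac{P_0}{M-P_0}\,e^{KMt},\\
P(t) &= \frac{M P_0\,e^{KMt}}{(M-P_0) + P_0\,e^{KMt}}
      = \frac{M P_0}{P_0 + (M-P_0)\,e^{-KMt}}.
\end{aligned}
```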

Actual human populations are positive-valued. If $P_0 = M$, then (1.27) reduces to the unchanging (constant-valued) “equilibrium population” $P(t) \equiv M$. Otherwise, the behavior of a logistic population depends on whether $0 < P_0 < M$ or $P_0 > M$. If $0 < P_0 < M$, then we see from (1.26) and (1.27) that $P' > 0$ and

$P(t) = \frac{MP_0}{P_0 + (M-P_0)e^{-KMt}} = \frac{MP_0}{P_0 + \{\text{positive number}\}} < \frac{MP_0}{P_0} = M.$

However, if ${P}_{0}>M$ , then we see from (1.26) and (1.27) that ${P}^{\prime}<0$ and

$P(t) = \frac{MP_0}{P_0 + (M-P_0)e^{-KMt}} = \frac{MP_0}{P_0 + \{\text{negative number}\}} > \frac{MP_0}{P_0} = M.$

In either case, the “positive number” or “negative number” in the denominator approaches 0 as $t \to +\infty$ because of the exponential factor; when $P_0 > M$, the negative number also has absolute value less than $P_0$, so the denominator stays positive. It follows that

${\mathrm{lim}}_{t\to \infty}P\left(t\right)=\frac{M{P}_{0}}{{P}_{0}+0}=M$ (1.28)

Thus a population that satisfies the logistic equation does not grow without bound. Instead, it approaches the finite limiting population $M$ as $t \to \infty$. The population $P(t)$ steadily increases and approaches $M$ from below if $0 < P_0 < M$, but steadily decreases and approaches $M$ from above if $P_0 > M$. $M$ is often called the carrying capacity of the environment, since it can be regarded as the maximum population that the environment can support on a long-term basis.
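Both cases can be checked numerically. The Python sketch below evaluates (1.27) for assumed illustrative values $M = 100$ and $K = 0.001$ (so $KM = 0.1$), with one starting point below $M$ and one above it. The first trajectory increases toward $M$ from below, and the second decreases toward it from above.

```python
import math

def logistic_solution(t, p0, m, k):
    """Solution (1.27) of dP/dt = K P (M - P) with P(0) = P0."""
    return m * p0 / (p0 + (m - p0) * math.exp(-k * m * t))

M, K = 100.0, 0.001            # assumed illustrative parameters, KM = 0.1
times = [0, 5, 10, 20, 40, 80]
below = [logistic_solution(t, 20.0, M, K) for t in times]    # 0 < P0 < M
above = [logistic_solution(t, 180.0, M, K) for t in times]   # P0 > M
print([round(p, 2) for p in below])
print([round(p, 2) for p in above])
```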

Table 1 shows the populations recorded in the five census years of the sampling frame. We select the 1969 population as the initial population and fit the model through the 1989 and 2009 populations from the table. The fitted model is then used to estimate the population total for the 2019 census using the proposed technique.

Here, ${P}_{0}=10942705$ (Initial population)

At $t_2 = 1989$ (that is, $t = 20$ years after 1969) and $P_2 = 21448774$, we have

$\frac{10942705M}{10942705+\left(M-10942705\right){\text{e}}^{-20KM}}=21448774$ (1.29)

Table 1. Census Results.

Similarly, at $t_4 = 2009$ (that is, $t = 40$ years after 1969) and $P_4 = 38610097$, we have

$\frac{10942705M}{10942705+\left(M-10942705\right){\text{e}}^{-40KM}}=38610097$ (1.30)

Solving Equations (1.29) and (1.30) simultaneously, we have

$\left\{KM=0.038506784,M=124433288.8\right\}$

$P\left(t\right)=\frac{1.361636772\times {10}^{15}}{10942705+113490583.8{\text{e}}^{-0.038506784t}}$ (1.31)
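Because the three census points are equally spaced ($t = 0, 20, 40$ years after 1969), the system (1.29)-(1.30) can also be solved in closed form with the classical three-point formula for the logistic curve: for populations $P_0, P_1, P_2$ observed at $t = 0, h, 2h$, $M = \frac{P_1(P_0P_1 + P_1P_2 - 2P_0P_2)}{P_1^2 - P_0P_2}$ and $e^{-KMh} = \frac{P_0(M - P_1)}{P_1(M - P_0)}$. The Python sketch below uses this formula in place of a numerical solver, and its function names are hypothetical; it assumes a growing population that stays below $M$. With the population values that appear in (1.29) and (1.30), it should reproduce $KM$ and $M$ above up to rounding, and evaluating the fitted curve at $t = 50$ should give a value close to the 2019 projection $P_{2019}(50)$ reported with Table 3.

```python
import math

def fit_logistic_three_point(p0, p1, p2, h):
    """Fit P(t) = M*P0 / (P0 + (M - P0) * exp(-KM * t)) through populations
    p0, p1, p2 observed at t = 0, h, 2h. Returns (M, KM)."""
    denom = p1 * p1 - p0 * p2
    if denom <= 0:
        raise ValueError("no finite carrying capacity fits these data")
    m = p1 * (p0 * p1 + p1 * p2 - 2 * p0 * p2) / denom
    if not m > max(p0, p1, p2):
        raise ValueError("sketch assumes a growing population below M")
    # From exp(-KM * h) = p0 * (M - p1) / (p1 * (M - p0)).
    km = math.log(p1 * (m - p0) / (p0 * (m - p1))) / h
    return m, km

def logistic(t, p0, m, km):
    """Evaluate the fitted curve (1.27) at t years after the initial census."""
    return m * p0 / (p0 + (m - p0) * math.exp(-km * t))

P1969, P1989, P2009 = 10942705, 21448774, 38610097   # values used in (1.29)-(1.30)
M, KM = fit_logistic_three_point(P1969, P1989, P2009, h=20)
print(f"M = {M:.1f}, KM = {KM:.9f}")
print(f"P(50) for 2019 = {logistic(50, P1969, M, KM):.0f}")
```

The closed form avoids choosing starting values for a root finder, but it only applies when the observations are equally spaced in time.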

2.2. Asymptotic Properties

Theorem (Law of Large Numbers):

Let $X_1, X_2, \ldots, X_n$ be iid random variables with common expectation $\mu = E(X_i)$. Define $A_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Then for any $\alpha > 0$, we have $\Pr\left[|A_n - \mu| \ge \alpha\right] \to 0$ as $n \to \infty$.

Proof of Theorem:

Let $Var\left({X}_{i}\right)={\sigma}^{2}$ be the common variance of the random variables; we assume that ${\sigma}^{2}$ is finite. With this (relatively mild) assumption, the Law of Large Numbers (LLN) is an immediate consequence of Chebyshev’s inequality.

Since the $X_i$ are independent, $E(A_n) = \mu$ and $Var(A_n) = \frac{\sigma^2}{n}$, so by Chebyshev's inequality we have

$\Pr\left[|A_n - \mu| \ge \alpha\right] \le \frac{Var(A_n)}{\alpha^2} = \frac{\sigma^2}{n\alpha^2} \to 0$ as $n \to \infty$
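The proof can be illustrated by simulation. The Python sketch below assumes iid Uniform(0, 1) draws (so $\mu = 1/2$ and $\sigma^2 = 1/12$), a tolerance $\alpha = 0.1$, 200 replications, and a fixed random seed; all of these are illustrative choices. It compares the empirical frequency of $|A_n - \mu| \ge \alpha$ with the Chebyshev bound $\sigma^2/(n\alpha^2)$.

```python
import random
from statistics import fmean

def chebyshev_check(n, alpha, reps=200, seed=1):
    """Empirical frequency of |A_n - mu| >= alpha versus the Chebyshev bound,
    for iid Uniform(0, 1) draws (mu = 1/2, sigma^2 = 1/12)."""
    rng = random.Random(seed)
    mu, var = 0.5, 1 / 12
    hits = 0
    for _ in range(reps):
        a_n = fmean([rng.random() for _ in range(n)])   # sample mean A_n
        hits += abs(a_n - mu) >= alpha
    return hits / reps, var / (n * alpha ** 2)

for n in (10, 100, 1000):
    freq, bound = chebyshev_check(n, alpha=0.1)
    print(n, freq, min(bound, 1.0))   # both columns shrink toward 0 as n grows
```

The bound is usually far from tight; its value lies in needing only a finite variance.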

Table 2 presents the census populations of the eight provinces of Kenya from 1969 to 2009. Successively larger samples are selected below to illustrate the law of large numbers.

Here, $N=40$ and $\mu =2875251$

Table 2. Provinces.

Sample 1: Nairobi (1969 to 2009)

$n = 5$ and $\bar{x}_1 = 1588651$

Sample 2: Nairobi and Central

$n = 10$ and $\bar{x}_2 = 2318934$

Sample 3: Nairobi, Central and Coast

$n = 15$ and $\bar{x}_3 = 1915957$

Sample 4: Nairobi, Central, Coast and Eastern

$n = 20$ and $\bar{x}_4 = 2371755$

Sample 5: Nairobi, Central, Coast, Eastern and N/Eastern

$n = 25$ and $\bar{x}_5 = 2067957$

Sample 6: Nairobi, Central, Coast, Eastern, N/Eastern and Nyanza

$n = 30$ and $\bar{x}_6 = 2326900$

Sample 7: Sample 6 and Rift Valley

$n = 35$ and $\bar{x}_7 = 2778090$

Remark: Overall, the sample mean moves toward the population mean as the sample size approaches the population size $N$, in line with the Law of Large Numbers (LLN).

Therefore, $\lim_{n \to N} \bar{x}_n = \mu$.

Comment: This technique tracks the data reasonably well up to a certain point, after which the initial condition needs to be shifted to where the error margin starts to increase, in order to maintain precision.

3. Main Results

Empirical Analysis

Table 3 presents the actual population totals, the estimated population totals, and the corresponding errors from 1969 to 2009.

4. Conclusion

In this work, the logistic model was found to be effective at maintaining precision, especially in the presence of outliers. It can perform well with a sufficiently large sample size, and it can be more efficient in prediction, especially where a regression model is ill-conditioned.

Table 3. Estimated population and error calculations.

${P}_{2019}\left(50\right)=49527365$

Acknowledgements

We are grateful to God for the grace and mercy rendered to us in seeing us through this work. Special thanks go to the African Union for making it possible to pursue this course through a scholarship.

Disclosure of Potential Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Edwards, C.H. and Penney, D.E. (2008) Differential Equations: Computing and Modeling. 4th Edition, 79-92.

[2] Hartley, H. and Ross, A. (1954) Unbiased Ratio Estimates. Nature, 174, 270-271.

https://doi.org/10.1038/174270a0

[3] Al-Jararha, J. (2012) Unbiased Ratio Estimation for Finite Populations. LAMBERT Academic Publishing, Germany.

[4] Al-Jararha, J. and Al-Haj, E.M. (2012) A Ratio Estimator Under General Sampling Design. Austrian Journal of Statistics, 41, 105-115.

https://doi.org/10.17713/ajs.v41i2.178

[5] Horvitz, D. and Thompson, D. (1952) A Generalization of Sampling without Replacement from a Finite Universe. Journal of the American Statistical Association, 47, 663-685.

https://doi.org/10.1080/01621459.1952.10483446

[6] Olkin, I. (1958) Multivariate Ratio Estimation for the Finite Populations. Biometrika, 45, 154-165.

https://doi.org/10.1093/biomet/45.1-2.154

[7] Singh, D. and Chaudhary, F. (1986) Theory and Analysis of Sample Survey Design. New Age Publication, New Delhi.

[8] Abu-Dayyeh, W., Ahmad, M., Ahmad, R. and Hassen, A. (2003) Some Estimators of a Finite Population Mean Using Auxiliary Information. Applied Mathematics and Computation, 139, 287-298.

https://doi.org/10.1016/S0096-3003(02)00180-7

[9] Kadilar, C. and Cingi, H. (2004) Estimator of a Population Mean Using Two Auxiliary Variables in Simple Random Sampling. International Mathematical Journal, 5, 357-367.