For two independent proportions p1 and p2, their difference D = p1 − p2 is frequently encountered in the frequentist statistical literature, where tests, or confidence intervals, for D are well-accepted notions in theory and in practice, although, most frequently, the case under study is the equality, or inequality, of these proportions. For the Bayesian approach, Pham-Gia and Turkkan have considered the case of independent, and of dependent, proportions for inference, and also in the context of sample size determination.
But testing p1 = p2 is only a special case of testing p1 − p2 = δ, with δ being a positive constant, a case which is much less frequently dealt with. In Section 2 we recall the unconditional approaches to testing, based on the maximum likelihood estimators of the two proportions and normal approximations. A new exact approach not using the normal approximation has been developed by our group and will be presented elsewhere. Fisher's exact test is also recalled here, for comparison purposes. The Bayesian approach to testing the equality of two proportions and the computation of credible intervals are given in Section 3. The Bayesian approach using general beta distributions is given in Section 4. All related problems are completely solved, thanks to some closed-form formulas that we have established in earlier papers.
2. Testing the Equality of Two Proportions
2.1. Test Using Normal Approximation
As stated before, taking δ = 0 we have a test for equality between two proportions. Several well-known methods are presented in the literature. For example, the conditional test, usually called Fisher's exact test, is based on the hypergeometric distribution and is used when the sample size is small. Pearson's chi-square test with Yates's correction is usually used for intermediate sample sizes, while Pearson's chi-square test is used for large samples. Their appropriateness is discussed in D'Agostino et al. Normal approximation methods are based on formulas using estimated values of the means and variances of the two populations. For example, we have
Z1 = (p̂1 − p̂2)/sqrt(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2), and the pooled version Z2 = (p̂1 − p̂2)/sqrt(p̂(1 − p̂)(1/n1 + 1/n2)), with p̂ = (x1 + x2)/(n1 + n2), both being approximately N(0, 1) under H0. Cressie gives conditions under which one of these statistics is preferable to the other in terms of power. Previously, Eberhardt and Fligner studied the same problem for a bilateral test.
Numerical Example 1
To investigate its proportions of customers in two separate geographic areas of the country, a company picks a random sample of 25 shoppers in area A, in which 17 are found to be its customers. A similar random sample of 20 shoppers in area B gives 8 customers. We wish to test the hypothesis H0: pA = pB against H1: pA > pB.
We have here the observed values Z1 ≈ 1.95 and Z2 ≈ 1.88, which lead, in both cases, to the rejection of H0 at significance level 5% (the critical value is 1.64) for this one-sided test.
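As a check, the two statistics for this example can be computed directly; the following sketch (variable names are ours) reproduces the calculation:

```python
import math

# Unconditional z-statistics for H0: pA = pB vs H1: pA > pB,
# with the data of Numerical Example 1 (17/25 in area A, 8/20 in area B).
x1, n1, x2, n2 = 17, 25, 8, 20
p1_hat, p2_hat = x1 / n1, x2 / n2

# Unpooled version: each variance estimated from its own sample.
se_u = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
z_unpooled = (p1_hat - p2_hat) / se_u

# Pooled version: common proportion estimated under H0.
p_pool = (x1 + x2) / (n1 + n2)
se_p = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_pooled = (p1_hat - p2_hat) / se_p

print(round(z_unpooled, 2), round(z_pooled, 2))  # 1.95 1.88, both above 1.64
```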
2.2. Fisher’s Exact Test
Under H0 the number of successes coming from population 1 has the hypergeometric distribution. The argument is that, in the combined sample of size n1 + n2, with x successes from population 1 out of the total number of successes s = x1 + x2, the number x of successes coming from population 1 is a hypergeometric variable.
To compute the significance of the observation we have to consider all tables corresponding to results at least as extreme as the observed table. It is known that the conditional test is less powerful than the unconditional one.
Numerical Example 2
We use the same data as in Numerical Example 1 to test H0: pA = pB vs H1: pA > pB, i.e. that the proportion of customers in area A is significantly higher than the one in area B. We have Table 1:
the observed data (17, 8), and also the cases more extreme, which means x ≥ 17. The p-value of the test is hence P(x ≥ 17) ≈ 0.057.
Although technically not significant at the 5% level, this result shows that the proportion of customers in area B can practically be considered as lower than the one in area A, in agreement with the frequentist test.
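The conditional p-value for this example can be computed from the hypergeometric tail; the following sketch (ours, not the authors' code) uses only the standard library:

```python
import math

# One-sided conditional (Fisher) test for the data of Numerical Example 1:
# x1 = 17 of n1 = 25 in area A, x2 = 8 of n2 = 20 in area B.
x1, n1, x2, n2 = 17, 25, 8, 20
N, s = n1 + n2, x1 + x2  # combined sample size and total number of successes

def hyper_pmf(k):
    """P(k successes from area A | s successes in all), hypergeometric."""
    return math.comb(n1, k) * math.comb(n2, s - k) / math.comb(N, s)

# p-value: tables at least as extreme as the observed one (k >= 17).
p_value = sum(hyper_pmf(k) for k in range(x1, min(n1, s) + 1))
print(round(p_value, 3))  # 0.057
```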
REMARK: The problem is often associated with a 2 × 2 table, where there are three possibilities: constant column sums and row sums, one set of margins constant and the other variable, and both variable. Other measures can then be introduced (e.g. Santner and Snell). A Bayesian approach has been carried out by several authors, e.g. Howard, and also Pham-Gia and Turkkan, who computed the credible intervals for several of these measures.
3. The Bayesian Approach
In the estimation of the difference of two proportions the Bayesian approach certainly plays an important role. Agresti and Coull  provide some interesting remarks on various approaches.
Again, let D = p1 − p2. Using the Bayesian approach would certainly encounter serious computational difficulties if we did not have a closed-form expression for the density of the difference of two independently beta-distributed random variables. Such an expression was obtained by the first author some time ago and is recalled below.
3.1. Bayesian Test on the Equality of Two Proportions
Let us recall first the following theorem:
Theorem 1: Let X1 and X2 be two independent beta-distributed random variables with parameters (α1, β1) and (α2, β2), respectively. Then the difference D = X1 − X2 has a density f_D(d), defined on (−1, 1), given in closed form by an expression (1) involving Appell's first hypergeometric function F1, which is defined as the double series

F1(a; b1, b2; c; x, y) = Σ_{m≥0} Σ_{n≥0} [(a)_{m+n} (b1)_m (b2)_n / ((c)_{m+n} m! n!)] x^m y^n,

where (a)_k = a(a + 1)···(a + k − 1) is Pochhammer's symbol. This infinite series is convergent for |x| < 1 and |y| < 1, where, as shown by Euler, it can also be expressed as a convergent integral:

F1(a; b1, b2; c; x, y) = [Γ(c)/(Γ(a)Γ(c − a))] ∫_0^1 u^{a−1} (1 − u)^{c−a−1} (1 − ux)^{−b1} (1 − uy)^{−b2} du,

which converges for c > a > 0. In fact, Pham-Gia and Turkkan established the expression of the density of the difference using this integral representation directly, and not the series. Hence, the infinite series can be extended outside the two circles of convergence by analytic continuation, where it is also denoted by F1.
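As a numerical sanity check, the integral representation can be evaluated by quadrature; the sketch below (function and variable names are ours) uses the reduction of F1 to Gauss's 2F1 when one argument is zero:

```python
import math

def appell_F1(a, b1, b2, c, x, y, n=20_000):
    """Appell's F1 via the Euler-type integral (requires c > a > 0),
    evaluated with the midpoint rule."""
    coef = math.gamma(c) / (math.gamma(a) * math.gamma(c - a))
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += (u ** (a - 1) * (1 - u) ** (c - a - 1)
                  * (1 - u * x) ** (-b1) * (1 - u * y) ** (-b2))
    return coef * h * total

# Sanity check: with y = 0, F1 reduces to Gauss's 2F1(a, b1; c; x),
# and 2F1(1, 1; 2; x) = -ln(1 - x)/x.
print(round(appell_F1(1, 1, 1, 2, 0.5, 0.0), 4))  # 1.3863 = 2 ln 2
```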
Here, we denote the above density (1) by f_D(d; α1, β1, α2, β2).
Proof: See Pham-Gia and Turkkan.
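Independently of the Appell-function closed form, the density of the difference of two standard betas can be approximated by direct numerical convolution (a sketch, not the authors' algorithm; names are ours):

```python
import math

def beta_pdf(x, a, b):
    """Standard Beta(a, b) density on (0, 1)."""
    if not 0.0 < x < 1.0:
        return 0.0
    log_b = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_b)

def diff_pdf(d, a1, b1, a2, b2, n=4000):
    """Density of D = X1 - X2 at d by the convolution integral
    f(d) = integral of f1(x) f2(x - d) over the overlap of the supports."""
    lo, hi = max(0.0, d), min(1.0, 1.0 + d)
    h = (hi - lo) / n
    return h * sum(beta_pdf(lo + (i + 0.5) * h, a1, b1)
                   * beta_pdf(lo + (i + 0.5) * h - d, a2, b2)
                   for i in range(n))

# With X1, X2 i.i.d. Beta(2, 2): f_D(0) = integral of f(x)^2 dx = 36 B(3, 3).
print(round(diff_pdf(0.0, 2, 2, 2, 2), 3))  # 1.2
```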
The prior distribution of D is hence f_D(d; α1, β1, α2, β2), obtained from the two beta priors. Various approaches to Bayesian testing are given below.
Bayesian Testing Using a Significance Level
While frequentist statistics rarely tests H0: p1 − p2 = δ for δ ≠ 0, limiting itself to the case δ = 0, Bayesian statistics can easily do it.
a) One-sided test: H0: p1 − p2 ≤ δ0 vs H1: p1 − p2 > δ0.
Proposition 1: To perform the above test at the 0.05 significance level, using the two independent samples (n1, x1) and (n2, x2), we compute the posterior density f_D(d; α1 + x1, β1 + n1 − x1, α2 + x2, β2 + n2 − x2), where (α1, β1) and (α2, β2) are the parameters of the two beta priors. This expression of the posterior density of D, obtained by the conjugacy of binomial sampling with the beta prior, will allow us to compute the posterior probability of H0 and compare it with the significance level 0.05.
For example, as in the frequentist example of Section 2.1, we consider n1 = 25, x1 = 17, n2 = 20, x2 = 8, and use two non-informative beta priors, that is, Beta(1, 1) for both p1 and p2.
We note first that (α1 + x1, β1 + n1 − x1) = (18, 9) and (α2 + x2, β2 + n2 − x2) = (9, 13), giving the posterior density f_D(d; 18, 9, 9, 13).
We obtain the prior and posterior distributions of p1 and p2 (Figure 1). We wish to test: H0: p1 − p2 ≤ 0.35 vs H1: p1 − p2 > 0.35.
We have: H0 has posterior probability P(D ≤ 0.35 | data) ≈ 0.75, far above 0.05, and we fail to reject H0 at the 0.05 level. This means that the data, combined with our judgment, are not enough to make us accept that the difference of these proportions exceeds 0.35. Naturally, different informative, or non-informative, priors can be considered for p1 and p2 separately, and the test can be carried out in the same way.
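Under the assumptions above (Beta(1, 1) priors, hence Beta(18, 9) and Beta(9, 13) posteriors), the posterior probability of H0 can be approximated by Monte Carlo:

```python
import random

random.seed(1)

# Posterior of D = p1 - p2 with Beta(1, 1) priors and the data of Example 1:
# p1 | data ~ Beta(18, 9) and p2 | data ~ Beta(9, 13).
M = 200_000
draws = [random.betavariate(18, 9) - random.betavariate(9, 13)
         for _ in range(M)]

# Posterior probability of H0: D <= 0.35; far above 0.05, so H0 is not rejected.
p_h0 = sum(d <= 0.35 for d in draws) / M
print(round(p_h0, 2))
```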
b) Point-null hypothesis:
The point-null hypothesis H0: p1 − p2 = δ0, to be tested at a given significance level in Bayesian statistics, has been a subject of study and discussion
Figure 1. (a) Prior and posterior of p1 and (b) prior and posterior of p2.
in the literature. Several difficulties still remain concerning this case, especially regarding the prior probability assigned to the value δ0 (see Berger). We use here Lindley's compromise (Lee), which consists of computing the highest posterior density (hpd) interval and accepting or rejecting H0 depending on whether δ0 belongs, or not, to that interval. Here, for the same example, with δ0 = 0 and using Pham-Gia and Turkkan's algorithm, the 95% hpd interval for D has a lower bound essentially equal to zero, which leads us to technically accept H0 (see Figure 2), although, since that lower bound can be considered as zero, we can practically reject H0.
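A Monte Carlo approximation gives a similar interval; for this nearly symmetric unimodal posterior, the equal-tailed 95% interval computed below is close to, though not identical with, the hpd interval of Pham-Gia and Turkkan's algorithm:

```python
import random

random.seed(2)

# Posterior draws of D = p1 - p2 (posteriors Beta(18, 9) and Beta(9, 13)).
M = 200_000
draws = sorted(random.betavariate(18, 9) - random.betavariate(9, 13)
               for _ in range(M))

# Equal-tailed 95% credible interval; close to the hpd interval here
# because the posterior of D is nearly symmetric and unimodal.
lo, hi = draws[int(0.025 * M)], draws[int(0.975 * M)]
print(round(lo, 3), round(hi, 3))  # lower bound essentially zero
```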
We can see that the above conclusions on H0 are consistent with each other.
3.2. Bayesian Testing Using the Bayes Factor
Bayesian hypothesis testing can also be carried out using the Bayes factor B, which gives the relative weight of the null hypothesis w.r.t. the alternative one, when the data are taken into consideration. This factor is defined as the ratio of the posterior odds over the prior odds. With the above expression (1) of the density of the difference of two betas, we can now accurately compute the Bayes factor associated with the difference of two proportions. We consider two cases:
a) Simple hypothesis: H0: D = d0 vs H1: D = d1. Then B corresponds to the value of the posterior density of D at d0, divided by the value of the posterior density of D at d1. As an application, let us consider hypotheses of this form, with values d0 and d1 different from the previous numerical example, where we have uniform priors for both p1 and p2, and where we consider the sampling results from Table 1. We obtain the posterior parameters (18, 9) and (9, 13). Using the density of the difference (1), we calculate the Bayes factor B. Its value, slightly below 1, indicates that the data slightly favor H1 over H0, which is a logical conclusion since the posterior density of D is higher at d1 than at d0.
Figure 2. Prior and posterior distributions of D = p1 − p2. The red dashed lines correspond to the bounds of the posterior 95% hpd interval.
Table 1. Data on customers in areas A and B.

          Customers   Non-customers   Total
Area A        17             8          25
Area B         8            12          20
b) Composite hypothesis: As an application, let us consider the hypotheses (4), that is, H0: p1 − p2 ≤ 0.35 vs H1: p1 − p2 > 0.35.
In general, we test H0: D ≤ d0 vs H1: D > d0, where −1 < d0 < 1. We have P0 = P(D ≤ d0 | data) and P1 = 1 − P0 as posterior probabilities. Consequently, we define the posterior odds on H0 against H1 as P0/P1. Similarly, we have the prior odds on H0 against H1, which we define here as π0/π1, with π0 = P(D ≤ d0) computed under the prior and π1 = 1 − π0. The Bayes factor is B = (P0/P1)/(π0/π1). Again, we use the two uniform priors.
Now, using (4), with d0 = 0.35, we can determine the required prior and posterior probabilities. For example, the posterior density f_D(d; 18, 9, 9, 13) gives P0 ≈ 0.75. In the same way, we obtain π0 ≈ 0.79, using the prior f_D(d; 1, 1, 1, 1). Since P0/P1 ≈ 2.97 and π0/π1 ≈ 3.73, the Bayes factor is B ≈ 0.80, which is a mild argument in favor of H1.
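These odds can be approximated by Monte Carlo; the sketch below assumes, as above, uniform priors and the threshold d0 = 0.35 carried over from the one-sided example:

```python
import random

random.seed(3)
M, d0 = 200_000, 0.35  # d0 = 0.35 is carried over from the one-sided test

def prob_h0(a1, b1, a2, b2):
    """Monte Carlo estimate of P(p1 - p2 <= d0) for independent betas."""
    hits = sum(random.betavariate(a1, b1) - random.betavariate(a2, b2) <= d0
               for _ in range(M))
    return hits / M

prior_p0 = prob_h0(1, 1, 1, 1)    # under the two uniform priors
post_p0 = prob_h0(18, 9, 9, 13)   # under the posteriors from Table 1
B = (post_p0 / (1 - post_p0)) / (prior_p0 / (1 - prior_p0))
print(round(B, 2))  # below 1: the data mildly favor H1
```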
4. Prior and Posterior Densities of the Difference
The testing above can be seen to be quite straightforward, being limited to some values of the density f_D that can be numerically computed. But to make an in-depth study of the Bayesian approach to the difference of two proportions, we need to consider the analytic expressions of the prior and posterior distributions of this variable, which can be obtained only from the general beta distribution. Naturally, the related mathematical formulas become more complicated. But Pham-Gia and Turkkan have also established the expression of the density of the difference of two variables that both have general beta distributions.
4.1. The Difference of Two General Betas
The general beta (or GB) distribution, defined on a finite interval, say (c, d), has density

f(x; α, β; c, d) = (x − c)^(α−1) (d − x)^(β−1) / [B(α, β)(d − c)^(α+β−1)], c < x < d,

and is denoted by GB(α, β; c, d). It reduces to the standard beta above when c = 0 and d = 1. Conversely, a standard beta can be transformed into a general beta by the addition of, and/or multiplication by, a constant.
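The transformation just described can be checked by simulation; in the sketch below (names and parameter values are ours), a GB(α, β; c, d) variable is generated as c + (d − c)X with X ~ Beta(α, β):

```python
import math
import random

random.seed(4)

def gb_pdf(x, a, b, c, d):
    """General beta GB(a, b; c, d) density on (c, d)."""
    if not c < x < d:
        return 0.0
    log_b = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x - c) + (b - 1) * math.log(d - x)
                    - log_b - (a + b - 1) * math.log(d - c))

# A GB(a, b; c, d) variable is exactly c + (d - c) X with X ~ Beta(a, b),
# so its mean is c + (d - c) a/(a + b); the parameter values are arbitrary.
a, b, c, d = 3.0, 5.0, 2.0, 6.0
sample_mean = sum(c + (d - c) * random.betavariate(a, b)
                  for _ in range(100_000)) / 100_000
print(round(sample_mean, 2))  # close to 2 + 4 * 3/8 = 3.5
```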
Theorem 2: Let X ~ GB(α, β; c, d) and let a ≠ 0 and b be any two scalars. Then:

1) aX + b ~ GB(α, β; ac + b, ad + b) when a > 0;

2) aX + b ~ GB(β, α; ad + b, ac + b) when a < 0. In particular, −X ~ GB(β, α; −d, −c).
Pham-Gia and Turkkan gave the expression of the density of the sum X1 + X2, where X1 and X2 are independent general beta variables. The density of the difference X1 − X2, which is only mentioned there, is explicitly given below.
Let X1 ~ GB(α1, β1; c1, d1) and X2 ~ GB(α2, β2; c2, d2). For the difference D = X1 − X2, defined on (c1 − d2, d1 − c2), there are two different cases to consider, depending on the relative lengths of the two supports, since X1 and X2 do not have symmetrical roles.
Theorem 3: Let X1 and X2 be two independent general betas with their supports satisfying (6). Then D = X1 − X2 has its density defined as follows:
where F1 is Appell's first hypergeometric function already discussed.
The argument first uses part 2) of Theorem 2 to obtain that −X2 ~ GB(β2, α2; −d2, −c2). Then, it uses the exact expression of the density of the sum of two general betas (see Theorem 2 in the article of T. Pham-Gia & N. Turkkan).
We denote the above density, given by (8), (9) and (10), by f_D(d; α1, β1, c1, d1; α2, β2, c2, d2).
Note: The corresponding case 2, when relation (7) is satisfied, is given in Appendix 1 (Theorem 3a).
To study the density of the difference, a particular case that will be used in our study here is the difference between X1 and δ + X2, with δ being a positive constant.
In this case both Theorem 2 and Theorem 3 apply since the two supports, (0, 1) and (δ, 1 + δ), have the same length, and the middle definition section of the density disappears.
Theorem 4: Let X1 ~ Beta(α1, β1) and δ + X2, where X2 ~ Beta(α2, β2) and δ > 0, be two independent (general) beta-distributed random variables. Then the density of λ = X1 − (δ + X2), defined on (−1 − δ, 1 − δ), is:
and we denote this distribution by f_λ(·; α1, β1; α2, β2; δ).
This is a special case of Theorem 3.
An equivalent form using Theorem 4 leads to a slightly different expression, which, however, gives the same numerical values for the density of λ (see Theorem 4a in Appendix 1).
4.2. Prior and Posterior Distributions of λ
Let p1 and p2 be two independent beta-distributed random variables, the first being a regular beta, Beta(α1, β1), and the second being a general beta, GB(α2, β2; c, d).
Binomial sampling, with these two different beta priors, leads to the following
Proposition 3: The prior distribution of λ is given by (11), and its posterior distribution is of the same type, with Beta(α1, β1) replaced by Beta(α1 + x1, β1 + n1 − x1) and the general beta component replaced by its posterior, given in Proposition 3a.
Proof: λ is the difference of two random variables with respective distributions Beta(α1, β1) and GB(α2, β2; c, d). The prior distribution of λ is hence as given by (14).
Binomial sampling affects these two distributions in different ways. For the first, the posterior is Beta(α1 + x1, β1 + n1 − x1), while the posterior distribution of the second is given in Proposition 3a in Appendix 2. Figure 3 shows the prior and the posterior of λ.
From Theorem 4, we obtain the expression of the posterior density of λ as follows:
Figure 4 shows the above density.
Figure 3. (a) Prior distribution of λ and (b) posterior distribution of λ. The posterior of λ is hence given by Theorem 4.
The Bayesian approach to testing the difference of two independent proportions leads to interesting results which agree with frequentist results when non-informative priors are considered. Undoubtedly, all preceding results can be
Figure 4. Posterior density of .
generalized to other measures frequently used in a 2 × 2 table.
Research partially supported by NSERC grant 9249 (Canada). The authors wish to thank the Université de Moncton Faculty of Graduate Studies and Research for the assistance provided while conducting this work.
Below is the expression of the density of D = X1 − X2 when (7) is satisfied, instead of (6). This expression, together with the one given in Theorem 3, covers all cases.
Theorem 3a: Let X1 and X2 be two independent general betas with their supports satisfying (7). Then D = X1 − X2 has its density defined as follows:
By rewriting D = X1 + (−X2), we can apply the above Theorem 2 and Theorem 3.
A parallel, and equivalent, result to Theorem 4 is given below:
Theorem 4a: The density of λ is:
and we denote the corresponding distribution as in Theorem 4.
Similar to the proof of Theorem 4.
Proposition 3a: Suppose that x | p ~ Bin(n, p), and that p has the prior distribution GB(α, β; c, d); then the posterior distribution of p is obtained as follows.
The prior distribution of p is GB(α, β; c, d) (see Theorem 2), with pdf
The likelihood function is
Thus the marginal distribution of x, the number of successes, with 0 ≤ x ≤ n, has density:
Therefore, the posterior distribution of p given x is
This is the p.d.f. of the posterior distribution of p.
Q.E.D.