Bayesian Item Response Analysis of Method-of-Payment Habits in Banking Surveys


1. Introduction

In modern society, customers can choose among various methods of payment in their day-to-day financial activities. Payments by phone or online through financial institutions are easy and convenient. At financial institutions or stores, one can also use a debit card, credit card, cash, cheque, bank draft or money order. These methods come with their own advantages, disadvantages and complexities, so customer response varies greatly. In this paper, we consider the Bayesian analysis of method-of-payment habits using item response theory (IRT). In psychometrics, IRT is also known as latent trait theory. IRT is widely used in education, psychology and marketing research surveys [1]. IRT was initiated by three pioneers in three different fields: mathematician Georg Rasch [2], psychometrician Frederic M. Lord [3] and sociologist Paul Lazarsfeld [4]. IRT received more attention in the late 1970s as personal computers gave researchers access to the computing power necessary for applying IRT to complex data.

A wide variety of item response models have been studied in the IRT literature. In this paper, the Rasch model [2] (which is the one-parameter model), the two-parameter model and the three-parameter model [5] are considered within the Bayesian framework. Although a complete review of the literature on these models is a difficult task, some of the prominent approaches to the statistical analysis of binary item response data are highlighted in Section 2. A more complete overview of these methods and models can be found in [1] [6].

This article is organized as follows. The Rasch model, two-parameter model and three-parameter model are reviewed in Section 2. Prior distributions are then defined on the model parameters. Inference is based on the posterior distribution, which is the conditional probability distribution of the parameters given the observed data. Posterior summary statistics are obtained via MCMC methods using R software. In Section 3, we discuss the model selection methods and assess the prediction ability of the models. More specifically, a model assessment criterion is introduced for comparing the observed data against prior predictive output. The uniqueness of the approach is that we advocate a comparison of “features” that are of direct interest by introducing a similarity measure. This is an intuitive and simple approach which is not part of current statistical practice. In Section 4, we analyze a dataset arising from a survey conducted for the Federal Reserve Bank of Boston. We then highlight the suitability of the assessment criterion using graphical summary measures. We conclude with a discussion of the approach in Section 5.

2. Binary Item Response Models

Let ${Y}_{ij}$ be the selection of the $j^{\text{th}}$ method of payment by the $i^{\text{th}}$ customer in a customer satisfaction survey. ${Y}_{ij}=1$ indicates that the $i^{\text{th}}$ customer prefers the $j^{\text{th}}$ method of payment and ${Y}_{ij}=0$ indicates that the $i^{\text{th}}$ customer is resistant towards the use of the $j^{\text{th}}$ payment method. The Rasch model [2] is the simplest model used in the IRT literature for analyzing these types of data. It can be written as a one-parameter logistic response model. In this model, the probability that the $i^{\text{th}}$ customer prefers the $j^{\text{th}}$ payment method is given by

$P\left({Y}_{ij}=1|{\theta}_{i},{b}_{j}\right)=\frac{\mathrm{exp}\left({\theta}_{i}-{b}_{j}\right)}{1+\mathrm{exp}\left({\theta}_{i}-{b}_{j}\right)}={\left(1+\mathrm{exp}\left({b}_{j}-{\theta}_{i}\right)\right)}^{-1}$ (1)

where $i=1,\cdots ,n$ denotes the customers and $j=1,\cdots ,m$ denotes the payment methods. In (1), ${\theta}_{i}$ represents the payment habit attitude of the $i^{\text{th}}$ customer and ${b}_{j}$ represents the complexity of the $j^{\text{th}}$ payment method. We can now use the Item Characteristic Curve (ICC) to describe the relationship between the payment habit attitude and the probability of selecting a payment method. Note that ${\theta}_{i}$ is latent and the ICC links the latent ${\theta}_{i}$ to the probability that a randomly drawn customer with a given attitude will choose the $j^{\text{th}}$ payment method. Customers with low (negative) attitudes are less likely, and customers with high attitudes much more likely, to select the $j^{\text{th}}$ payment method. Note that this probability distribution is a member of the exponential family, so analysts enjoy all of the statistical features that accompany that family. For example, the model allows for algebraic separation of the customer attitude parameters and the payment method parameters. In frequentist methods, the parameters can be estimated using the conditional maximum likelihood approach. In the Rasch model, the ICCs of different payment methods are parallel to each other: a given increase in attitude shifts the log-odds of selection by the same amount for every payment method. This means that all payment methods are assumed to discriminate between customers in the same way, so the payment methods differ only in their complexity, not in how strongly they separate customers with different attitudes.
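The ICC in (1) can be traced numerically. The following is a minimal Python sketch using only the standard library; it is not the authors' R code, and the complexity values and the function name `rasch_prob` are illustrative assumptions.

```python
import math

def rasch_prob(theta, b):
    """P(Y = 1 | theta, b) under the Rasch model in (1)."""
    return 1.0 / (1.0 + math.exp(b - theta))

# Hypothetical complexities for three payment methods (illustrative values only).
complexities = {"simple": -1.0, "average": 0.0, "complex": 1.5}
attitudes = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Each row is one ICC evaluated on a small grid of attitude values.
for name, b in complexities.items():
    curve = [round(rasch_prob(theta, b), 3) for theta in attitudes]
    print(name, curve)
```

Each curve increases with the attitude, and a larger complexity shifts the whole curve to the right without changing its shape, which is the parallel-ICC property described above.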

The two-parameter logistic model adds a discrimination parameter to overcome this limitation. In the two-parameter model, the probability that the $i^{\text{th}}$ customer prefers the $j^{\text{th}}$ payment method is given by

$P\left({Y}_{ij}=1|{\theta}_{i},{a}_{j},{b}_{j}\right)=\frac{\mathrm{exp}\left({a}_{j}\left({\theta}_{i}-{b}_{j}\right)\right)}{1+\mathrm{exp}\left({a}_{j}\left({\theta}_{i}-{b}_{j}\right)\right)}={\left[1+\mathrm{exp}\left(-{a}_{j}\left({\theta}_{i}-{b}_{j}\right)\right)\right]}^{-1}\mathrm{.}$ (2)

By adding a slope parameter ${a}_{j}$ to the model, the items are no longer assumed to be equally related to the customer attitude parameter. The higher the discrimination parameter, the better the item is able to differentiate between low and high attitude levels. The parameter ${a}_{j}$ captures how quickly the probability of ${Y}_{ij}=1$ changes with respect to the customer’s attitude. A given discrimination value is only useful in certain regions of the attitude scale. Since a conditional maximum likelihood approach is not possible in this case, Bock and Aitkin [7] developed an estimation procedure based on marginal maximum likelihood. When ${a}_{j}=1$, this model simplifies to the Rasch model in (1). Note that there is a probit version of the two-parameter model known as the normal ogive model [3]:

$P\left({Y}_{ij}=\mathrm{1|}{\theta}_{i}\mathrm{,}{a}_{j}\mathrm{,}{b}_{j}\right)=\Phi \left({a}_{j}\left({\theta}_{i}-{b}_{j}\right)\right)={\displaystyle {\int}_{-\infty}^{{a}_{j}\left({\theta}_{i}-{b}_{j}\right)}}\text{\hspace{0.05em}}\varphi \left(z\right)\text{d}z$ (3)

where $\Phi \left(\cdot \right)$ represents the standard normal cumulative distribution function and $\varphi \left(\cdot \right)$ is the standard normal density function.
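To illustrate the role of the discrimination parameter in (2) and (3), here is a small Python sketch under assumed, purely illustrative parameter values. The function names are hypothetical; `statistics.NormalDist` from the standard library supplies $\Phi$.

```python
import math
from statistics import NormalDist

def two_pl_prob(theta, a, b):
    """P(Y = 1 | theta, a, b) under the two-parameter logistic model (2)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def normal_ogive_prob(theta, a, b):
    """P(Y = 1 | theta, a, b) under the normal ogive model (3)."""
    return NormalDist().cdf(a * (theta - b))

b = 0.0  # hypothetical complexity
rises = []
for a in (0.5, 1.0, 2.0):  # hypothetical discriminations
    # Change in probability as the attitude moves from b - 0.5 to b + 0.5.
    rise = two_pl_prob(b + 0.5, a, b) - two_pl_prob(b - 0.5, a, b)
    rises.append(rise)
    print(a, round(rise, 3), round(normal_ogive_prob(b + 0.5, a, b), 3))
```

The change in probability around ${\theta}_{i}={b}_{j}$ grows with ${a}_{j}$, so a highly discriminating payment method separates customers just below and just above its complexity more sharply.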

Note that the models in (1) and (2) do not account for possible random choices by novice customers, that is, customers who start to use payment methods for the first time without having any prior payment habits. The three-parameter model introduces an extra parameter ${c}_{j}$ to account for this type of behaviour, as given by

$P\left({Y}_{ij}=\mathrm{1|}{\theta}_{i}\mathrm{,}{a}_{j}\mathrm{,}{b}_{j}\mathrm{,}{c}_{j}\right)={c}_{j}+\left(1-{c}_{j}\right)\frac{\mathrm{exp}\left({a}_{j}\left({\theta}_{i}-{b}_{j}\right)\right)}{1+\mathrm{exp}\left({a}_{j}\left({\theta}_{i}-{b}_{j}\right)\right)}\mathrm{.}$ (4)

In (4), the probability of ${Y}_{ij}=1$ is given by an unknown guessing probability plus a second term representing the dependency on payment method complexity and the customer’s attitude level. When ${c}_{j}=0$, the model simplifies to the two-parameter model in (2). The three-parameter normal ogive model [5] is in the form of

$P\left({Y}_{ij}=1|{\theta}_{i},{a}_{j},{b}_{j},{c}_{j}\right)={c}_{j}+\left(1-{c}_{j}\right)\Phi \left({a}_{j}\left({\theta}_{i}-{b}_{j}\right)\right)\mathrm{.}$ (5)
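The effect of the guessing parameter in (4) can be seen in a short Python sketch. The parameter values and the function name `three_pl_prob` are assumptions chosen for illustration only.

```python
import math

def three_pl_prob(theta, a, b, c):
    """P(Y = 1 | theta, a, b, c) under the three-parameter logistic model (4)."""
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (1.0 - c) * logistic

a, b, c = 1.2, 0.5, 0.2  # hypothetical item parameters
for theta in (-6.0, 0.5, 6.0):
    print(theta, round(three_pl_prob(theta, a, b, c), 3))
```

For very low attitudes the probability levels off near ${c}_{j}$ rather than 0, at ${\theta}_{i}={b}_{j}$ it equals ${c}_{j}+\left(1-{c}_{j}\right)/2$, and setting ${c}_{j}=0$ recovers (2).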

Note that in frequentist parameter estimation, it is assumed that parameters are unknown but fixed. In banking customer surveys, payment method complexities, customer attitudes and discrimination effects may instead be random quantities. This is because banks frequently make changes and adjustments to their payment methods and customers adjust their attitudes accordingly. The Bayesian paradigm allows us to treat these parameters as random quantities, which is appealing for item response data coming from customer surveys. In the Bayesian setting, these random parameters arise from prior distributions that reflect the uncertainty about the true values of the parameters before conducting the surveys. This prior knowledge can typically arise from prior survey findings.

We now describe the Bayesian formulation of the models in (1), (2), and (4). The likelihood of the observed data $\underset{\_}{y}$ is

$L\left(\underset{\_}{y}|\underset{\_}{P}\right)={\displaystyle \underset{i=1}{\overset{n}{\prod}}}{\displaystyle \underset{j=1}{\overset{m}{\prod}}}{\left({P}_{ij}\right)}^{{y}_{ij}}{\left(1-{P}_{ij}\right)}^{1-{y}_{ij}}$ (6)

where ${P}_{ij}=P\left({Y}_{ij}=1\right)$ is given by (1) in the Rasch model, (2) in the two-parameter model and (4) in the three-parameter model. Note that in our data set $\underset{\_}{y}$ is an $n\times m$ matrix of 0’s and 1’s. We remark that the responses must be informative about the customers, and that different payment methods may carry different amounts of information. We can quantify this payment-method-specific information using the Fisher information of the data. Conditioning on the item parameters ${a}_{j}\mathrm{,}{b}_{j}$ and ${c}_{j}$, it is easy to show that the Fisher information for estimating ${\theta}_{i}$ is

$I\left({\theta}_{i}\right)={\displaystyle \underset{j=1}{\overset{m}{\sum}}}\frac{{\left({{P}^{\prime}}_{ij}\left({\theta}_{i}\right)\right)}^{2}}{{P}_{ij}\left({\theta}_{i}\right)\left(1-{P}_{ij}\left({\theta}_{i}\right)\right)}.$ (7)

The Fisher information for the three-parameter model is

$I\left({\theta}_{i}\right)={\displaystyle \underset{j=1}{\overset{m}{\sum}}}\text{\hspace{0.05em}}{a}_{j}^{2}\frac{\left(1-{P}_{ij}\left({\theta}_{i}\right)\right){\left({P}_{ij}\left({\theta}_{i}\right)-{c}_{j}\right)}^{2}}{{P}_{ij}\left({\theta}_{i}\right){\left(1-{c}_{j}\right)}^{2}}.$ (8)

The Fisher information for the two-parameter model can be obtained from (8) by substituting ${c}_{j}=0$, and that for the one-parameter model by substituting both ${c}_{j}=0$ and ${a}_{j}=1$.
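As a numerical check that (8) agrees with the general form (7), the sketch below evaluates both for a few hypothetical items, approximating ${{P}^{\prime}}_{ij}$ in (7) with a central difference. This is a Python illustration under assumed parameter values, not the authors' implementation.

```python
import math

def three_pl_prob(theta, a, b, c):
    """P(Y = 1 | theta, a, b, c) under the three-parameter logistic model (4)."""
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (1.0 - c) * logistic

def info_closed_form(theta, items):
    """Total information from (8); items is a list of (a, b, c) tuples."""
    total = 0.0
    for a, b, c in items:
        p = three_pl_prob(theta, a, b, c)
        total += a ** 2 * (1.0 - p) * (p - c) ** 2 / (p * (1.0 - c) ** 2)
    return total

def info_general(theta, items, h=1e-5):
    """Total information from (7), with P' approximated by a central difference."""
    total = 0.0
    for a, b, c in items:
        p = three_pl_prob(theta, a, b, c)
        dp = (three_pl_prob(theta + h, a, b, c)
              - three_pl_prob(theta - h, a, b, c)) / (2.0 * h)
        total += dp ** 2 / (p * (1.0 - p))
    return total

# Hypothetical (a, b, c) triples for three payment methods.
items = [(1.0, -1.0, 0.0), (1.5, 0.0, 0.1), (0.8, 1.2, 0.25)]
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(info_closed_form(theta, items), 4),
          round(info_general(theta, items), 4))
```

With ${c}_{j}=0$ and ${a}_{j}=1$ the closed form reduces to ${P}_{ij}\left(1-{P}_{ij}\right)$, which peaks at 0.25 when ${\theta}_{i}={b}_{j}$.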

We assign the following prior distributions to the primary parameters of interest:

$\begin{array}{l}\mathrm{log}\left({a}_{j}\right)\sim \text{N}\left({\mu}_{a}\mathrm{,}{\sigma}_{a}^{2}\right)\\ {b}_{j}\sim \text{N}\left({\mu}_{b}\mathrm{,}{\sigma}_{b}^{2}\right)\\ {c}_{j}\sim \text{Unif}\left(\mathrm{0,1}\right)\\ {\theta}_{i}\sim \text{N}\left({\mu}_{\theta}\mathrm{,}{\sigma}_{\theta}^{2}\right)\end{array}$ (9)

The customers are assumed to be sampled independently from a population, and a normal prior density is specified for the attitude parameters with mean ${\mu}_{\theta}$ and variance ${\sigma}_{\theta}^{2}$. Common normal priors are assumed across payment methods for the log-discrimination and complexity parameters. The discrimination parameter is restricted to be positive with mean one, which indicates a moderate level of discrimination. We do not want complexity parameters to be characterized as extremely simple or complex, so we set the mean to zero, indicating an average level of complexity. Both variance parameters ${\sigma}_{a}^{2}$ and ${\sigma}_{b}^{2}$ are fixed to be one. Note that one can instead assign suitable hyper-priors, such as an inverse Gamma, to ${\sigma}_{a}^{2}$ and ${\sigma}_{b}^{2}$. The guessing parameter follows a $\text{Unif}\left(\mathrm{0,1}\right)$ prior. These priors reflect the state of our knowledge about the parameters before we look at the data. The likelihood tells us how likely it is to observe the current data if the parameters of interest have their current values. The posterior reflects the state of our knowledge about the parameters after we have observed the data.
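Drawing one set of parameters from the priors in (9) can be sketched as follows in Python. The defaults are assumptions: ${\mu}_{b}=0$ and ${\sigma}_{a}^{2}={\sigma}_{b}^{2}=1$ follow the text, while ${\mu}_{a}=0$ on the log scale (which puts the prior median of ${a}_{j}$ at one), ${\mu}_{\theta}=0$ and ${\sigma}_{\theta}^{2}=1$ are illustrative choices not stated in the paper.

```python
import math
import random

def draw_prior_parameters(n, m, rng, mu_a=0.0, sigma_a=1.0, mu_b=0.0,
                          sigma_b=1.0, mu_theta=0.0, sigma_theta=1.0):
    """Draw item and customer parameters from the priors in (9)."""
    a = [math.exp(rng.gauss(mu_a, sigma_a)) for _ in range(m)]   # log(a_j) ~ N
    b = [rng.gauss(mu_b, sigma_b) for _ in range(m)]             # b_j ~ N
    c = [rng.uniform(0.0, 1.0) for _ in range(m)]                # c_j ~ Unif(0, 1)
    theta = [rng.gauss(mu_theta, sigma_theta) for _ in range(n)] # theta_i ~ N
    return {"a": a, "b": b, "c": c, "theta": theta}

rng = random.Random(42)  # fixed seed so the draw is reproducible
params = draw_prior_parameters(n=5, m=9, rng=rng)
print({name: [round(v, 2) for v in values] for name, values in params.items()})
```

Exponentiating a normal draw is what keeps every ${a}_{j}$ positive, in line with the restriction described above.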

Note that our likelihood contains multidimensional parameters which lead to a high-dimensional posterior distribution. In the case of complex posteriors, simulation procedures are often used to sample variates from the posterior. The most widely used sampling method is Markov chain Monte Carlo (MCMC). In MCMC, a Markov chain is constructed which has the posterior as its stationary distribution. We implement MCMC for the models described in this paper using R software.
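To make the idea of a Markov chain with the posterior as its stationary distribution concrete, the following Python sketch runs a random-walk Metropolis-within-Gibbs sampler for the Rasch model (1) with the likelihood (6). It is an illustration only, not the authors' R implementation: the standard normal priors on ${\theta}_{i}$ and ${b}_{j}$, the step size, the chain length, the burn-in and the toy data matrix are all assumptions.

```python
import math
import random

def log_lik_cell(y, theta, b):
    """Log of one Bernoulli factor in (6) under the Rasch model (1)."""
    eta = theta - b
    return y * eta - math.log1p(math.exp(eta))

def log_std_normal(x):
    """Log-density of N(0, 1) up to an additive constant."""
    return -0.5 * x * x

def metropolis_rasch(y, n_iter=600, step=0.6, seed=1):
    """Random-walk Metropolis-within-Gibbs; returns the draws of b."""
    rng = random.Random(seed)
    n, m = len(y), len(y[0])
    theta, b = [0.0] * n, [0.0] * m
    draws_b = []

    def accept(log_ratio):
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        return math.log(1.0 - rng.random()) < log_ratio

    for _ in range(n_iter):
        for i in range(n):  # update each attitude given the complexities
            prop = theta[i] + rng.gauss(0.0, step)
            cur = log_std_normal(theta[i]) + sum(
                log_lik_cell(y[i][j], theta[i], b[j]) for j in range(m))
            new = log_std_normal(prop) + sum(
                log_lik_cell(y[i][j], prop, b[j]) for j in range(m))
            if accept(new - cur):
                theta[i] = prop
        for j in range(m):  # update each complexity given the attitudes
            prop = b[j] + rng.gauss(0.0, step)
            cur = log_std_normal(b[j]) + sum(
                log_lik_cell(y[i][j], theta[i], b[j]) for i in range(n))
            new = log_std_normal(prop) + sum(
                log_lik_cell(y[i][j], theta[i], prop) for i in range(n))
            if accept(new - cur):
                b[j] = prop
        draws_b.append(list(b))
    return draws_b

# Toy 0/1 matrix (made up): 40 customers; methods chosen by 32, 20 and 8 of them.
y = [[1 if i < 32 else 0, 1 if i % 2 == 0 else 0, 1 if i >= 32 else 0]
     for i in range(40)]
draws = metropolis_rasch(y)
kept = draws[200:]  # discard the first 200 iterations as burn-in
post_mean_b = [sum(d[j] for d in kept) / len(kept) for j in range(3)]
print([round(v, 2) for v in post_mean_b])
```

In this toy example the most frequently chosen method should receive the smallest posterior mean complexity. A real analysis would also monitor convergence and tune the proposal step, which this sketch omits.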

3. Prediction Abilities of the Models

In complex models, it is important that the proper model is selected for the data. In principle, the Bayesian approach to model selection is straightforward, and diagnostics such as AIC [8], BIC [9] and DIC [10] have been proposed and are often used for this purpose. However, the practical implementation of this approach often requires careful investigation, as there is potential inconsistency in model selection depending on which of the diagnostics is used. In the IRT literature, inconsistencies and inaccuracies have been found among model selection methods under various simulated conditions [11]. In our application, it is highly important that the model selected from the competing models is capable of predicting realistic results, as it helps to identify future directions in terms of customer behaviour on the use of payment methods. For this purpose, one can generate datasets from the posterior predictive distribution [12] of the model and compare them against the observed data using appropriate features. The posterior predictive density is defined as

$f\left(y|x\right)={\displaystyle \int}f\left(y|\theta \right)\pi \left(\theta |x\right)\text{d}\theta $ (10)

where x is the observed data, $f\left(y|\theta \right)$ is the sampling density and $\pi \left(\theta |x\right)$ is the posterior density. Model assessment then involves a comparison of the future values y versus the observed data x. A major difficulty with the posterior predictive method in (10) concerns double use of the data [12]. Specifically, the observed data x is used both to fit the model, giving rise to the posterior density $\pi \left(\theta |x\right)$, and again in the comparison of y versus x. For this reason, one can sample “model variates” y from the prior predictive density

$f\left(y\right)={\displaystyle \int}f\left(y|\theta \right)\pi \left(\theta \right)\text{d}\theta $ (11)

where $\pi \left(\theta \right)$ is a proper prior density. It is then a matter of deciding how to compare the y’s against the observed data matrix x. For this purpose, we define a measure called the “Similarity Measure” (SM) as

$\text{SM}\left(\text{Observed},\text{Predicted}\right)=\text{SM}\left(x\mathrm{,}y\right)={\displaystyle \underset{i=1}{\overset{n}{\sum}}}\left({x}_{i}^{T}{y}_{i}+{\left({x}_{i}^{c}\right)}^{T}{y}_{i}^{c}\right)$ (12)

where x and y are $n\times m$ matrices of observed and predicted data, ${x}_{i}$ and ${y}_{i}$ are the $i^{\text{th}}$ rows of x and y written as column vectors, and ${x}_{i}^{c}$ and ${y}_{i}^{c}$ are their complements (each 0 replaced by 1 and each 1 by 0). A simple comparison of the y’s against the observed data matrix x can thus be carried out by calculating the Similarity Measure. Note that $\text{SM}\left(x\mathrm{,}y\right)$ in (12) counts the entries on which x and y agree: the first term counts matching selections (both entries 1) and the second counts matching non-selections (both entries 0), which provides a meaningful summary measure. We illustrate the SM for the simulated and observed data in the next section.
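The similarity measure in (12) is simple to compute directly. The following Python sketch uses two small made-up matrices; the function name `similarity_measure` is a hypothetical label for (12).

```python
def similarity_measure(x, y):
    """SM(x, y) from (12) for two equally sized 0/1 matrices given as lists of rows."""
    total = 0
    for x_row, y_row in zip(x, y):
        # First term of (12): entries where both matrices hold a 1.
        both_one = sum(xi * yi for xi, yi in zip(x_row, y_row))
        # Second term of (12): entries where both complements hold a 1.
        both_zero = sum((1 - xi) * (1 - yi) for xi, yi in zip(x_row, y_row))
        total += both_one + both_zero
    return total

observed = [[1, 0, 1], [0, 0, 1]]   # made-up 2 x 3 example
predicted = [[1, 1, 1], [0, 0, 0]]
print(similarity_measure(observed, predicted))  # prints 4
```

Here four of the six entries agree, so the measure is 4. Its maximum, attained when the two matrices are identical, is $n\times m$.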

4. Data Analysis

We consider a dataset arising from the 2010 Federal Reserve Bank of Boston Survey of Consumer Payment Choice (SCPC) to illustrate the models described in this paper. The SCPC is a household survey that aims to measure the banking and spending habits of Americans. Our analysis focuses on whether each of the following five methods of payment was used at least once in the past year: cash, mobile phone, money order, traveler’s check, and a non-bank online service (e.g. PayPal). It also focuses on whether the individuals have access to the following four methods of payment: general-purpose pre-paid cards, merchant-specific pre-paid cards, contactless credit card, and contactless debit card.

The dataset consists of $n=1275$ customers responding to $m=9$ questions on the payment methods. The observed payment method frequencies reveal that payment by cash is the most popular among customers, as it has the highest proportion of selection, 0.767. Payment via a traveler’s check is the least used method, with a proportion of 0.039. The posterior estimates of the parameters under the three models are given in Table 1.

Figure 1 shows the ICCs of the different payment methods under the three models. The plots indicate the heterogeneity among customers in selecting payment methods. Figure 2 shows the behaviour of the total Fisher information for estimating $\theta $, conditioning on the estimates of the item parameters in the three models.

We now look at the prediction ability of the models using the similarity measure defined in (12). We generate 100 prior predictive datasets and combine them with the observed data matrix. We then calculate the similarity measures among these 101 datasets. Note that there are $\left(\begin{array}{c}101\\ 2\end{array}\right)=5050$ similarity measures, of which 100 are between the observed dataset and the generated datasets. Figure 3 gives the histogram of these similarity measures, highlighting in red the similarity measures between the observed dataset and the generated datasets. Note that this plot is capable of displaying the variation among predictive outputs as well as the variation between the observed dataset and the predictive outputs. It is clear that the plot under the three-parameter model reports the highest similarity measures with a good mixing behaviour, indicating that the three-parameter model has the highest prediction ability.
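The bookkeeping behind this comparison can be sketched in Python. Everything below is an assumption made for illustration: the Rasch model with standard normal priors stands in for the fitted models, a simulated matrix stands in for the SCPC data, and a smaller $n$ than 1275 keeps the run short.

```python
import math
import random
from itertools import combinations

def rasch_prob(theta, b):
    """P(Y = 1 | theta, b) under the Rasch model in (1)."""
    return 1.0 / (1.0 + math.exp(b - theta))

def draw_prior_predictive(n, m, rng):
    """One dataset from (11): draw parameters from the priors, then responses."""
    theta = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(0.0, 1.0) for _ in range(m)]
    return [[1 if rng.random() < rasch_prob(t, bj) else 0 for bj in b]
            for t in theta]

def similarity_measure(x, y):
    """SM(x, y) from (12): the number of entries on which x and y agree."""
    return sum(1 for x_row, y_row in zip(x, y)
               for xi, yi in zip(x_row, y_row) if xi == yi)

rng = random.Random(2010)
n, m = 50, 9
observed = draw_prior_predictive(n, m, rng)  # stand-in for the real survey matrix
datasets = [observed] + [draw_prior_predictive(n, m, rng) for _ in range(100)]

# All pairwise similarity measures among the 101 datasets; index 0 is "observed".
sm = {(i, j): similarity_measure(datasets[i], datasets[j])
      for i, j in combinations(range(len(datasets)), 2)}
with_observed = [value for (i, j), value in sm.items() if i == 0]
print(len(sm), len(with_observed))  # prints 5050 100
```

A histogram of `sm.values()` with the entries of `with_observed` highlighted would correspond to the kind of display described for Figure 3.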

Figure 1. The ICC curves of different payment methods.

Figure 2. Total Fisher information.

Figure 3. Histogram of similarity measures on 100 simulated datasets.

Table 1. The posterior estimates of payment method parameters in IRT models.

5. Discussion

We have presented a Bayesian analysis of item response data with an application to the method-of-payment habits in banking surveys. We also presented a mechanism for assessing the prediction ability of the models and advocate that model selection should be based on the prediction ability of the models. If the model is able to express the uncertainty in the data via model parameters, there is no need to assess the prediction abilities. However, it is difficult to determine the sampling and parameter spaces as the true model behind the observed data is unknown. Our approach is based on the prior predictive simulations which allow us to compare the prediction ability of the models relative to the observed data.

In our data analysis of the SCPC, we found that the three-parameter model possessed the best prediction ability based on our assessment criterion. This indicates that the payment method complexities, customer attitudes and discrimination effects are important in explaining the method-of-payment habits of banking customers. We have not considered possible covariate effects or the longitudinal aspect of the data in this paper. This is a topic of future research, as this is an ongoing survey.

Acknowledgements

The authors thank anonymous reviewers whose comments helped to improve the manuscript.

Funding

Muthukumarana has been partially supported by a Discovery grant from the Natural Sciences and Engineering Research Council of Canada.

References

[1] Fox, J.P. (2010) Bayesian Item Response Modeling: Theory and Applications. Springer, New York. https://doi.org/10.1007/978-1-4419-0742-4

[2] Rasch, G. (1960) Probabilistic Models for Some Intelligence Tests and Attainment Tests. Danish Institute for Educational Research, Copenhagen.

[3] Lord, F.M. and Novick, M.R. (1968) Statistical Theories of Mental Test Scores. Addison-Wesley, Reading.

[4] Lazarsfeld, P.F. and Henry, N.W. (1968) Latent Structure Analysis. Houghton Mifflin, Boston.

[5] Birnbaum, A. (1968) Some Latent Trait Models and Their Use in Inferring an Examinee’s Ability. In: Lord, F.M. and Novick, M.R., Eds., Statistical Theories of Mental Test Scores, Addison-Wesley, Reading.

[6] Hambleton, R.K., Swaminathan, H. and Rogers, H.J. (1991) Fundamentals of Item Response Theory. Sage Press, Newbury Park.

[7] Bock, R.D. and Aitkin, M. (1981) Marginal Maximum Likelihood Estimation of Item Parameters: Application of an EM Algorithm. Psychometrika, 46, 443-459. https://doi.org/10.1007/BF02293801

[8] Akaike, H. (1973) Information Theory and an Extension of the Maximum Likelihood Principle. In: Petrov, B. and Csáki, F., Eds., Second International Symposium on Information Theory, Akadémiai Kiadó, Budapest, 267-281.

[9] Schwarz, G. (1978) Estimating the Dimension of a Model. The Annals of Statistics, 6, 461-464. https://doi.org/10.1214/aos/1176344136

[10] Spiegelhalter, D.J., Best, N.G., Carlin, B.P. and van der Linde, A. (2002) Bayesian Measures of Model Complexity and Fit (with Discussion). Journal of the Royal Statistical Society Series B, 64, 583-639. https://doi.org/10.1111/1467-9868.00353

[11] Kang, T. and Cohen, A.S. (2007) IRT Model Selection Methods for Dichotomous Items. Applied Psychological Measurement, 31, 331-358. https://doi.org/10.1177/0146621606292213

[12] Gelman, A., Meng, X.L. and Stern, H.S. (1996) Posterior Predictive Assessment of Model Fitness via Realized Discrepancies. Statistica Sinica, 6, 733-807.