The communication industry has made the world a global village, and among all components of the industry, telecommunication is the most popular and most widely used. It has created employment opportunities, empowered people economically, and removed the barrier of distance, thereby saving lives and cost. Telecommunication has created opportunities for both service providers and subscribers to run their separate but related businesses and earn a living. These benefits, however, come with the serious consequence of fraud in the business. Our interest in this paper is to detect fraud in the industry using the frequency and the duration of subscribers’ calls. Fraud detection is vital to the survival of the telecommunication industry. It is common knowledge that fraudsters attack the industry in various ways, ranging from illegal access to bandwidth, attacks on cyber security and access to pockets of data, to illegal calls. All of these constitute a huge loss to the industry and, if not properly checked, may force some service providers out of business. The multiplier effects of these fraudulent activities are massive job losses and a decline in the standard of living, with attendant consequences for those directly involved and for others. The greatest difficulty is that fraudsters are smart and can hack into the databases of service providers, who cannot afford to sit back and watch their businesses being destroyed. Since fraud is not localized and has no permanent “office”, it can be committed anywhere and at any time. Telecommunication operators store large amounts of data on the activities of their subscribers. These records contain both normal and fraudulent activity, and the fraudulent records are expected to be substantially fewer than the normal ones. If it were the other way around, this type of business would be impractical because of the amount of revenue lost.
This sector broadly has two types of users: domestic and commercial. There are cases where connections are bought under the domestic category but used on a commercial scale, which causes substantial loss to the sector. There is therefore a need to adopt a data mining technique that will filter out these fraudsters. Data volume has been growing at a tremendous pace due to advances in information technology, and at the same time there has been enormous development in data mining. Data mining can be defined as the process of extracting valuable information from data. The telecommunication sector acquires huge amounts of data owing to rapidly changing technologies, the growing number of subscribers and value-added services. The uncontrolled and very fast expansion of this field causes increasing losses due to fraud and technical difficulties.
Today, the telecommunication market all over the world is facing a severe loss of revenue due to fraudsters. To overcome such business hazards and retain their market, operators are forced to use data mining techniques and statistical tools to identify the cause in advance and take immediate action in response. This is possible if the past history of the subscribers is analyzed systematically. Fortunately, telecom industries generate and maintain large volumes of data, such as call detail data and network data. One reason this potential is not utilized is insufficient knowledge of the algorithms to be used on such data. Data mining tools and algorithms can exploit the potential in the data when the data are synthesized efficiently. The advent of data mining algorithms and the development of software and hardware have made it easier to analyze huge and complex data. Globally, the telecommunications industry is developing rapidly, with one innovation replacing another in a matter of years, months, and even weeks. Without doubt, telecommunication is a key driver of any nation’s economy. Telecommunication is the communication of information by electronic means, usually over some distance. It involves the transmission and receipt of information, messages, graphics, images, voice, video and data between or among telephones, the internet, satellites and radio.
In this area, researchers have used different methods for both customer churn prediction and fraud detection. Fraud detection and subscriber churn are related in the sense that both are concerned with subscribers’ behavior. Among the models used in data mining for both churn and fraud detection are the naïve Bayes model, the Gaussian probability distribution, the decision tree algorithm, logistic regression and the artificial neural network (ANN). Data mining is the extraction of vital information from the bulk of data available to the telecommunication industry, using an appropriate predictive model to classify and determine the behavior of subscribers. By refining the data and building an appropriate statistical model, much hidden information about the subscribers and service providers can be unveiled. This information is vital to the survival of any service provider, such as MTN, GLO, ETISALAT, MTEL, etc., in the business of telecommunication, especially in Nigeria. We shall use the subscribers’ frequency of calls and the duration of such calls as the parameters of interest in this paper. We shall then determine the prior and posterior probabilities of the subscribers and their number of calls at a given time. Finally, we shall develop a linear discriminant function to classify the posterior probability distributions into fraudulent and genuine subscribers. In this paper, we are concerned with statistical modeling and not with machine learning or artificial intelligence methods of classification.
Because of the privacy agreement between service providers and subscribers, and to protect their own businesses, service providers hardly ever disclose their data. Simulation, however, offers a close substitute for real-life data. Hence, in this paper, we simulate data that depict the real-life scenario and use them for the study. We simulate the number of calls per unit time and the call duration, and our interest is in domestic subscribers only. Eighty (80) sample data points were simulated using MINITAB 16.0. The samples were divided into four (4) categories of twenty (20) observations each, with each observation representing a subscriber. The data generated by such a process resemble real-life data drawn from a real system. The four samples were simulated with the following average numbers of calls and durations: 8 (t = 3), 5 (t = 4), 9 (t = 12) and 6 (t = 7). The values outside the brackets (8, 5, 9 and 6) are the average numbers of calls per hour, and the values in brackets are the average durations of the calls in minutes. Our aim is to develop a predictive data mining model for fraud detection in the telecommunication industry. The simulated data were categorized into two multivariate sample groups, A and B. Most importantly, service providers determine their customers’ behaviour from the nature of their current calls and their past behaviour.
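The simulation step can be illustrated with a short sketch. It is not the MINITAB procedure used in this study: it assumes that the hourly call counts of each group are Poisson distributed around the quoted averages, and the function names, the sampling method and the fixed seed are illustrative choices.

```python
import math
import random

# Average calls per hour for the four groups, as quoted in the text.
MEAN_CALLS_PER_HOUR = [8, 5, 9, 6]
SUBSCRIBERS_PER_GROUP = 20


def poisson_draw(rng, mean):
    """Draw one Poisson variate using Knuth's multiplication method."""
    limit = math.exp(-mean)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def simulate_call_counts(means, size, seed=0):
    """Return one list of `size` simulated hourly call counts per mean."""
    rng = random.Random(seed)  # fixed seed: an assumption, for repeatability
    return [[poisson_draw(rng, mean) for _ in range(size)] for mean in means]


call_counts = simulate_call_counts(MEAN_CALLS_PER_HOUR, SUBSCRIBERS_PER_GROUP)
for mean, group in zip(MEAN_CALLS_PER_HOUR, call_counts):
    # Target mean next to the simulated sample mean of the group.
    print(mean, sum(group) / len(group))
```

With only twenty observations per group, the sample means scatter around the target averages rather than matching them exactly.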
We need to know the history of these subscribers based on the information available to the network providers (service providers). This information is obtained from their call history. For this reason, the appropriate probability model, one that has a memory and can relate the past history of subscribers to their current behaviour, is the Bayesian statistical model. Bayesian statistics, however, requires a prior probability. Some researchers make the mistake of estimating the prior probability in this type of study using a continuous distribution, as though the number of calls were a continuous random variable. In fact, the number of calls is a count and therefore follows a discrete probability distribution. The values of a Poisson random variable are the non-negative integers, and any random phenomenon for which a count is of interest can be modeled by a Poisson distribution, provided that the random variable satisfies certain assumptions regarding the distribution. An example of such a count is the number of telephone calls per unit time coming into the switchboard of a large business. Hence, we shall estimate the prior probabilities using the Poisson distribution. Since each subscriber’s number of calls and the time involved have non-stationary increments, we assume a non-homogeneous Poisson process (NHPP) whose parameter depends on the call rate and on the time duration t of the calls. This has been tested and the shape parameter b was found to be greater than zero. The intensity function of the power law process (PLP) model can be used to describe the intensity of an NHPP. The power law process model has the mean and intensity functions
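$$m(t) = a t^{b}, \qquad \lambda(t) = \frac{d\,m(t)}{dt} = a b\, t^{b-1}, \tag{1}$$
where $m(t)$ is the mean value function, $\lambda(t)$ is the intensity function, $a$ is the scale parameter and $b$ is the shape parameter. The parameters of the model are obtained by a log-linear transformation of the mean value function,
$$\ln m(t) = \ln a + b \ln t, \tag{2}$$
and a plot of $\ln m(t)$ against $\ln(t)$ yields $\ln(a)$ as the intercept and $b$ as the slope of the straight line. If the shape parameter b = 1, the process has stationary increments and is a homogeneous Poisson process (HPP), but for b > 1 we have an NHPP. Hence, the predictive probability model for the priors is
$$P_n(t) = \frac{e^{-m(t)}\,[m(t)]^{n}}{n!}, \qquad n = 0, 1, 2, \ldots, \tag{3}$$
where $P_n(t)$ is the probability of n calls at a given time t and the other notation is as defined above.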
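As a minimal sketch of how such prior probabilities could be evaluated, the code below assumes the standard Poisson form with a power-law mean value function m(t) = a·t^b. The symbols `a`, `b` and `m`, the function names and the numerical values are illustrative assumptions, not values taken from the study.

```python
import math


def power_law_mean(t, a, b):
    """Mean value function m(t) = a * t**b of a power law process."""
    return a * t ** b


def prior_probability(n, t, a, b):
    """Poisson probability of exactly n calls by time t with mean m(t)."""
    m = power_law_mean(t, a, b)
    return math.exp(-m) * m ** n / math.factorial(n)


# Hypothetical parameter values, chosen only for illustration.
a, b, t = 2.0, 1.0, 3.0
priors = [prior_probability(n, t, a, b) for n in range(60)]
print(sum(priors))  # close to 1, since the tail beyond n = 59 is negligible
```

With b = 1 the mean reduces to a·t, which is the homogeneous case; any other positive b gives a time-varying intensity.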
The following assumptions must be satisfied by the random variables before we can use Equation (3) above:
The stochastic process $\{N(t), t \ge 0\}$ is called a non-homogeneous Poisson process with rate function $\lambda(t)$ if
1) $N(0) = 0$: (The number of events at time zero is equal to zero).
2) $\{N(t), t \ge 0\}$ has independent increments: (The numbers of events in non-overlapping time intervals are independent).
3) $\lim_{h \to 0} o(h)/h = 0$: (o(h) is some function of smaller order than h, which satisfies this limit).
4) $P\{N(t+h) - N(t) = 1\} = \lambda(t)h + o(h)$: (The probability that exactly one event occurs in the small interval (t, t + h] is approximately $\lambda(t)h$).
5) $P\{N(t+h) - N(t) = 0\} = 1 - \lambda(t)h + o(h)$: (The probability that no event occurs in the small interval (t, t + h]).
6) $P\{N(t+h) - N(t) \ge 2\} = o(h)$: (The probability that more than one event occurs in the small interval (t, t + h]).
7) The events must occur at random.
The Bayesian statistical model is adopted for the posterior distribution, since it uses the prior behaviour of the subscribers to determine their current behaviour. Hence, the predictive statistical model for this study is
where = the conditional probability that the random variable assumes a specific value given that its prior probability was . Note that is now a random variable. = the joint probability distribution of the subscribers  .
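The Bayesian update can be sketched as follows. The sketch assumes the usual discrete form of Bayes' rule, in which each joint probability is the product of a prior and a conditional probability and each posterior is that joint probability divided by the sum of all joint probabilities. The function name and the numbers are hypothetical and are not taken from the tables of this study.

```python
def posterior_probabilities(priors, likelihoods):
    """Return posteriors from priors and conditional probabilities.

    joint_i = prior_i * likelihood_i, posterior_i = joint_i / sum(joint).
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    if total == 0:
        raise ValueError("all joint probabilities are zero")
    return [j / total for j in joint]


# Hypothetical values for three subscribers, for illustration only.
example_priors = [0.2, 0.5, 0.3]
example_likelihoods = [0.1, 0.4, 0.5]
posteriors = posterior_probabilities(example_priors, example_likelihoods)
print(posteriors)
```

Because of the normalization, the posteriors of a group always sum to one, which matters later when the group means are compared.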
Our interest is to classify the subscribers as either genuine or fraudulent. This is a classification problem, and linear discriminant analysis will be employed to assign each subscriber to the group where they belong. This classification will enable service providers to determine the measures to take against fraudsters. Discriminant analysis searches for the differences between two or more groups that consist of multivariate measurements. One or more linear functions that maximally differentiate between these groups are constructed. These functions are then used to classify a new member into the group to which it belongs and to differentiate it from the groups to which it does not belong. The linear discriminant function employed is given in Equation (5).
Here, the linear discriminant function is formed from the inverse of the dispersion (variance-covariance) matrix and the difference in the mean vectors between the two multivariate samples. We establish the optimal classifier of the discriminant function and finally classify the sample data according to their posterior probability distributions. Two multivariate samples, each with two variates, will be derived from Equation (4); the variates are the posterior probabilities of each group. We shall then classify the samples as belonging to either genuine or fraudulent subscribers based on the optimal classifier. Our classification rule will be: classify the subscribers in group A into “A1; A2”, where A1 denotes the fraudulent subscribers and A2 the genuine subscribers. Similarly, the subscribers in group B are classified into “B1; B2”. Fraudulent subscribers tend to use the services much more than genuine subscribers and should therefore have higher posterior probabilities.
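A two-variate linear discriminant function of this kind can be sketched in a few lines. The sketch assumes Fisher's form, in which the weights are the inverse dispersion matrix multiplied by the difference in mean vectors. The midpoint cutoff is a common textbook convention assumed here, not taken from Equation (5), and the function names and numbers are hypothetical.

```python
def invert_2x2(matrix):
    """Invert a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("dispersion matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def discriminant_weights(mean_1, mean_2, dispersion):
    """Weights w = S^{-1} (mean_1 - mean_2) of the discriminant function."""
    inv = invert_2x2(dispersion)
    diff = [mean_1[0] - mean_2[0], mean_1[1] - mean_2[1]]
    return [inv[i][0] * diff[0] + inv[i][1] * diff[1] for i in range(2)]


def discriminant_score(weights, x):
    """Value of the linear discriminant function at observation x."""
    return weights[0] * x[0] + weights[1] * x[1]


def midpoint_cutoff(weights, mean_1, mean_2):
    """Assumed cutoff: midpoint of the two group-mean scores."""
    return 0.5 * (discriminant_score(weights, mean_1)
                  + discriminant_score(weights, mean_2))


# Hypothetical group means and pooled dispersion matrix.
mean_1, mean_2 = [0.6, 0.5], [0.4, 0.3]
dispersion = [[0.04, 0.01], [0.01, 0.09]]
weights = discriminant_weights(mean_1, mean_2, dispersion)
cutoff = midpoint_cutoff(weights, mean_1, mean_2)
print(weights, cutoff)
```

An observation whose score exceeds the cutoff is assigned to the first group, otherwise to the second.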
Subsc = subscribers.
n-call = the number of calls per subscriber per hour.
t (min) = the time spent on the calls.
Prop.n (n/N) = fraction of the number of calls in relation to the total number of calls.
Pr.of Prio = the probability of priors.
joint prb = the joint probabilities.
Posterior = the posterior probabilities.
Churn = the defection of subscribers from one network to another.
The average number of calls and time spent in each call are presented in Table 1.
A plot of ln(t) against ln( ) is presented in Figure 1.
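The intercept and slope read from such a log-log plot can also be estimated numerically. The sketch below fits a straight line by ordinary least squares to the logarithm of the mean number of calls against ln(t), so that the slope estimates the shape parameter b and the exponential of the intercept estimates a. The function name and the data are hypothetical.

```python
import math


def fit_power_law(times, mean_counts):
    """Least-squares fit of ln(mean) = ln(a) + b * ln(t); returns (a, b)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(m) for m in mean_counts]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope: shape parameter
    a = math.exp(y_bar - b * x_bar)    # intercept is ln(a)
    return a, b


# Hypothetical data generated from a = 3 and b = 1 (the stationary case).
times = [1.0, 2.0, 4.0, 8.0]
mean_counts = [3.0 * t for t in times]
a_hat, b_hat = fit_power_law(times, mean_counts)
print(a_hat, b_hat)  # approximately 3 and 1
```

A fitted slope close to 1 points to stationary increments, while a slope clearly different from 1 points to a time-varying intensity.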
A shape parameter of b = 1 indicates that the process has stationary increments under the PLP transformation; hence, the distribution follows HPP(ω) and the prior probability distribution of Equation (3) becomes
Table 1. Average number of calls ( ) and average time spent (t) ( ).
Table 2. No. of calls (hr), Pn(t), Joint Prob. prior and posterior probabilities.
Figure 1. Graph of ln(t) against ln( ).
Table 2 presents the number of calls per hour, the probability distribution, the joint probability distribution, the prior and posterior probability distributions.
The average number of calls and time spent in each call are presented in Table 3.
A plot of ln(t) against ln( ) is presented in Figure 2.
The prior probability distribution in Equation (3) becomes
Table 4 presents the number of calls per hour, the probability distribution, the joint probability distribution, the prior and posterior probability distributions.
The average number of calls and time spent in each call are presented in Table 5.
A plot of ln(t) against ln( ) is presented in Figure 3.
The prior probability distribution in Equation (3) becomes
Table 6 presents the number of calls per hour, the probability distribution, the joint probability distribution, the prior and posterior probability distributions.
Figure 2. Graph of ln(t) against ln( ).
Figure 3. Graph of ln(t) against ln( ).
The average number of calls and time spent in each call are presented in Table 7.
A plot of ln(t) against ln( ) is presented in Figure 4.
The prior probability distribution in Equation (3) becomes
Table 8 presents the number of calls per hour, the probability distribution, the joint probability distribution, the prior and posterior probability distributions.
Table 9 presents the posterior probability distributions for the two multivariate groups A and B.
Table 3. n-call, t(min) ln(t) and ln( ) for .
Table 4. No. of calls (hr), Pn(t), Joint Prob. prior and posterior probabilities.
Table 5. n-call, t(min) ln(t) and ln( ) for .
Table 6. No. of calls (hr), Pn(t), Joint Prob. prior and posterior probabilities.
Table 7. n-call, t(min) ln(t) and ln( ) for .
Table 8. No. of calls (hr), Pn(t), Joint Prob. prior and posterior probabilities.
Table 9. Multivariate sample data. (A) Posterior prob. from group A; (B) Posterior prob. from group B.
Figure 4. Graph of ln(t) against ln( ).
The variance-covariance matrix with two variates is given as:
where n1 and n2 are the sizes of the first and second samples, respectively.
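For two variates, the sample and pooled dispersion matrices can be computed as in the sketch below. It assumes paired observations, the unbiased divisor n - 1 and the usual pooling weights (n1 - 1) and (n2 - 1). The function names and data are hypothetical, and the sketch does not handle variates with different sample sizes.

```python
def covariance_matrix(sample):
    """Unbiased 2x2 variance-covariance matrix of paired observations."""
    n = len(sample)
    means = [sum(row[j] for row in sample) / n for j in range(2)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in sample)
            / (n - 1)
            for j in range(2)
        ]
        for i in range(2)
    ]


def pooled_covariance(sample_1, sample_2):
    """Pooled matrix ((n1-1)*S1 + (n2-1)*S2) / (n1 + n2 - 2)."""
    n1, n2 = len(sample_1), len(sample_2)
    s1, s2 = covariance_matrix(sample_1), covariance_matrix(sample_2)
    return [
        [
            ((n1 - 1) * s1[i][j] + (n2 - 1) * s2[i][j]) / (n1 + n2 - 2)
            for j in range(2)
        ]
        for i in range(2)
    ]


# Hypothetical paired observations, for illustration only.
sample_1 = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
sample_2 = [(0.0, 1.0), (2.0, 1.0), (4.0, 1.0)]
print(pooled_covariance(sample_1, sample_2))  # [[2.5, 1.0], [1.0, 2.0]]
```

The diagonal entries of the result are the pooled variances and the off-diagonal entries are the pooled covariance.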
The variances are:
The co-variances are:
Because the observed multivariate sample data are the posterior probabilities of the four groups, their respective means tend to be equal, which would make the difference in the mean vectors zero. The reason is that the probabilities in each group sum to one (1). We can overcome this by examining the sample data carefully. Any sample point whose value is zero when rounded to two decimal places (2 d.p.) is regarded as zero, and the sample size is adjusted accordingly. Hence, from sample A, delete serial number 19 from column 2; variate 1 then has n = 20 and variate 2 has n = 19. Similarly, from sample B, delete serial numbers 3 and 13 from column 2; variate 1 then has n = 20 and variate 2 has n = 18. The adjusted variance-covariance matrix for sample A is:
The adjusted variance-covariance matrix for sample B is:
Each of the above is a symmetric matrix in which the diagonal elements are the variances and the off-diagonal entries are the covariances. The values of the matrices are presented below.
From sample A, we have:
From sample B, we have:
The pooled sample dispersion matrix is
The dispersion matrix is
The inverse of the dispersion matrix is
The linear discriminant function is
The difference in the sample mean vectors for samples A and B is
Since the variates are the posterior probabilities which cannot be negative, the difference in the mean vector cannot be negative.
The optimal classifier for discrimination is
4. Classification Rule/Conclusion
Classify subscribers in group A whose posterior probability is 0.7368 and above into group A1 and those whose posterior probability falls below 0.7368 into group A2. Also classify subscribers in group B whose posterior probability is 0.7368 and above into group B1 and those whose posterior probability falls below 0.7368 into group B2.
The subscribers that belong to A1 and B1 are the fraudulent subscribers, while those that belong to A2 and B2 are the legitimate subscribers. From the sample observations in Table 9(A) and Table 9(B), all the subscribers are legitimate because their posterior probabilities are less than the optimal classifier value of 0.7368.
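The classification rule can be written as a short sketch. The cutoff of 0.7368 is the value stated in the rule above. The function name, the labels and the example posterior probabilities are hypothetical and are not taken from Table 9.

```python
OPTIMAL_CLASSIFIER = 0.7368  # cutoff stated in the classification rule


def classify_subscriber(posterior, cutoff=OPTIMAL_CLASSIFIER):
    """Label a subscriber from their posterior probability."""
    return "fraudulent" if posterior >= cutoff else "legitimate"


# Hypothetical posterior probabilities, for illustration only.
example_posteriors = [0.05, 0.31, 0.7368, 0.90]
labels = [classify_subscriber(p) for p in example_posteriors]
print(labels)  # ['legitimate', 'legitimate', 'fraudulent', 'fraudulent']
```

A posterior probability exactly equal to the cutoff is treated as fraudulent, in line with the "0.7368 and above" wording of the rule.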