In this article we propose a filtering technique that uses ego-network signals to estimate a hidden process. Consider a financial market with a single lender and several borrowers, who are represented by nodes in a dynamic social network. For a particular borrower, let her¹ true credit quality be captured by a hidden process modeled as a mean-reverting Ornstein-Uhlenbeck process. On account of the information asymmetry between the borrower and the lender, the lender is unable to observe the credit quality process directly. However, through interactions with the borrower, the lender observes a continuous-time behavioral process, modeled as a linear diffusion whose drift is driven by the hidden credit quality. This is a continuous-time linear state space model, with the credit quality as the state process and the behavioral process as the observation process. Kalman-Bucy filtering can thus be used to obtain the estimate of the credit quality that is optimal in the mean-square sense.
We assume that network ties are based on homophily. Homophily is the idea that individuals with similar characteristics are more likely to be friends than individuals with different characteristics. Thus social network ties are based on closeness in credit type: the probability that two individuals will create or maintain a network tie decreases with the distance between their credit types. The probability of network tie formation or termination is conditional on the parties meeting. The meeting process is modeled as a random event whose probability is a deterministic function of the population size of the network, with large populations leading to sparse networks. Individuals know their own credit type and can also observe the credit type of their direct friends (alters) in the network. The social network is thus modeled as a dynamic latent space network. The lender’s view of the network is restricted to the ego-network signals of borrowers at fixed discrete times. At these information times, the lender observes the particular borrower’s ego-network and receives a vector of unbiased signals related to her credit quality and the credit quality of her alters. The dimension of this vector is a function of the actual degree (number of alters) of the borrower at the information time.
Our model proposes the inclusion of the ego-network signals in the filtering of the hidden credit quality process. The lender’s observation filtration is augmented by the filtration generated by the ego-network signals at discrete time points. In the proposed model, Bayesian updating at the information times is used to incorporate the information from the ego-network signals into the estimate of the credit quality. We note that, by the Gaussian nature of the state and observation processes and the formulation of the ego-network likelihood, the updated estimate at the information times remains Gaussian, and we derive explicit results for its mean and variance. We also derive results showing that the inclusion of the signals leads to a lower conditional variance for the filtered process. By introducing the meeting probability in network tie formation, our model is an extension of  , where the conditional expected number of friends was treated as a constant. Further, we study the asymptotic behavior of the conditional variance as the frequency of network information arrivals increases. Increasing the frequency of network information arrival times leads to clearer signals, and in the limit we reach the full-information scenario.
There exist several studies on the statistical modeling of social networks. Models proposed in these studies include the coevolution model of  , in which the authors proposed a continuous-time network model: the nodal attributes, modeled as Markov chains, influence the formation of network ties, which in turn influence the transition probabilities of the nodal attributes. In  , the authors proposed a static latent space network model, where the nodal attributes lie in a low-dimensional Euclidean space and influence the formation of network ties. The static model has been extended to the time-varying case several times, among others by  , who proposed a directed dynamic latent space model. For a review of recent studies on latent space network models, see e.g.  .
Existing studies on the mathematical modeling of consumer credit risk include  , where the authors proposed a continuous-time model of a borrower’s credit type. By modeling the credit type as a jump diffusion process and applying option pricing theory, the authors were able to derive an explicit formulation of the borrower’s default probability. Consumer credit risk modeling is mainly focused on credit scoring, i.e. the use of statistical models to aid credit granting decisions. Common techniques used for credit scoring include linear discriminant analysis, logistic regression, Bayesian classifiers, random forests and finite Markov chains; see e.g.  for a review. In recent times, hidden Markov models (HMM) have been applied to credit scoring, e.g. by  , who compared the performance of HMM and logistic regression in the classification of customers and the evaluation of the probability of default.  modeled the consumer’s credit rating as a discrete-time Markov chain after incorporating a latent variable that captures the prevailing economic conditions. In  , the author proposed a credit scoring model in which the borrower’s hidden credit type, modeled as a discrete-time Markov chain, is learned by observing network-related variables including reputation, trust and distrust. Proposing a static credit scoring model,  used ego-network signals to update the lender’s belief about the borrower’s unobserved credit type, modeled as a Gaussian random variable. For a review of the application of social network data to consumer credit risk modeling, see  .
In  , the authors augmented the observation filtration with discrete-time expert opinions to estimate the hidden Gaussian process driving the drift of the stock price. The model is an extension of the Black-Litterman model of  to the continuous-time case. The authors in  estimated the unobserved drift parameter on an observation filtration initially enlarged with anticipative information perturbed by independent noise. Textbook treatments of stochastic filtering include  ,  and  .
The article is organized as follows. In Section 2, the credit risk and dynamic network models are presented. Results on stochastic filtering are presented in Section 3. In Section 4, the properties of the conditional variances are derived in detail under the various information setups. Brief numerical results are presented in Section 5, whilst Section 6 concludes.
2. The Model Setup
Consider a filtered probability space whose filtration satisfies the usual conditions of right continuity and completeness. All processes are assumed to be adapted to this filtration.
Borrower’s Behavioral Dynamics
The borrower’s hidden credit quality process is modeled as a mean-reverting Ornstein-Uhlenbeck process defined as
where the model parameters are constants and B is a Brownian motion. The initial value of the process is Gaussian and independent of B. Thus the credit quality is a Gaussian process with mean and variance given by
respectively. The hidden credit quality drives the drift of the borrower’s observed behavioral dynamics, which are modeled as a diffusion process defined as
The parameters are assumed to be constants, and the driving noise of the observation process is an adapted one-dimensional Brownian motion. The two Brownian motions are assumed to be independent.
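As a point of reference, a linear Gaussian state space model of the kind described above can be written as in the sketch below. Only the name $B$ of the state noise is taken from the text; the symbols $X$, $Y$, $W$, $\alpha$, $\mu$, $\sigma$, $a$, $b$, $c$, $m_0$ and $v_0$ are illustrative assumptions and need not coincide with the parameterization used in this article.

```latex
\begin{align*}
dX_t &= \alpha(\mu - X_t)\,dt + \sigma\,dB_t, \qquad X_0 \sim \mathcal{N}(m_0, v_0),\\
\mathbb{E}[X_t] &= \mu + (m_0 - \mu)\,e^{-\alpha t},\\
\operatorname{Var}(X_t) &= \frac{\sigma^2}{2\alpha}
   + \Bigl(v_0 - \frac{\sigma^2}{2\alpha}\Bigr)e^{-2\alpha t},\\
dY_t &= (a + b\,X_t)\,dt + c\,dW_t .
\end{align*}
```

Under these assumptions (with $\alpha > 0$, and $X_0$, $B$ and $W$ mutually independent), the mean reverts to $\mu$ at rate $\alpha$ and the variance converges to the stationary level $\sigma^2/(2\alpha)$.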
Consider a society whose individuals are represented as nodes in a dynamic network. Each individual in the population is assumed to have an independent, time-varying credit quality modeled as a Gaussian process. When a pair of individuals i and j get the opportunity to meet, they may decide to create, terminate or continue a network tie by mutual consent. Thus network tie formation and termination are conditional on the meeting probability. Modeling the meeting probability as a function of population size captures network sparseness, a property observed in real-life social networks: the meeting probability decreases as the number of individuals in the population increases. Assuming an undirected network, i.e. one in which ties are symmetric, we let
for every and with
Hence the network ties are independent Bernoulli random variables conditional on the nodal attributes. The network tie formation probability is modeled with a probit link function. Conditional on the meeting process, the existence of a network tie between individuals i and j at time t is a function of the Euclidean distance between their respective credit types. The network model captures homophily, since a shorter distance between credit types leads to a higher probability of network tie formation. The model assumes that no cost is incurred on network tie formation or termination.
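The qualitative behavior described above can be sketched in a few lines. The functional forms here are assumptions made purely for illustration: the meeting probability `expected_meetings / population_size` and the link `2 * Phi(-distance / scale)` are hypothetical choices that decrease in population size and in distance respectively, and they are not claimed to be the forms used in this article.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def meeting_probability(population_size: int, expected_meetings: float = 10.0) -> float:
    """Hypothetical meeting probability: shrinks as the population grows."""
    return min(1.0, expected_meetings / population_size)


def tie_probability(x_i: float, x_j: float, population_size: int, scale: float = 1.0) -> float:
    """Hypothetical tie probability: meeting probability times a probit-type link.

    The link equals 1 at zero distance and decays towards 0 as the
    distance between credit types grows (homophily).
    """
    distance = abs(x_i - x_j)
    link = 2.0 * std_normal_cdf(-distance / scale)
    return meeting_probability(population_size) * link


if __name__ == "__main__":
    for d in (0.0, 0.5, 1.0, 2.0):
        print(d, tie_probability(0.0, d, population_size=100))
```

Any link that is decreasing in the distance would exhibit the same homophily effect; the factor depending on population size is what produces sparse networks in large populations.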
Consider the graph of friendship ties in the society at time t. The set of borrower i’s direct friends (alters) at time t is known as her ego-network. For a particular borrower i, we consider her hidden credit quality process and her observed behavioral score.
The lender observes in continuous time the process denoting the borrower’s behavior. Further, at discrete fixed times the lender observes the borrower’s ego-network and receives signals from her and her alters. The vector of ego-network signals received by the lender at an information time comprises the borrower’s own signal and the signals from her alters. The signal noise terms are i.i.d. across individuals and Gaussian with mean zero. Thus the lender receives noisy but unbiased signals upon observing the borrower’s ego-network at each information time.
The information available to the lender can thus be represented by the following filtrations
The first filtration corresponds to the continuous-time behavioral information only, the second consists of the ego-network signals received at discrete times, whilst the third is the combination of the behavioral information and the ego-network signals. We assume that the initial σ-algebras are augmented with the null sets. Note that, at each time, the combined filtration contains the other two.
3. Stochastic Filtering
The focus of stochastic filtering is to estimate the hidden stochastic process based on observations up to time t. The filtered estimate is the projection of the hidden process onto the observed filtration, i.e. its conditional expectation given the observations up to time t. In this section, we derive explicit results for the filtering equations and the conditional variance of the hidden process.
When the lender observes only the continuous-time behavioral information, i.e. when he does not receive any ego-network signals, we are in the realm of the classical Kalman-Bucy filter, see e.g.  and  . This is because the state and observation equations constitute a linear Gaussian state space model. Consider the conditional mean and the conditional variance of the hidden process with respect to this σ-algebra.
The dynamics of the conditional mean are given by the following SDE
whilst the dynamics of the conditional variance are given by the deterministic ODE
Equation (8) is the well-known Riccati equation, a deterministic equation whose unique solution is given as
given that the initial value is . In Equation (9), , and (see e.g.  ).
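To make the role of the Riccati equation concrete, the following sketch integrates the conditional variance numerically and compares its long-run value with the positive root of the stationary equation. It assumes a state equation with mean-reversion speed `alpha` and volatility `sigma`, and an observation equation with drift loading `b` and noise level `c`; these names and the numerical values are hypothetical and are not the article's notation or calibration.

```python
import math


def riccati_rhs(q: float, alpha: float, sigma: float, b: float, c: float) -> float:
    """Right-hand side of the Kalman-Bucy variance ODE for an OU state:
    dq/dt = -2*alpha*q + sigma^2 - (b/c)^2 * q^2."""
    return -2.0 * alpha * q + sigma ** 2 - (b / c) ** 2 * q ** 2


def integrate_variance(q0, alpha, sigma, b, c, horizon, steps):
    """Explicit Euler integration of the variance ODE on [0, horizon]."""
    dt = horizon / steps
    q = q0
    path = [q]
    for _ in range(steps):
        q += dt * riccati_rhs(q, alpha, sigma, b, c)
        path.append(q)
    return path


def stationary_variance(alpha, sigma, b, c):
    """Positive root of (b/c)^2 q^2 + 2*alpha*q - sigma^2 = 0."""
    k = (b / c) ** 2
    return (-alpha + math.sqrt(alpha ** 2 + k * sigma ** 2)) / k


# Hypothetical parameter values, chosen only for illustration.
params = dict(alpha=0.5, sigma=1.0, b=1.0, c=0.5)
variance_path = integrate_variance(q0=1.0, horizon=20.0, steps=20000, **params)
limit_variance = stationary_variance(**params)

if __name__ == "__main__":
    print(variance_path[-1], limit_variance)
```

The variance path is deterministic: it depends on the model parameters and the initial variance but not on the observed behavioral path, which is why it can be computed in advance.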
Behavioral Observations and Network Information
This is the case of most interest in the study. The lender’s observation σ-algebra is the augmentation of the behavioral information with the discrete-time ego-network signals. Since the lender’s observation of the network is restricted to borrower i’s ego-network, at each information time and in the absence of any other borrower information, an individual’s credit quality is assumed to follow a common assumed distribution. The lender uses this assumed density for all other individuals who are alters of borrower i. The following lemma gives the expected degree (number of direct friends) conditional on the borrower’s true credit type.
Lemma 1. At each information time, conditional on the meeting process and the borrower’s credit quality, the expected number of friends is given as
Proof. Conditional on the borrower’s true credit type and the meeting process, the probability of having a network tie with any other individual is
Thus the conditional expected number of friends is given by
Proposition 1. For each information date, consider the precision of the ego-network signals received at that date. Further define the variable
Then it holds that:
1) Between information dates, the filtered estimate is Gaussian with the dynamics
whilst the equation of the conditional variance is given as
with initial values and . is same as in Equation (9) whilst and
2) At each information date, the filtered estimate is Gaussian. The mean and variance are updated from their respective values before the arrival of the ego-network signals to
Proof. 1) Between two information dates there is no new arrival of ego-network signals, so the lender’s information grows only through the continuous-time behavioral observations. Thus we revert to the classical Kalman-Bucy filtering situation, with the initial values for the conditional mean and variance given by their values at the most recent information date. For this case, the formulations for the conditional mean and variance follow closely from Equations (7) and (8).
2) On an information arrival date, the lender receives ego-network signals and updates the conditional mean and variance of the filtered estimate. To incorporate the ego-network signals into the estimate, Bayesian updating is carried out, since there is no time evolution between the instant just before the information date and the information date itself. Just before the information date, the conditional prior distribution of the credit quality is Gaussian, and the signals received are also Gaussian. The posterior probability of the borrower’s credit type is obtained by
The last equality follows from the independence assumption. The assumed density is the one used for any individual other than the borrower. Thus the integrand is given by
The first term denotes the product of the conditional prior density (before the arrival of network information) and the likelihood function of the observed signal.
(a) denotes the assumed prior density of the alters’ credit types, times the probability that at the information date borrower i is friends with the individuals within her ego-network and that these friends have the signals collected in the signal vector.
(b) denotes the probability that at the information date borrower i is not friends with anyone outside her ego-network.
As , , then by the monotone convergence theorem and applying lemma 1 we have
whereby the integrand is a product of Gaussian densities. Upon integrating out the alters’ credit types and matching the first- and second-order terms in the borrower’s credit type, we obtain the posterior distribution as Gaussian with the stated expectation and variance. Modeling the network tie probability with a probit link function is what allows the posterior to be expressed in closed form as a Gaussian.
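The mechanics of the update at an information date can be illustrated with the standard Gaussian conjugate update for a single unbiased signal. This is a deliberately simplified sketch: it uses only the borrower's own signal and omits the contribution of the alters' signals and of the tie likelihood, so it is not the full update of the proposition. The function and argument names are hypothetical.

```python
def gaussian_update(prior_mean: float, prior_var: float,
                    signal: float, signal_var: float) -> tuple[float, float]:
    """Posterior mean and variance of a Gaussian state after observing
    signal = state + noise, with noise ~ N(0, signal_var) independent of the state.

    Precisions (inverse variances) add, and the posterior mean is the
    precision-weighted average of the prior mean and the signal.
    """
    prior_precision = 1.0 / prior_var
    signal_precision = 1.0 / signal_var
    post_var = 1.0 / (prior_precision + signal_precision)
    post_mean = post_var * (prior_precision * prior_mean + signal_precision * signal)
    return post_mean, post_var


if __name__ == "__main__":
    # Prior N(0, 1), one signal equal to 2 with unit noise variance.
    print(gaussian_update(0.0, 1.0, 2.0, 1.0))
```

Because precisions add, the posterior variance is always strictly smaller than the prior variance, regardless of the realized value of the signal.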
Lastly, we consider a special case in which the lender does not observe the continuous-time information but only receives the discrete-time ego-network signals. Thus, between network information times, the lender receives no information. We have the following corollary.
Corollary 1. When the lender’s information set consists of the ego-network signals only, we have:
Between information arrival times, the conditional mean and variance are given by the equations
At each information date, the filtered estimate is Gaussian with mean and variance
Proof. Between information dates, the proof can be found in Corollary 4 of  . At an information date, the conditional prior distribution just before that date is updated using the ego-network signals contained in the signal vector. Proceeding as in part 2) of Proposition 1, the conditional prior density is updated to a Gaussian posterior density with the given mean and variance.
4. Properties of the Conditional Variance
We study the properties of the conditional variance under the various information settings discussed in Section 3. We show that the inclusion of ego-network signals leads to better estimates of the hidden process. A key result of this section is Proposition 2, where we show that increasing the frequency of network information arrival times leads to the full-information case in the limit. The following lemma shows that the ego-network signals improve the lender’s estimate of the credit quality.
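Lemma 2. At every time, the conditional variance under the combined information is no larger than the conditional variance under the behavioral information alone, and no larger than the conditional variance under the ego-network signals alone.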
Proof. The proof is similar to that of Proposition 6 of  , where a detailed proof is available.
The following proposition shows that, as the frequency of arrivals of ego-network information increases, the conditional variances tend to zero. It is an adaptation of the asymptotic result of  .
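Proposition 2. Consider a refining sequence of partitions of the time interval, indexed by N, such that information dates are retained, i.e. every information date of one partition is also an information date of all finer partitions. Let the mesh size of a partition be the largest gap between its consecutive information dates. Further, consider the corresponding sequence of signal variances at the information times, and assume that there exists a constant that bounds these variances from above for all N and all information dates. Then, at every positive time, the conditional variances under the ego-network signals alone and under the combined information tend to 0 as N tends to infinity and the mesh size tends to 0.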
From lemma 2, since we need only prove the assertion for . Further, we can assume ego-network information with constant variances i.e. for all . This assumption generalizes the proof even for the case where . For ease of notation we write instead of noting the dependence on N. For any and any we know that is given by
Since it follows that
We iterate this inequality for all and denote . This yields for and
Let , and . We desire to show that for a suitably chosen N. Define to be the index for which . Suppose that for all there exists a such that
Then we have that
where the minimum is over all . This yields (with one iteration less)
Note that in this case and as . Thus for all , . Thus we can choose such that which is a contradiction of the assumption in Equation (29).
Thus there exists a such that for all there exists an index set with . For each such N we choose as the last information arrival time before such that . In the case that , then for a suitably large N from Equation (27) implies that .
For the case when for we have that and . We can choose a suitable such that . An iteration similar to Equation (28) starting from with initial value yields that
for all as desired.
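The asymptotic statement can be checked numerically in a simplified setting. The sketch below treats the case in which only discrete signals are observed: between information dates the variance follows the Ornstein-Uhlenbeck variance formula, and at each date it is reduced by a Gaussian precision update with a constant signal variance. Only the borrower's own signal is used, and all names and parameter values (`alpha`, `sigma`, `signal_var`, `v0`, the horizon) are hypothetical choices for illustration.

```python
import math


def propagate_variance(v: float, dt: float, alpha: float, sigma: float) -> float:
    """Variance of an OU state after dt time units without new information."""
    stationary = sigma ** 2 / (2.0 * alpha)
    return stationary + (v - stationary) * math.exp(-2.0 * alpha * dt)


def update_variance(v: float, signal_var: float) -> float:
    """Variance after one unbiased Gaussian signal (precisions add)."""
    return 1.0 / (1.0 / v + 1.0 / signal_var)


def terminal_variance(n_dates: int, horizon: float = 1.0, v0: float = 0.5,
                      alpha: float = 1.0, sigma: float = 1.0,
                      signal_var: float = 1.0) -> float:
    """Conditional variance at the horizon with n_dates equidistant information dates."""
    dt = horizon / n_dates
    v = v0
    for _ in range(n_dates):
        v = propagate_variance(v, dt, alpha, sigma)
        v = update_variance(v, signal_var)
    return v


variances = [terminal_variance(n) for n in (1, 10, 100, 1000)]

if __name__ == "__main__":
    print(variances)
```

With the signal variance held fixed, the terminal variance decreases as the information dates become denser, which is the behavior the proposition describes. Adding continuous behavioral observations can only lower the variance further.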
5. Numerical Results
In this section we provide a brief illustration of our findings on the properties of the conditional variance. We assume that the ego-network signals arrive at equidistant time points. We simulate the hidden credit quality process and the observed behavioral process using the parameter values in Table 1.
To illustrate the impact of the number of alters on the borrower’s conditional variances, Figure 1 compares the conditional variances with and without friends. The left panel compares the variances when the number of friends is held constant at zero and at five. In the right panel, the number of friends is modeled as a Poisson random variable. In both panels, the conditional variance is lower when friends’ data are available than when the borrower has zero friends. Moreover, the fluctuations in the conditional variance caused by the randomly varying number of friends are visible in the right panel of Figure 1.
Figure 1. Left: conditional variances for a constant number of friends (zero and five). Right: conditional variances for zero friends and for a Poisson-distributed number of friends.
Table 1. Model parameter values.
6. Conclusion
In this article, we have presented stochastic filtering results in which the hidden credit quality process, modeled as an Ornstein-Uhlenbeck process, drives the drift of the borrower’s observed behavioral score. We have formulated a latent space network model such that the ego-network signals received at discrete fixed times are incorporated into the credit quality filtering by way of Bayesian updating. Modeling the network tie probability with the probit link function allowed the conditional posterior density of the hidden process to be expressed in closed form. We have presented explicit results for the conditional mean and conditional variance under the various information setups. Further, we have presented asymptotic properties of the conditional variance when the frequency of the network information arrival times is increased.
The results in this article thus present a theoretical justification of including ego-network data in credit scoring, for a network model based on credit type homophily. Future studies may consider network models whereby the network ties capture the strength or frequency of interaction between the nodes, instead of binary network ties.
The authors wish to thank the African Union and the Pan African University Institute of Basic Sciences, Technology and Innovation, Kenya, for their financial support for this research.
¹ In this article we refer to the borrower as female and the lender as male.