1. Introduction and Literature Review
The Internal Ratings Based Approach allows banks to determine their capital requirements according to internal models for the risk parameters PD (probability of default), EAD (exposure at default) and LGD (loss given default). The underlying rules according to which this shall be done are contained in the Capital Requirements Regulation (CRR,  ) and in the corresponding Regulatory Technical Standards (RTS). The CRR specifies no requirements with regard to model choice; in principle, all types of models are allowed. The same holds for the new IFRS 9 standard that will be authoritative for the determination of credit impairments from January 2018 on. Accurate estimates of the risk parameters are essential in both cases.
At the beginning of the century, modeling of the risk parameter LGD was carried out in a rather simplified manner. Over the years banks have recognized its significance (equal to that of PD) and advanced models have been developed. Basically, the parameter can be separated into two categories: market LGD and workout LGD. A market LGD is usually calculated from market data, especially from data on defaulted bonds. The calculation of a workout LGD takes into account a bank's internal support of defaulted customers, and LGD is calculated using discounted cash flows over the whole workout period. In both cases, modeling accurate LGD estimators is ambitious for many reasons. One reason is the lack of data, especially for low default portfolios. Another is the general complexity of modeling LGD. In order to predict losses accurately, banks must differentiate LGD values on the basis of a wide set of transaction characteristics. The most important characteristics are borrower types, collateral types, product types and default scenarios. Another difficulty arises from the interaction of these characteristics over time, which results in an extremely heterogeneous and multidimensional estimation problem. This interaction produces, however, some stylized facts of historically observed LGDs. Perhaps the most important stylized fact is the bimodal (under some circumstances even multimodal) structure of the empirical LGD distribution, displayed in Figure 1.
The bimodal structure is a characteristic often observed in LGD data. The peaks at 0% and at 100% are generated for two main reasons: Firstly, for defaults that end with a cure event or are fully collateralized, a loss realization of 0% (or nearly 0%) is the baseline case. On the other hand, banks also quite often realize total losses from defaulted engagements. Here, the most prominent explanations are extremely unfavorable liquidations of collateral or long ongoing legal proceedings. Another explanation is a write-off of the entire outstanding exposure, or a large proportion of it, without starting the workout process. These facts explain the bimodal loss structure very well.
Figure 1. Bimodal shape of the empirical LGD distribution based on approximately 6,000 default observations (source: bank internal loss data).
When modeling LGDs, two main approaches may be distinguished: parametric and non-parametric models. The non-parametric approach contains tree models, models based on neural networks (  ) and option-theoretic models (  ,  and  ). Parametric LGD models are regression based. Besides OLS and logit regression, new models have been developed recently: inflated beta regression (  ), generalized beta regression (  ), censored gamma regression (  ), zero-adjusted gamma regression (  ), and mixture models (  and  ). In  the authors point out some problems that arise in LGD estimation and show how they may be solved. All these models have been developed to take the special shape of the empirical LGD distribution into account accurately. Meanwhile, many empirical studies comparing different LGD models exist:  ,  ,  ,  and  .
If banks use internal models for regulatory capital estimation, these models must be compliant with the CRR  . Two important requirements concern the usage of historical data for model building (Article 179 (1) CRR) and validation, which must be performed at least annually (Article 185 b) CRR). In addition, validation is required to be done both qualitatively and quantitatively. The quantitative part of the validation process is about assessing the predictive power of the model (backtesting), its stability and its discriminative power. Backtesting and stability assessment are usually done by splitting the defaulted portfolio into an in-time and an out-of-time sample. The assessment of discriminative power is more challenging. For PD models, the assessment is usually based on the Accuracy Ratio. This measure is derived from ROC (Receiver Operating Characteristic) or CAP (Cumulative Accuracy Profile) curves and is a common tool in the validation process (see  ,  and  ). For that reason an equivalent measure for LGD models is desirable. However, such a performance measure is not documented in the literature. A direct transcription of the concept seems not to be possible. A major reason for this is that, in contrast to PD, LGD is not binary but a continuous parameter that takes values in the interval [0,1].
In each of the above-mentioned LGD studies the assessment of model quality relies on statistical criteria without properly taking into account the model's ability to discriminate between low and high LGD scores. These criteria are: mean absolute error (MAE), relative absolute error (RAE), mean squared error (MSE), root mean squared error (RMSE) and the coefficient of determination ($R^2$ or the adjusted $R^2$). They are defined as follows:

$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\bigl|\hat{\ell}_i-\ell_i\bigr|,\qquad \mathrm{RAE}=\frac{\sum_{i=1}^{n}\bigl|\hat{\ell}_i-\ell_i\bigr|}{\sum_{i=1}^{n}\bigl|\ell_i-\bar{\ell}\bigr|},$$

$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\bigl(\hat{\ell}_i-\ell_i\bigr)^2,\qquad \mathrm{RMSE}=\sqrt{\mathrm{MSE}},\qquad R^2=1-\frac{\sum_{i=1}^{n}\bigl(\hat{\ell}_i-\ell_i\bigr)^2}{\sum_{i=1}^{n}\bigl(\ell_i-\bar{\ell}\bigr)^2},$$
1In  the authors also use the correlation coefficient between realized and predicted LGDs as a performance criterion.
2For a piecewise constant c.d.f. the inverse function is not defined. In this case a generalized inverse can be defined as $F^{-1}(p)=\inf\{x: F(x)\ge p\}$.
where $n$ is the sample size, $\ell_i$ is the realized (observed) and $\hat{\ell}_i$ the predicted loss quota for an engagement $i$¹. These measures are somewhat one-sided and biased, as they are not able to account for the concentrations that are obvious in the empirical LGD distribution. As a matter of fact, they are limited in assessing whether an LGD model is able to distinguish between small and big losses or not.
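As a reference implementation of these criteria, the following sketch computes them with numpy. The toy loss vectors are our own, and the RAE normalization (relative to a naive mean predictor) is one common convention among several:

```python
import numpy as np

def lgd_error_measures(l_real, l_pred):
    """Classical point-wise error measures; none of them captures concentrations."""
    l_real = np.asarray(l_real, dtype=float)
    l_pred = np.asarray(l_pred, dtype=float)
    err = l_pred - l_real
    mae = np.mean(np.abs(err))
    # RAE: total absolute error relative to a naive mean predictor (one common convention)
    rae = np.sum(np.abs(err)) / np.sum(np.abs(l_real - l_real.mean()))
    mse = np.mean(err ** 2)
    return {
        "MAE": mae,
        "RAE": rae,
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "R2": 1.0 - np.sum(err ** 2) / np.sum((l_real - l_real.mean()) ** 2),
    }

# hypothetical realized vs. predicted loss quotas
measures = lgd_error_measures([0.0, 0.1, 0.9, 1.0], [0.1, 0.1, 0.8, 0.9])
```

Note that a model predicting the portfolio mean for every engagement already achieves RAE = 1 and R² = 0, yet has no discriminative power at all, which is exactly the blind spot discussed above.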
The aim of this paper is to close this gap. We develop a performance measure that is equivalent to the Accuracy Ratio known from PD models. The derivation is based on Lorenz curves and Gini coefficients. As there is a direct relationship between Lorenz curves and CAP curves, the measure may be regarded as a CAP-based measure. The results presented in this paper will enable banks to quantify how well a model is able to predict concentrations observed in historical data. This in turn will enrich the tools used for model assessment and finally help banks to validate their internal models more accurately.
The remainder of the paper is structured as follows: Section 2 introduces the relevant concepts. Section 3 contains the main ideas of the paper. After defining the new measure, we state first properties and give some interpretations. Section 4 focuses on providing alternatives for its calculation. These alternatives are important from a practical perspective. Section 5 concludes.
2. Lorenz Curve and Gini Index
The concept of Lorenz curves is well established in macroeconomics. The theory is profound and the idea has central applications in quantifying the growth of an economy and income inequality. The literature covering the topic is rich (see for instance  ,  ,  or  ). Financial applications also exist (  and  ). As we want to use the concept in the context of LGD validation, it will be necessary to recall some theoretical basics.
As usual, the random variable LGD is understood as a conditional quantity:

$$\mathrm{LGD} = \text{loss quota} \mid D = 1,$$

where $D$ is the default indicator. The variable can take discrete values or be continuous. For the moment we will assume that LGD is continuous, predicted by an arbitrary but fixed model. Let $F$ be its cumulative distribution function (c.d.f.). Since $F$ is continuous and monotonically increasing, we can define its inverse or quantile function2: for $p\in(0,1)$, $F^{-1}(p)$ is the unique number $x$ with $F(x)=p$. Then it is true that
$F^{-1}$ is monotonically increasing.
If $F$ is continuously differentiable, we call the derivative $f=F'$ a density function. The expected value of LGD then equals

$$\mu := E[\mathrm{LGD}] = \int_0^1 x\,f(x)\,dx.$$

The expectation may also be determined using the quantile function $F^{-1}$:

$$\mu = \int_0^1 F^{-1}(p)\,dp.$$
Now, we can define the Lorenz curve for the random variable LGD.
Definition 2.1: Let $p\in[0,1]$. We define the Lorenz curve $L$ in two steps:

1) Determine the $p$-quantile, i.e. solve the equation $F(x_p)=p$ for $x_p$.

2) Set

$$L(p)=\frac{1}{\mu}\int_0^{x_p} x\,f(x)\,dx.$$
An immediate consequence is the following Lemma.
Lemma 2.2: The Lorenz curve can be determined as

$$L(p) = \frac{1}{\mu}\int_0^p F^{-1}(t)\,dt = \frac{1}{\mu}\left(p\,F^{-1}(p) - \int_0^{F^{-1}(p)} F(x)\,dx\right).$$
Proof: The first equation follows directly from the substitution $t=F(x)$, $dt=f(x)\,dx$. To prove the second equation, we apply integration by parts with $u(x)=x$ and $v'(x)=f(x)$. Using the definition we obtain

$$L(p) = \frac{1}{\mu}\int_0^{x_p} x\,f(x)\,dx = \frac{1}{\mu}\Bigl(x_p\,F(x_p) - \int_0^{x_p} F(x)\,dx\Bigr) = \frac{1}{\mu}\Bigl(p\,F^{-1}(p) - \int_0^{F^{-1}(p)} F(x)\,dx\Bigr).$$
This completes the proof. ∎
From the above statements we deduce the following properties of $L$:
Assuming an increasing ordering of LGDs (increasing ranking), the Lorenz curve quantifies which proportion of the total loss is assigned to the cumulative proportion $p$ of the population.
$L$ is contained in the unit square, with $L(0)=0$ and $L(1)=1$. Moreover, $L(p)\le p$ for all $p\in[0,1]$.
$L$ is monotonically increasing and convex. The first two derivatives of $L$ satisfy

$$L'(p)=\frac{F^{-1}(p)}{\mu}\ge 0,\qquad L''(p)=\frac{1}{\mu\,f\bigl(F^{-1}(p)\bigr)}\ge 0.$$

In particular, the value $L'(1/2)=F^{-1}(1/2)/\mu$ measures the ratio between the median and the expectation.
Remark 2.3: We will call the graph of $L$, i.e. the set of points

$$\bigl\{(p, L(p)) : p\in[0,1]\bigr\},$$

a Power curve.
Next, we need the notion of a Gini index (Gini coefficient). The index is defined in terms of $L$:
Definition 2.4: The Gini index is defined by the following equation:

$$G = 2\int_0^1 \bigl(p - L(p)\bigr)\,dp = 1 - 2\int_0^1 L(p)\,dp.$$
The definition has a clear geometric interpretation: $G$ is twice the area between the bisection line and the Lorenz curve. The factor 2 is a scaling factor; it ensures that $0\le G\le 1$. It is worth computing the index for a uniformly distributed random variable: if $\mathrm{LGD}\sim U(0,1)$, then $\mu=1/2$ and $L(p)=p^2$. Thus

$$G = 2\int_0^1 \bigl(p-p^2\bigr)\,dp = \frac{1}{3}.$$
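The uniform example can be verified numerically. The following sketch integrates the gap between the bisection line and the Lorenz curve $L(p)=p^2$ with a hand-rolled trapezoidal rule:

```python
import numpy as np

# Numerical check of the uniform example: for LGD ~ U(0,1), L(p) = p^2 and G = 1/3.
p = np.linspace(0.0, 1.0, 100_001)
lorenz = p ** 2                          # L(p) = (1/mu) * int_0^p t dt with mu = 1/2
integrand = p - lorenz                   # gap between bisection line and Lorenz curve
dp = p[1] - p[0]
# trapezoidal rule written out by hand
area_between = np.sum((integrand[1:] + integrand[:-1]) / 2.0) * dp
gini_uniform = 2.0 * area_between        # should be close to 1/3
```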
The next result relates Gini indexes of two linearly transformed random variables.
Lemma 2.5: Let $X\ge 0$ be a random variable with c.d.f. $F$, mean $\mu>0$ and Gini index $G_X$. For $a\ge 0$ and $b>0$ define the new random variable $Y=a+bX$. Then, the Gini index of $Y$ is given by

$$G_Y = \frac{b\mu}{a+b\mu}\,G_X.$$
Proof: We have $F_Y^{-1}(p)=a+b\,F^{-1}(p)$ and $E[Y]=a+b\mu$. Therefore,

$$L_Y(p)=\frac{1}{a+b\mu}\int_0^p \bigl(a+b\,F^{-1}(t)\bigr)\,dt=\frac{a\,p+b\mu\,L_X(p)}{a+b\mu},$$

and hence

$$G_Y = 2\int_0^1\bigl(p-L_Y(p)\bigr)\,dp = \frac{b\mu}{a+b\mu}\cdot 2\int_0^1\bigl(p-L_X(p)\bigr)\,dp = \frac{b\mu}{a+b\mu}\,G_X.$$
This proves the statement. ∎
From the case $a=0$ it follows that the index is invariant under positive scaling.
The next expression provides an important statistical interpretation of the Gini index.
Lemma 2.6: Let $X\ge 0$ be a random variable with c.d.f. $F$ and mean $\mu>0$. Then the Gini index admits the following representation:

$$G = \frac{2\,\mathrm{Cov}\bigl(X, F(X)\bigr)}{E[X]}.$$
The Gini index equals a scaled covariance of the underlying variable and its rank.
Proof: Applying integration by parts to the definition, it follows that

$$G = 1 - 2\int_0^1 L(p)\,dp = -1 + 2\int_0^1 p\,L'(p)\,dp = -1 + \frac{2}{\mu}\int_0^1 p\,F^{-1}(p)\,dp.$$

The transformation $p=F(x)$ together with $dp=f(x)\,dx$ gives

$$\int_0^1 p\,F^{-1}(p)\,dp = E\bigl[X\,F(X)\bigr].$$

Since $F(X)$ is uniformly distributed, $E[F(X)]=1/2$, and it follows that $\mathrm{Cov}(X,F(X)) = E[XF(X)] - \mu/2$. Thus,

$$G = \frac{2\,E[XF(X)]}{\mu} - 1 = \frac{2\,\mathrm{Cov}\bigl(X,F(X)\bigr)}{E[X]},$$
and the proof is completed. ∎
Remark 2.7: Since $\mathrm{Cov}(X,F(X))=\tfrac14\,E|X-X'|$ for an independent copy $X'$ of $X$, the Gini index can be expressed as

$$G = \frac{E|X-X'|}{2\,E[X]},$$

i.e. as the Gini mean difference scaled by twice the expectation.
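The covariance representation of Lemma 2.6 can be cross-checked by simulation. The sketch below compares a rank-based empirical version of $2\,\mathrm{Cov}(X,F(X))/E[X]$ with the classical order-statistics formula for the sample Gini index; the uniform sample and the seed are arbitrary choices:

```python
import numpy as np

# Monte Carlo check of G = 2 * Cov(X, F(X)) / E[X].
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200_000)          # stand-in for a continuous LGD variable

n = x.size
ranks = np.argsort(np.argsort(x)) + 1       # rank of each observation, 1..n
# F(X) is approximated by the normalized ranks; bias=True gives the population covariance
g_cov = 2.0 * np.cov(x, ranks / n, bias=True)[0, 1] / x.mean()

# classical order-statistics formula for the sample Gini index
x_sorted = np.sort(x)
g_sort = 2.0 * np.sum(np.arange(1, n + 1) * x_sorted) / (n * x_sorted.sum()) - (n + 1) / n
```

Both quantities coincide up to floating-point error, and for the uniform sample both are close to the theoretical value 1/3.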
In many cases the explicit determination of $L$ or $G$ is tedious. However, closed-form expressions exist for some prominent distributions (e.g. the lognormal, Pareto or Weibull distribution). As a final example we want to state the expression for the Gini index of the beta distribution. The result is established in  . The beta distribution is interesting in this context, since it has been proposed recently for LGD modeling (  ,  ). Let $X$ be a beta distributed random variable with parameters $a,b>0$. The density and c.d.f. of $X$ are given by

$$f(x)=\frac{x^{a-1}(1-x)^{b-1}}{B(a,b)},\qquad F(x)=\frac{1}{B(a,b)}\int_0^x t^{a-1}(1-t)^{b-1}\,dt,\qquad 0\le x\le 1,$$

where $B(a,b)=\int_0^1 t^{a-1}(1-t)^{b-1}\,dt$ is the beta function. Then

$$G = \frac{2}{a}\cdot\frac{B(a+b,\,a+b)}{B(a,a)\,B(b,b)}.$$
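Where no closed form is at hand, the Gini index of a beta law can also be approximated by simulation. In the following sketch the parameter choice $(a,b)=(2,5)$ and the sample size are purely illustrative; the special case $a=b=1$ must reproduce the uniform value 1/3:

```python
import numpy as np

def sample_gini(values):
    """Sample Gini index via the order-statistics formula."""
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    return 2.0 * np.sum(np.arange(1, n + 1) * v) / (n * v.sum()) - (n + 1) / n

rng = np.random.default_rng(1)
# Monte Carlo Gini for two beta laws; a = b = 1 is the uniform distribution
gini_beta_2_5 = sample_gini(rng.beta(2.0, 5.0, 400_000))   # illustrative parameters
gini_beta_1_1 = sample_gini(rng.beta(1.0, 1.0, 400_000))   # should be close to 1/3
```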
3. The Power Ratio
In this section we apply the ideas from the last section to define a new measure of LGD model performance. The measure may be seen as a counterpart of the Accuracy Ratio well known from PD modeling. Hereby, we make use of the following principle: an estimation model is usually developed on the basis of historical data. The historical experience is a vital model component and has a significant impact on its development and calibration. This is also true for LGD models, as risk drivers and correlations are identified from historical loss data. Therefore, known realized losses must serve as a benchmark for an estimation model. This principle is completely in line with the PD model building and validation process.
Let $P$ be the historical loss portfolio that is used for model building or validation. We assume that $P$ consists of $n$ defaulted borrowers/agreements3. At the time of default each borrower $i$ has an exposure $e_i$. Let $E$ denote the entire portfolio exposure, i.e. $E=\sum_{i=1}^n e_i$. At the end of the model building or validation process the bank is able to assign a realized ($\ell_i$) and a predicted loss quota ($\hat{\ell}_i$) to each borrower $i$. For that reason the model is completely characterized by the following vectors:

$$\ell=(\ell_1,\dots,\ell_n),\qquad \hat{\ell}=(\hat{\ell}_1,\dots,\hat{\ell}_n),\qquad e=(e_1,\dots,e_n).$$
We define the new performance measure, which we call the Power Ratio (PR), as the ratio of the predicted and realized Gini coefficients, which are associated with the predicted and realized loss distributions, respectively:

$$\mathrm{PR} = \frac{G_{\mathrm{pred}}}{G_{\mathrm{real}}},$$
assuming that $G_{\mathrm{real}}>0$, i.e. the realized losses are not all equal. An ascending ranking of the random variable LGD is also assumed. In general, it holds true that $0\le \mathrm{PR}\le 1$. We have $\mathrm{PR}=1$ if the model is able to pattern the structure of realized LGDs over the entire spectrum of observations. This tells us that the model is able to predict concentrations caused by risk drivers in an exact manner. For a model that fails to do this, a Power Ratio of (nearly) zero will be the result.
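A minimal implementation of the Power Ratio as a ratio of sample Gini coefficients may look as follows; the helper name `sample_gini` and the toy portfolio numbers are our own choices:

```python
import numpy as np

def sample_gini(values):
    """Sample Gini index via the order-statistics formula."""
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    return 2.0 * np.sum(np.arange(1, n + 1) * v) / (n * v.sum()) - (n + 1) / n

def power_ratio(lgd_pred, lgd_real):
    """PR = Gini(predicted) / Gini(realized); requires Gini(realized) > 0."""
    return sample_gini(lgd_pred) / sample_gini(lgd_real)

# toy portfolio (hypothetical numbers): the model roughly reproduces the realized ranking
realized = [0.0, 0.0, 0.1, 0.4, 0.9, 1.0]
predicted = [0.1, 0.1, 0.2, 0.5, 0.8, 0.9]
pr = power_ratio(predicted, realized)
```

A model that reproduces the realized losses exactly attains PR = 1; the smoothed toy predictions above land strictly between 0 and 1.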
An equivalent expression for the Power Ratio that corresponds more closely to PD estimation is

$$\mathrm{PR} = \frac{\tfrac12 - A_{\mathrm{pred}}}{\tfrac12 - A_{\mathrm{real}}}.$$
3As banks offer a wide range of products, it is clear that a borrower may have several contracts with a bank. For instance, a customer may possess a mortgage loan account together with a current account and a credit card account. Depending on the structure of collateralization these products will realize different losses. Since an LGD model can be built on different segmentation levels (customer types or product types), we will use these two terms as synonyms.
Here, the quantity $A_{\mathrm{pred}}$ ($A_{\mathrm{real}}$) denotes the area under the Lorenz curve for estimations (realizations), respectively. Since historical LGD realizations must be used as a benchmark model, the Lorenz curve for realized LGDs will be termed "the optimal curve". The notion of an optimal curve is also commonly used in the context of PD validation.
The Power Ratio as defined above allows a clear geometric interpretation: it is the ratio of two areas, namely the area between the bisection line and the Lorenz curve of the model, divided by the area between the bisection line and the Lorenz curve of the benchmark model.
As both the numerator and the denominator in the defining equations depend on ordered LGD levels (ascending ranking of LGDs), $\mathrm{PR}$ measures how adequately the model discriminates high realized losses from low realized losses. A value of 1 is achieved if predicted concentrations exactly cover realized concentrations. However, it must be mentioned that a value of 1 is impossible to achieve for a CRR compliant LGD model. This is due to several requirements for IRB models. In accordance with Article 179 (1) a), Article 179 (1) f), and Article 181 (1) of the CRR  , banks are required to incorporate margins of conservatism in their LGD estimates. These margins cover different issues: economic downturn scenarios, statistical uncertainty and/or data quality. Compliance with these requirements has a direct impact on model performance. This will be explained using the following argument: Let us assume that the defaulted portfolio is composed of 50% cured borrowers. In case of cure a loss of zero is realized ($\ell=0$). The other proportion of 50% is assumed to be terminated agreements with a total loss realization ($\ell=1$). By construction, the realized loss distribution exhibits a bimodal structure that should be taken into account by a model. However, to meet the CRR requirements, a conservative (positive) LGD estimate must be assigned to the cured proportion ex ante. Let $c>0$ be the predicted LGD in case of cure. Then a simple calculation shows that

$$\mathrm{PR} = \frac{1-c}{1+c} < 1.$$
This rather simple example shows that regulatory requirements directly impact LGD model performance. This is true for all performance criteria. From practical experience, models with $\mathrm{PR}$ values around 0.5 turn out to be sufficiently risk differentiating. Also, a value of nearly 1 may indicate overfitting of the model.
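The cure example above can be replayed numerically; the cure-LGD margin c = 0.2 below is an assumed value for illustration only:

```python
import numpy as np

def sample_gini(values):
    """Sample Gini index via the order-statistics formula."""
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    return 2.0 * np.sum(np.arange(1, n + 1) * v) / (n * v.sum()) - (n + 1) / n

# stylized portfolio: 50% cures (realized loss 0) and 50% total losses (realized loss 1)
n_half = 500
realized = np.concatenate([np.zeros(n_half), np.ones(n_half)])

# a conservative model must assign a positive LGD to cures; c = 0.2 is an assumed margin
c = 0.2
predicted = np.concatenate([np.full(n_half, c), np.ones(n_half)])

pr_conservative = sample_gini(predicted) / sample_gini(realized)   # (1 - c) / (1 + c)
```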
From Lemma 2.6 we get the following equation for the Power Ratio:

$$\mathrm{PR} = \frac{\mu}{\hat{\mu}}\cdot\frac{\mathrm{Cov}\bigl(\widehat{\mathrm{LGD}},\hat{F}(\widehat{\mathrm{LGD}})\bigr)}{\mathrm{Cov}\bigl(\mathrm{LGD},F(\mathrm{LGD})\bigr)},$$

where $\hat{\mu}$ is the model mean, $\mu$ is the empirical mean, and $\hat{F}$ ($F$) denotes the model (empirical) distribution function. If a model is calibrated on the ex post level, i.e. $\hat{\mu}=\mu$, which may be plausible for impairment purposes, then $\mathrm{PR}$ allows the interpretation as a ratio of two covariances.
Another theoretical aspect of $\mathrm{PR}$ is concerned with its sensitivity. Let $p\in[0,1]$ be fixed. Recalling the expression for the Gini index,

$$G = 2\int_0^1 \bigl(p - L(p)\bigr)\,dp,$$

we see that the measure is robust to extreme values of the distribution. The function $g(p)=p\,(1-p)$ with $p\in[0,1]$ attains its maximum value for $p=1/2$, meaning that $G$ is most sensitive to changes near the median of the LGD distribution.
4. PR Calculation in Practice
On the next pages we will give guidance concerning PR calculation in banking practice. From the previous analysis it is clear that PR can be calculated in several different ways. We will focus on the two most important alternatives:
default-weighted PR calculation.
exposure-weighted PR calculation.
The first alternative is crucial for banks that use IRB-models. In accordance with Article 181 (1) a) of CRR banks are required to use default-weighted LGD estimates. Hence, model validation should be compliant with the requirement. Exposure-weighted estimation is important for impairment and economic capital calculation, since it may help to identify risks that are driven by exposure concentrations.
Let $P$ be the defaulted portfolio consisting of $n$ borrowers. $P$ is characterized by the vectors $\ell$, $\hat{\ell}$ and $e$. First, let us focus on LGD realizations. We point out that realized LGDs do not necessarily lie in the unit interval, i.e. $\ell_i\notin[0,1]$ is possible. Unfavorable liquidations of collateral may lead to LGD values above 1. Also, negative LGD values are possible for specific portfolios. For instance, defaulted leasing contracts may lead to negative LGDs. In this case we have $\ell_i<0$ for some $i$.
Let the realized LGDs be ranked, i.e.

$$\ell_{(1)}\le\ell_{(2)}\le\dots\le\ell_{(n)},$$

with $\ell_{(i)}$ denoting the $i$-th smallest realization. Since different borrowers may realize equal losses, it holds that equal ranks $\ell_{(i)}=\ell_{(i+1)}$ are possible. If each LGD class (rank) is weighted equally, then the empirical distribution function is given by

$$F_n(x)=\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}_{\{\ell_{(i)}\le x\}}.$$
Obviously, since $n$ is finite, the empirical Lorenz curve $L_n$ is piecewise linear. For each line segment $\bigl[\tfrac{i-1}{n},\tfrac{i}{n}\bigr]$ the slope of $L_n$ equals $\ell_{(i)}/\mu$. In accordance with the findings of the last section, we may write this result as

$$G_{\mathrm{real}} = \frac{2\sum_{i=1}^{n} i\,\ell_{(i)}}{n\sum_{i=1}^{n}\ell_i} - \frac{n+1}{n},$$

where the mean LGD is computed as $\mu=\frac{1}{n}\sum_{i=1}^{n}\ell_i$. Observe that if a fraction of LGD realizations is negative, so is the Lorenz curve for small $p$. Hence, the area under the Lorenz curve may also be negative.
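The default-weighted calculation can be sketched as follows; the helper names and the toy realization vector are ours. The Gini index is obtained exactly from the trapezoidal area under the piecewise-linear Lorenz curve, which coincides with the order-statistics formula above:

```python
import numpy as np

def lorenz_points(lgds):
    """Vertices of the piecewise-linear empirical Lorenz curve (equal rank weights)."""
    l = np.sort(np.asarray(lgds, dtype=float))
    n = l.size
    x = np.arange(n + 1) / n                              # cumulative borrower share
    y = np.concatenate([[0.0], np.cumsum(l) / l.sum()])   # cumulative loss share
    return x, y

def gini_from_lorenz(x, y):
    """G = 1 - 2 * (area under the Lorenz curve), exact for piecewise-linear curves."""
    area = np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x))
    return 1.0 - 2.0 * area

x, y = lorenz_points([0.0, 0.0, 0.1, 0.4, 0.9, 1.0])      # hypothetical realizations
g_real = gini_from_lorenz(x, y)
```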
In the same manner we can construct the Lorenz curve for predicted LGDs. Let us assume that we have fixed a prediction model. This model will produce an LGD ranking of the form $\hat{\ell}_{(1)}\le\hat{\ell}_{(2)}\le\dots\le\hat{\ell}_{(n)}$. Hence,

$$G_{\mathrm{pred}} = \frac{2\sum_{i=1}^{n} i\,\hat{\ell}_{(i)}}{n\sum_{i=1}^{n}\hat{\ell}_i} - \frac{n+1}{n},$$

where the mean LGD is computed as $\hat{\mu}=\frac{1}{n}\sum_{i=1}^{n}\hat{\ell}_i$. Since LGD estimates will be non-negative, so will be the corresponding Lorenz curve.
We also see that the following results are true:
Proposition 4.1: The following statements hold:
An LGD estimation model with a single rank ($\hat{\ell}_{(1)}=\dots=\hat{\ell}_{(n)}$) possesses no discriminative power: $\mathrm{PR}=0$.
Let $a\ge 0$ and $b>0$. If LGD estimations are linear transformations of realized LGDs, i.e. $\hat{\ell}_i=a+b\,\ell_i$, then the discriminative power of the model equals

$$\mathrm{PR} = \frac{b\mu}{a+b\mu}.$$
Proof: Part one is trivial. The second part follows from Lemma 2.5. ∎
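Both parts of the proposition are easy to confirm numerically; the portfolio numbers and transformation parameters below are purely illustrative:

```python
import numpy as np

def sample_gini(values):
    """Sample Gini index via the order-statistics formula."""
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    return 2.0 * np.sum(np.arange(1, n + 1) * v) / (n * v.sum()) - (n + 1) / n

realized = np.array([0.0, 0.0, 0.1, 0.4, 0.9, 1.0])       # hypothetical realizations
mu = realized.mean()

# part 1: a single-rank (flat) model has Gini 0, hence no discriminative power
flat_model = np.full(realized.size, mu)
gini_flat = sample_gini(flat_model)

# part 2: a linear transformation l_hat = a + b * l yields PR = b * mu / (a + b * mu)
a, b = 0.1, 0.5
linear_model = a + b * realized
pr_linear = sample_gini(linear_model) / sample_gini(realized)
```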
Next, we show how unequal weighting of LGD classes can be integrated into the PR calculation. The starting point is an ordered sequence of distinct LGD realizations (classes):

$$\lambda_1 < \lambda_2 < \dots < \lambda_k.$$
Let $n_j$ be the number of borrowers contained in LGD class $j$. Then

$$G_{\mathrm{real}} = 1 - \sum_{j=1}^{k} w_j\,\bigl(y_j + y_{j-1}\bigr),\qquad y_j=\frac{1}{\mu}\sum_{m=1}^{j} w_m\,\lambda_m,\quad y_0=0,$$

where the mean realized LGD is now computed as

$$\mu = \sum_{j=1}^{k} w_j\,\lambda_j,$$

with $w_j=n_j/n$. It is clear that the weights $w_j$ allow an interpretation as the probabilities of sorting a borrower into LGD class $j$.
Analogously, we get for the predicted LGDs

$$G_{\mathrm{pred}} = 1 - \sum_{j=1}^{\hat{k}} \hat{w}_j\,\bigl(\hat{y}_j + \hat{y}_{j-1}\bigr),\qquad \hat{y}_j=\frac{1}{\hat{\mu}}\sum_{m=1}^{j} \hat{w}_m\,\hat{\lambda}_m.$$
It is interesting to compare the two approaches, i.e. especially equations (12) with (18). They coincide if the underlying portfolios are either completely or sufficiently heterogeneous. In the first case, we have $k=n$ and $n_j=1$ for all $j$. For a sufficiently heterogeneous portfolio we would expect that each LGD class has an equal weight in the sense that $n_j\approx n/k$ and $w_j\approx 1/k$ for all $j$. Therefore, both calculation approaches lead to (approximately) the same result.
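A sketch of the class-weighted calculation, with our own helper name; collapsing equal realizations into one class with weight $n_j/n$ must reproduce the equally weighted result:

```python
import numpy as np

def gini_weighted(levels, counts):
    """Gini index for distinct LGD classes with class weights w_j = n_j / n."""
    l = np.asarray(levels, dtype=float)                   # ascending distinct LGD levels
    w = np.asarray(counts, dtype=float)
    w = w / w.sum()                                       # class weights
    mu = np.sum(w * l)                                    # weighted mean LGD
    y = np.concatenate([[0.0], np.cumsum(w * l) / mu])    # cumulative loss shares
    return 1.0 - np.sum(w * (y[1:] + y[:-1]))             # 1 - 2 * trapezoidal area

# two borrowers share the LGD level 0.0; the remaining classes have one member each
g_classes = gini_weighted([0.0, 0.1, 0.4, 0.9, 1.0], [2, 1, 1, 1, 1])
```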
Finally, we state the expressions for $G_{\mathrm{real}}$ and $G_{\mathrm{pred}}$ assuming an exposure-weighted calculation:

$$G_{\mathrm{real}} = 1 - \sum_{i=1}^{n} v_i\,\bigl(y_i + y_{i-1}\bigr),\qquad y_i=\frac{1}{\mu_e}\sum_{m=1}^{i} v_m\,\ell_{(m)},$$

$$G_{\mathrm{pred}} = 1 - \sum_{i=1}^{n} v_i\,\bigl(\hat{y}_i + \hat{y}_{i-1}\bigr),\qquad \hat{y}_i=\frac{1}{\hat{\mu}_e}\sum_{m=1}^{i} v_m\,\hat{\ell}_{(m)},$$

with $v_i=e_i/E$, $\mu_e=\sum_{i=1}^{n} v_i\,\ell_i$ and $\hat{\mu}_e=\sum_{i=1}^{n} v_i\,\hat{\ell}_i$. In both cases the slopes of the line segments, $\ell_{(i)}/\mu_e$ and $\hat{\ell}_{(i)}/\hat{\mu}_e$, allow an interpretation as the fraction of the $i$-th LGD class relative to the exposure-weighted portfolio mean.
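The exposure-weighted variant can be sketched analogously; borrower weights $e_i/E$ replace $1/n$, and with equal exposures the calculation collapses to the default-weighted one. The exposure figures below are hypothetical:

```python
import numpy as np

def gini_exposure_weighted(lgds, exposures):
    """Gini index where borrower i enters with weight e_i / E instead of 1 / n."""
    l = np.asarray(lgds, dtype=float)
    e = np.asarray(exposures, dtype=float)
    order = np.argsort(l)                                 # ascending LGD ranking
    l, v = l[order], e[order] / e.sum()                   # weights travel with borrowers
    mu_e = np.sum(v * l)                                  # exposure-weighted mean LGD
    y = np.concatenate([[0.0], np.cumsum(v * l) / mu_e])  # cumulative loss shares
    return 1.0 - np.sum(v * (y[1:] + y[:-1]))             # 1 - 2 * trapezoidal area

# with equal exposures the calculation collapses to the default-weighted Gini
g_dw = gini_exposure_weighted([0.0, 0.0, 0.1, 0.4, 0.9, 1.0], [1.0] * 6)
# an exposure concentration in particular buckets shifts the index
g_ew = gini_exposure_weighted([0.0, 0.0, 0.1, 0.4, 0.9, 1.0],
                              [100.0, 100.0, 50.0, 200.0, 25.0, 25.0])
```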
5. Conclusions

In this paper, we have developed a new measure to evaluate LGD model performance. The measure, which we term the Power Ratio, is a counterpart of the Accuracy Ratio known from PD modeling and accounts for concentrations in the LGD distribution. Since the measure is model independent, it has universal applicability. This means that it can be applied equally to Through-The-Cycle and Point-In-Time models. After presenting the background of the new measure, we derived its analytical properties. Finally, we focused on practical issues and stated alternatives for its explicit calculation from a banking perspective. We see two main fields of application: Firstly, the new measure should be regarded as an extension of existing validation tools. It will support banks in achieving a more multifaceted model assessment and finally help practitioners to validate their models more accurately. Secondly, the new tool will also help to assess the quality of newly proposed LGD models.
The authors thank the referees for a careful reading of the manuscript and the constructive comments.