Measurement of market risk arising from movements in stock prices, interest rates, exchange rates and commodity prices is a focal point in the practice of financial risk management. This measurement relies heavily on statistical financial models, which attempt to capture the stylized facts that determine price fluctuations and sensitivities in financial markets. The fears and uncertainties surrounding the recent COVID-19 pandemic, together with the related confinement measures, accelerated an unprecedented contraction in economic activity. Global financial markets reacted to the pandemic with a plunge in stock market indices and extreme volatility in equity prices driven by panic selling. For example, on 12 March 2020 the Dow Jones plunged 10%, its sharpest decline since the Black Monday crash of 1987. The S&P 500 index also plunged 9% to close in bear market territory, officially ending the bull market that began in 2009 in the throes of the financial crisis. The Asia-Pacific stock markets closed down (with the Nikkei 225 of the Tokyo Stock Exchange also falling to more than 20% below its 52-week high), and European stock markets closed 11% lower in their worst one-day decline in history on coronavirus fears. At the same time, the economic turmoil associated with the COVID-19 pandemic has had wide-ranging and severe impacts on other sectors of the financial markets, including the bond and commodity (crude oil and gold) markets. The collapse of crude oil prices was one of the biggest price shocks the energy market has experienced since the first oil shock of 1973. The effects on markets are part of the coronavirus recession and are among the many economic impacts of the pandemic.
One of the most established and widely used standard measures of exposure to market risk is the Value at Risk (VaR). It calculates the worst loss that might be expected of an asset or portfolio of assets at a given confidence level over a given period under normal market conditions. It gives a fixed probability (or confidence level) that any losses suffered by an asset or portfolio over the holding period will be less than the limit established by VaR. VaR can be derived as a quantile of an unconditional distribution of financial returns, but it is preferable to model VaR as the conditional quantile so that it captures the time-varying volatility inherent to financial markets . The popularity of VaR as a risk measure can be attributed to its ability to provide an aggregate measure of risk; a single number that is related to the maximum loss that might be incurred on a position, at a given confidence level.
To estimate market risk measures, several methodologies have been developed: the non-parametric approach (for example, historical simulation), the fully parametric approach (for example, based on an econometric model) and the semi-parametric approach (for example, extreme value theory, filtered historical simulation and the CAViaR method). However, over the last decade, conventional VaR models have been heavily criticized, as they failed to predict the huge repeated losses that devastated financial markets during the 2007-2008 global financial crisis. These VaR models normally assume that asset returns follow a normal distribution and thus ignore the fat-tailed properties of actual returns, which underestimates the likelihood of extreme price movements. Thus, VaR may ignore important information regarding the tails of the underlying distributions. In addition, VaR as a risk measure has a disadvantage of its own for measuring extreme price movements: tail risk emerges because VaR measures only a single quantile of the return distribution and disregards any loss beyond the VaR level. Hence, when financial markets experience volatile behaviour and extreme price movements, the conventional methods are not capable of appropriately measuring the risk. Since VaR estimates relate only to the tails of a probability distribution, techniques from Extreme Value Theory (EVT) may be particularly effective. Therefore, emphasis is now placed on adequate modelling of the extreme quantiles of the conditional distribution of financial returns rather than of the entire distribution. EVT has proved successful in VaR estimation.
The Extreme Value Theory approach focuses on the limiting distribution of extreme returns observed over a long period, which is essentially independent of the distribution of the returns themselves. There are two main models in Extreme Value Theory: the Block Maxima (BM) model and the Peaks-Over-Threshold (POT) model. In the POT model, extreme values above a high threshold are analysed using a Generalized Pareto Distribution (GPD). The difficulty with this method lies in finding the optimal threshold for GPD fitting, since threshold choice involves balancing bias and variance. The threshold must be sufficiently high to ensure that the asymptotic theory underlying the GPD approximation is reliable, thus reducing bias. However, the reduced sample size for high thresholds increases the variance of the parameter estimates.
EVT provides a theoretical framework and practical foundation for statistical models describing the tail behaviour of extreme observations. Unlike the GARCH-family models, EVT models do not consider the entire conditional distribution of financial returns. Instead, they focus directly on the tails of the sample distribution in order to account for the heavy tails. Therefore, EVT potentially performs better than other approaches in terms of predicting unexpected extreme price changes, especially in volatile financial markets. However, applying EVT directly to return series is inappropriate as the current volatility background is not taken into consideration and the models deal with independently and identically distributed (i.i.d.) random variables. To overcome this shortcoming, McNeil and Frey  introduced a two-stage hybrid approach that combines the Generalized Autoregressive Conditional Heteroscedasticity (GARCH) model and Extreme Value Theory (EVT), referred to as Conditional Extreme Value Theory (CEVT) model or GARCH-EVT model, which addresses the fat tails phenomenon and stochastic volatility. The GARCH-EVT model captures the important stylized facts exhibited by most financial time series, such as stochastic volatility, volatility clustering and leptokurtosis of conditional return distributions, and quickly adapts to recent market movements.
In econometrics and finance, implementing risk measurement methodology based on the theory of extremes is an important area of research. To this day, many researchers have investigated the estimation of Value at Risk (VaR) and Conditional VaR (CVaR) with the help of Extreme Value Theory. Most studies have VaR as their primary measure of interest. McNeil and Frey showed that combining GARCH and EVT results in a more accurate estimation of Value at Risk than either EVT methods or GARCH-type models alone. Several researchers have used the McNeil and Frey approach in estimating market risk. Fernandez showed that EVT outperforms a GARCH model with normal innovations by far and that it provides similar results to a GARCH model with Student-t innovations, as long as these innovations arise from an asymmetric and fat-tailed distribution. Gencay and Selcuk showed that at the 99th and higher quantiles the Generalized Pareto Distribution model is superior to the five other methods used in their study in terms of VaR forecasting. Several other studies have demonstrated that methods for estimating VaR based on modelling extreme observations measure financial risk more accurately than the conventional approaches.
In recent studies in finance, VaR has been estimated more accurately using the conditional-EVT approach than with conventional models, especially when modelling the distribution of extreme events and estimating extreme tail risks. Moreover, in some other studies the conditional Value-at-Risk has been used as the risk measure, and researchers have shown, theoretically and empirically, that using EVT contributes to a more precise estimation of the Value-at-Risk. Researchers have also compared Extreme Value Theory models with other conventional methods, such as Historical Simulation (HS), Filtered Historical Simulation (FHS) and GARCH models, in the estimation of Value at Risk, and have shown that conditional EVT models perform better.
Among the many studies on estimating Value-at-Risk in financial markets with the conditional-EVT model is the work of Soltane et al., which combines Extreme Value Theory (EVT) and a GARCH model to estimate VaR for the Tunisian Stock Market. They observe that the GARCH-EVT-based VaR approach appears more effective and realistic than conventional methods. Singh et al. applied univariate extreme value theory to model extreme market risk for the ASX-All Ordinaries (Australian) index and the S&P-500 (USA) index. Results from backtesting showed that the conditional-EVT based dynamic approach outperforms the GARCH(1, 1) model and RiskMetrics in producing VaR forecasts. Karmakar and Shukla investigated the relative performance of Value-at-Risk (VaR) models using daily share price index data from six different countries across Asia, Europe and the United States. The empirical results showed the superior performance of the conditional EVT model in estimating and forecasting VaR measures compared with other competing models. Totić and Božović modelled the tail behaviour of returns using conditional Extreme Value Theory, and results from backtesting showed that a conditional EVT-based model provides more reliable VaR forecasts than the alternative models in all six markets. Zhang and Zhang also showed that the exponential GARCH (EGARCH) model with Generalized Error Distribution (GED) combined with the EVT approach does very well in predicting a critical loss for precious metal markets. Omari et al. used conditional extreme value theory to estimate the Value at Risk of daily currency exchange rates, and their results show that the conditional EVT-based model provides more accurate out-of-sample VaR forecasts in estimating currency tail risk. Stoyanov et al. also investigated the out-of-sample behaviour of 41 equity market indices using the GARCH-EVT model, and their empirical results show that the GARCH-EVT model performed better in estimating Value-at-Risk and Expected Shortfall at the 1% tail probability. Tabasi et al. showed that the GARCH-EVT model outperforms the simple GARCH model with Student’s t and normal distributions for residuals.
The application of the GARCH-EVT model in empirical research requires the selection of an appropriate threshold level, which separates the tails of the distribution from its middle part. The choice of threshold level is ambiguous but critical in the estimation of the Generalized Pareto distribution’s parameters and the corresponding accuracy of the Value-at-Risk. The standard practice is to adopt as low a threshold as possible, subject to a trade-off between variance and bias. If the threshold is too low, the asymptotic basis of the model is violated, leading to high bias. However, too high a threshold generates too few excesses with which to estimate the model, leading to high variance. Most authors have preferred to select the threshold as a fixed quantile of the data set instead of determining a threshold value at each step, especially when they use a moving window of observations to find out-of-sample VaR estimates. Some studies chose either the 90th or the 95th quantile of the loss distribution as the threshold. In contrast, others used less conservative but still fixed thresholds.
Recent advances in financial econometrics and applications of extreme value theory to finance have led to the development of several models. These models are used in estimating and forecasting tail-related quantities of the conditional distribution of asset returns. Among recent developments, Huang et al. proposed a new approach to extreme value modelling for the forecasting of Value-at-Risk (VaR). In particular, both the block maxima and the peaks-over-threshold methods are generalized to exchangeable random sequences that cater for dependencies, such as the serial autocorrelation of financial returns observed empirically. The results of the VaR forecasts show that the conditional GARCH-EVT model performs better than the unconditional extreme value theory (EVT) approach. Another study proposed a new self-exciting probability peaks-over-threshold (SEP-POT) model for forecasting the extreme loss probability and the Value at Risk. The results from backtesting SEP-POT VaR forecasts on seven stock indices favoured the SEP-POT model as an alternative for forecasting extreme quantiles of financial returns. Several reviews of applications of conditional EVT in finance are also available. Echaust and Just used four different optimal tail selection algorithms, namely the path stability method, the automated Eye-Ball method, the minimization of asymptotic mean squared error method and the distance metric method with a mean absolute penalty function, to estimate out-of-sample Value at Risk (VaR) forecasts and compare them to the fixed threshold approach.
In this study, the conditional VaR at a 1-day horizon is estimated based on the conditional Extreme Value Theory (conditional-EVT) approach and on conventional GARCH-type models assuming asymmetric innovation distributions. We take into account volatility clustering and leverage effects in return volatility by using the GARCH, EGARCH, GJR-GARCH, CS-GARCH and APARCH models under different probability distributions assumed for the standardized innovations: Gaussian, Student-t, skewed Student-t and generalized error distribution. The two-step procedure of McNeil and Frey fits a generalized Pareto distribution to the extreme values of the standardized residuals generated by an AR(1)-EGARCH(1, 1) model. Then, we compare the out-of-sample one-step-ahead Value at Risk (VaR) forecasting performance of all these models before and during the COVID-19 pandemic period using daily data. For VaR evaluation, the most widely used backtesting procedures, the Unconditional Coverage (UC) and Conditional Coverage (CC) tests, are used. The empirical analysis is based on the daily log-returns of twelve international stock market indices for the period between January 2006 and July 2020 (that is, S&P 500 (US; SPX), FTSE 100 (UK; FTSE), DAX 30 (Germany; GDAXI), CAC 40 (France; FCHI), SMI (Switzerland; SMI), Euro Stoxx 50 (Europe; STOXX 50), S&P/TSX Composite (Canada; GSPTSE), NIKKEI 225 (Japan; N225), KOSPI 200 (South Korea; KS11), Hang Seng (Hong Kong; HSI), Shanghai Composite (China; SSE), Sensex (India; BSESN)). The daily log-returns for the equity markets were calculated from the adjusted daily closing prices downloaded from https://markets.businessinsider.com/indices.
This work provides an empirical study of conditional extreme value theory and contributes to the literature on the estimation of the tail risk of stock markets in four ways. First, several GARCH-type volatility specifications are used within an EVT model to take into account volatility clustering and asymmetric returns. Secondly, conditional EVT models are compared with the conditional models with asymmetric probability distributions that are used in the financial literature to calculate VaR. Thirdly, VaR over the 1-day horizon is calculated for market risk management. Finally, we focus on the accuracy of our risk models for VaR estimation in the pre-pandemic and pandemic periods, using different significance levels. The empirical results indicate that the conditional EVT based models consistently produce better 1-day VaR performance than conditional models with asymmetric probability distributions for return innovations, and may be a better option for the estimation of VaR.
The rest of the paper is organized as follows. Section 2 provides the methodological details. It presents the estimation of the GARCH models with the selected innovation distribution assumptions, the conditional GARCH-EVT modelling framework including the Peaks Over Threshold (POT) model, Quasi Maximum Likelihood (QML) estimation, the tail selection problem in this model, VaR estimation and the backtesting procedures. Section 3 presents the data and some preliminary descriptive statistics. Section 4 presents estimation results and empirical results from backtesting, while Section 5 concludes the paper.
2.1. Value at Risk
Value-at-Risk (VaR) is a popular approach to measuring market risk. It is defined as the maximum loss that will be incurred on an asset with a given level of confidence over a specified period under normal market conditions. Given some confidence level $p \in (0, 1)$, the VaR at confidence level $p$ is given by the smallest number $l$ such that the probability that the loss $L$ exceeds $l$ is no larger than $1 - p$. That is,
$$\mathrm{VaR}_p = \inf\{\, l \in \mathbb{R} : P(L > l) \le 1 - p \,\}.$$
Let $r_t$ denote the return on assets at time $t$. The one-day-ahead Value-at-Risk (VaR) for holding a long trading position at the $p$ level of significance, denoted as $\mathrm{VaR}^{p}_{t+1}$, is defined as
$$P\left(r_{t+1} \le \mathrm{VaR}^{p}_{t+1} \mid \mathcal{F}_t\right) = p,$$
where $\mathcal{F}_t$ is the information set available at time $t$. In this definition VaR is the $p$th conditional quantile of the return distribution. For a short trading position VaR is the $(1 - p)$th conditional quantile of the return distribution.
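As a minimal illustration of the quantile definition, the sketch below computes an unconditional empirical VaR from a sample of returns. The function name `empirical_var`, the order-statistic convention and the toy sample are assumptions made for this example only; the models in this paper work with conditional quantiles instead.

```python
import math


def empirical_var(returns, p):
    """Return the p-th empirical quantile of `returns` (long position)."""
    if not returns or not 0 < p < 1:
        raise ValueError("need a non-empty sample and 0 < p < 1")
    ordered = sorted(returns)
    # Smallest order statistic whose empirical CDF is at least p.
    k = max(math.ceil(p * len(ordered)) - 1, 0)
    return ordered[k]


# Toy sample of eight returns (hypothetical values).
sample = [-0.04, -0.02, -0.01, 0.00, 0.01, 0.02, 0.03, 0.05]
long_var = empirical_var(sample, 0.25)   # p-th quantile, long position
short_var = empirical_var(sample, 0.75)  # (1 - p)-th quantile, short position
```

The long-position VaR sits in the lower tail of the return distribution and the short-position VaR in the upper tail, mirroring the two quantiles in the definition.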
2.2. The GARCH Model
The Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models are the most commonly used in the literature for modelling volatility and estimating Value-at-Risk. Let $r_t$ be the percentage log-return of the financial asset (stock index) of interest at time $t$. Then
$$r_t = \mu_t + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t, \qquad (4)$$
where $\mu_t$ denotes the conditional mean, $\sigma_t$ a conditional volatility process and $z_t$ is a zero-mean, unit-variance white noise. The mean component of daily log-returns is assumed to be represented by an AR(1) model. The GARCH-type models are used in modelling the conditional volatility dynamics in the log-returns of financial time series. There are several representations of common GARCH-type models, but we consider the ones that follow the specification in Equation (4); in each case, only the volatility process differs. For brevity, all of the models are restricted to a maximum order of one. In addition, for each GARCH-type model, the innovation process is allowed to follow the normal distribution, the Student’s-t distribution, the skewed Student’s-t distribution or the generalized error distribution.
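The standard GARCH(1, 1) model introduced by Bollerslev is given as:
$$\sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2, \qquad (5)$$
where $\omega > 0$, $\alpha_1 \ge 0$, $\beta_1 \ge 0$ and $\alpha_1 + \beta_1 < 1$. The GARCH(1, 1) model is the most commonly used in the financial literature, and its main feature is that it captures volatility clustering in the data. The “persistence” parameter (which accounts for the amount of volatility clustering captured by the model) for this model is $\hat{P} = \alpha_1 + \beta_1$. The parameter restrictions are necessary for the model in Equation (5) to be weakly stationary, in which case the unconditional variance is finite and given by $\omega / (1 - \alpha_1 - \beta_1)$; the existence of higher-order moments requires further conditions on the parameters.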
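The GARCH(1, 1) recursion, its persistence and its unconditional variance can be sketched in a few lines. This is an illustrative sketch only: the function name `garch11_path`, the choice to initialise the recursion at the unconditional variance and the parameter values are assumptions, not the estimation routine used in the paper.

```python
def garch11_path(residuals, omega, alpha, beta):
    """Filter conditional variances sigma_t^2 through a GARCH(1, 1) recursion.

    Returns a list with len(residuals) + 1 entries; the last entry is the
    one-step-ahead variance forecast.
    """
    persistence = alpha + beta
    if omega <= 0 or alpha < 0 or beta < 0 or persistence >= 1:
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    # Assumption: start the recursion at the unconditional variance.
    sigma2 = [omega / (1.0 - persistence)]
    for eps in residuals:
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2


# Hypothetical parameters: persistence 0.9, unconditional variance 1.
path = garch11_path([0.0, 0.0], omega=0.1, alpha=0.1, beta=0.8)
```

With zero shocks the variance decays geometrically from its starting value towards $\omega / (1 - \beta_1)$, which shows how $\beta_1$ governs the memory of past volatility.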
Since financial returns tend to display leverage effects, that is, a negative correlation between returns and their volatility, asymmetric GARCH models have been introduced to capture this feature.
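The exponential GARCH (EGARCH) model of Nelson is defined as:
$$\ln \sigma_t^2 = \omega + \alpha_1 z_{t-1} + \gamma_1 \left( |z_{t-1}| - E|z_{t-1}| \right) + \beta_1 \ln \sigma_{t-1}^2,$$
where the coefficient $\alpha_1$ captures the sign effect, and $\gamma_1$ the size of the leverage effect. The persistence parameter for this model is $\hat{P} = \beta_1$.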
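The Glosten-Jagannathan-Runkle GARCH (GJR-GARCH) model of Glosten et al. models the effect of positive and negative shocks on the conditional variance asymmetrically via the use of the indicator function $I$. The GJR-GARCH(1, 1) model is given as:
$$\sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \gamma_1 I_{t-1} \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2,$$
where $\gamma_1$ now represents the “leverage” term. The indicator function $I_{t-1}$ takes on the value 1 for $\varepsilon_{t-1} \le 0$ and 0 otherwise. The persistence depends on the parameter $\gamma_1$ through $\hat{P} = \alpha_1 + \beta_1 + \gamma_1 \kappa$, where $\kappa = E[I_{t-1} z_{t-1}^2]$ denotes the expected value of the squared standardized residuals below zero.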
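The asymmetric response of the GJR-GARCH(1, 1) variance to the sign of the previous shock can be illustrated with a single update step. The sketch assumes the convention that the indicator equals 1 for non-positive shocks, and uses $\kappa = 0.5$ in the persistence, which holds for innovation distributions that are symmetric about zero; the function names and parameter values are hypothetical.

```python
def gjr_garch_update(eps_prev, sigma2_prev, omega, alpha, gamma, beta):
    """One GJR-GARCH(1, 1) variance update given the previous shock."""
    # Indicator is 1 for non-positive shocks (assumed convention), else 0.
    indicator = 1.0 if eps_prev <= 0 else 0.0
    return omega + (alpha + gamma * indicator) * eps_prev ** 2 + beta * sigma2_prev


def gjr_persistence(alpha, gamma, beta, kappa=0.5):
    """Persistence alpha + beta + gamma * kappa; kappa = 0.5 assumes symmetry."""
    return alpha + beta + gamma * kappa


# Hypothetical parameters: same shock size, opposite signs.
after_loss = gjr_garch_update(-1.0, 1.0, omega=0.05, alpha=0.05, gamma=0.10, beta=0.85)
after_gain = gjr_garch_update(1.0, 1.0, omega=0.05, alpha=0.05, gamma=0.10, beta=0.85)
```

A negative shock raises next-period variance by more than a positive shock of the same size whenever $\gamma_1 > 0$, which is the leverage effect the model is designed to capture.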
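The asymmetric power ARCH (APARCH) model of Ding et al. allows for both leverage and the Taylor effect, named after Taylor, who observed that the sample autocorrelation of absolute returns was usually larger than that of squared returns. The APARCH(1, 1) model can be expressed as:
$$\sigma_t^{\delta} = \omega + \alpha_1 \left( |\varepsilon_{t-1}| - \gamma_1 \varepsilon_{t-1} \right)^{\delta} + \beta_1 \sigma_{t-1}^{\delta},$$
where $\delta \in \mathbb{R}^{+}$ is the exponent of a Box-Cox transformation of $\sigma_t$, and $\gamma_1$ is the coefficient in the leverage term. The persistence parameter is equal to $\hat{P} = \beta_1 + \alpha_1 \kappa_1$, where $\kappa_1 = E\left( |z| - \gamma_1 z \right)^{\delta}$ is the expected value of the standardized residuals under the Box-Cox transformation of the term that includes the leverage parameter $\gamma_1$.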
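The component standard GARCH (CS-GARCH) model of Engle and Lee decomposes the conditional variance into a permanent and a transitory component so as to investigate the long-run and short-run movements of volatility. Let $q_t$ represent the permanent component of the conditional variance; the component model can then be written as
$$\sigma_t^2 = q_t + \alpha_1 \left( \varepsilon_{t-1}^2 - q_{t-1} \right) + \beta_1 \left( \sigma_{t-1}^2 - q_{t-1} \right),$$
$$q_t = \omega + \rho\, q_{t-1} + \phi \left( \varepsilon_{t-1}^2 - \sigma_{t-1}^2 \right),$$
where effectively the intercept of the GARCH model is now time-varying, following first-order autoregressive type dynamics.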
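For a better fit of the GARCH models, the standardized Student’s-t distribution, the skewed Student’s-t distribution and the Generalized Error Distribution (GED) are used instead of the normal distribution, since returns exhibit fat tails and skewness. The standardized Student’s t-distribution is given by:
$$f(z; \nu) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) \sqrt{\pi (\nu - 2)}} \left( 1 + \frac{z^2}{\nu - 2} \right)^{-\frac{\nu + 1}{2}},$$
with degrees of freedom parameter $\nu$ controlling the thickness of the tail, $\nu > 2$.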
An alternative distribution for modelling skewed and heavy-tailed data is the skewed Student’s t-distribution proposed by Hansen . The distribution as in Zhu and Galbraith  is given by
This parametrization of the distribution is equivalent to those of  and .
The Generalised Error Distribution (GED) is given by
Using Quasi-Maximum Likelihood (QML), the parameters may be estimated simultaneously by maximizing the log-likelihood. The log-likelihood function obtained under the assumption that the random error term follows the standardized Student’s t-distribution is given by
$$\ell(\theta) = T \left[ \ln \Gamma\left(\tfrac{\nu + 1}{2}\right) - \ln \Gamma\left(\tfrac{\nu}{2}\right) - \tfrac{1}{2} \ln\big(\pi (\nu - 2)\big) \right] - \frac{1}{2} \sum_{t=1}^{T} \left[ \ln \sigma_t^2 + (\nu + 1) \ln\left( 1 + \frac{\varepsilon_t^2}{\sigma_t^2 (\nu - 2)} \right) \right],$$
where $\theta$ is the vector of unknown parameters of the GARCH-type model to be estimated. Solving the first-order conditions of the log-likelihood function with respect to the parameters $\theta$ yields the fitted GARCH-type model, from which the standardized residuals can also be extracted. Next, the forecasts of the conditional mean and variance are obtained using the estimated parameters from QML. One-step-ahead forecasts of the conditional variance of returns, $\hat{\sigma}^2_{t+1}$, are then obtained recursively.
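To make the likelihood concrete, the sketch below evaluates the log-likelihood of a residual series under standardized Student’s t innovations for given conditional variances. It only evaluates the objective; the optimisation step of QML is not shown, and the function name and test values are assumptions made for illustration.

```python
import math


def student_t_loglik(residuals, sigma2, nu):
    """Log-likelihood of residuals eps_t with variances sigma2_t under a
    standardized (unit-variance) Student's t distribution with nu > 2."""
    if nu <= 2:
        raise ValueError("nu must exceed 2 for a finite variance")
    if len(residuals) != len(sigma2):
        raise ValueError("residuals and sigma2 must have equal length")
    # Log of the normalising constant of the standardized density.
    const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(math.pi * (nu - 2)))
    total = 0.0
    for eps, s2 in zip(residuals, sigma2):
        z2 = eps * eps / s2
        total += const - 0.5 * math.log(s2) - 0.5 * (nu + 1) * math.log1p(z2 / (nu - 2))
    return total


# For nu = 3 the standardized density at zero equals 2 / pi.
at_mode = student_t_loglik([0.0], [1.0], 3.0)
```

In a full estimation, `sigma2` would itself be produced by the chosen GARCH-type recursion as a function of the parameters, and a numerical optimiser would maximise this function over all parameters jointly.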
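The one-step-ahead conditional variance forecasts for the GARCH(1, 1), EGARCH(1, 1), GJR-GARCH(1, 1), APARCH(1, 1) and CS-GARCH(1, 1) models, respectively, are:
$$\hat{\sigma}_{t+1}^2 = \hat{\omega} + \hat{\alpha}_1 \varepsilon_t^2 + \hat{\beta}_1 \hat{\sigma}_t^2,$$
$$\ln \hat{\sigma}_{t+1}^2 = \hat{\omega} + \hat{\alpha}_1 z_t + \hat{\gamma}_1 \left( |z_t| - E|z_t| \right) + \hat{\beta}_1 \ln \hat{\sigma}_t^2,$$
$$\hat{\sigma}_{t+1}^2 = \hat{\omega} + \hat{\alpha}_1 \varepsilon_t^2 + \hat{\gamma}_1 I_t \varepsilon_t^2 + \hat{\beta}_1 \hat{\sigma}_t^2,$$
$$\hat{\sigma}_{t+1}^{\hat{\delta}} = \hat{\omega} + \hat{\alpha}_1 \left( |\varepsilon_t| - \hat{\gamma}_1 \varepsilon_t \right)^{\hat{\delta}} + \hat{\beta}_1 \hat{\sigma}_t^{\hat{\delta}},$$
$$\hat{\sigma}_{t+1}^2 = \hat{q}_{t+1} + \hat{\alpha}_1 \left( \varepsilon_t^2 - \hat{q}_t \right) + \hat{\beta}_1 \left( \hat{\sigma}_t^2 - \hat{q}_t \right), \qquad \hat{q}_{t+1} = \hat{\omega} + \hat{\rho}\, \hat{q}_t + \hat{\phi} \left( \varepsilon_t^2 - \hat{\sigma}_t^2 \right),$$
where hats denote QML estimates, $\varepsilon_t = r_t - \hat{\mu}_t$, $z_t = \varepsilon_t / \hat{\sigma}_t$, and $I_t = 1$ if $\varepsilon_t \le 0$ and 0 otherwise.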
The VaR forecast for the GARCH-type models relies on the one-day-ahead conditional variance forecast, $\hat{\sigma}^2_{t+1}$, of the volatility model. For each GARCH-type model, under each assumed error distribution, the one-day-ahead VaR forecast at level $p$ is obtained as:
$$\widehat{\mathrm{VaR}}^{p}_{t+1} = \hat{\mu}_{t+1} + \hat{\sigma}_{t+1} F^{-1}(p),$$
where $F^{-1}(p)$ is the $p$th quantile of the cumulative distribution function of the innovation distribution.
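The location-scale form of the VaR forecast can be sketched as follows. The example uses the standard normal quantile from the Python standard library as a default; any other innovation quantile function (for example, a standardized Student’s t quantile) could be passed in through the hypothetical `quantile_fn` argument. The forecast values are made-up inputs.

```python
from statistics import NormalDist


def parametric_var(mu_next, sigma_next, p, quantile_fn=NormalDist().inv_cdf):
    """One-day-ahead VaR: conditional mean plus volatility times the
    p-th quantile of the standardized innovation distribution."""
    if sigma_next <= 0:
        raise ValueError("sigma_next must be positive")
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return mu_next + sigma_next * quantile_fn(p)


# Hypothetical forecasts: mean 0.1% and volatility 2% for the next day.
var_1pct = parametric_var(0.001, 0.02, 0.01)
var_5pct = parametric_var(0.001, 0.02, 0.05)
```

Because the quantile of the innovations is fixed for a given distribution, all time variation in the forecast comes from the conditional mean and volatility.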
2.3. Modelling Tails Using Extreme Value Theory
Extreme Value Theory primarily focuses on analysing the asymptotic behaviour of the extreme values of a random variable. The theory provides robust statistical tools for estimating only the distribution of extreme values instead of the whole distribution. There are two main approaches to applying EVT: the Block Maxima (BM) model and the Peaks-Over-Threshold (POT) model. The approaches rely on different criteria to determine the extreme values. The BM model selects the maximum value in a specified period or block, while the POT model focuses on the observations exceeding some pre-specified high threshold. Modelling only the maximum of a block of random variables is considered wasteful if other data on extreme values are available. Therefore, a more efficient approach to modelling extreme events is to focus not only on the largest (maximum) events, but also on all events greater than some large preset threshold. This is Peaks Over Threshold (POT) modelling. The POT models are generally considered to be more appropriate in practical applications, due to their efficient use of the data on extreme values.
In this study, the POT approach to modelling extreme events is adopted. The POT method treats the observations above the chosen threshold as extreme values and focuses on the “exceedance” part, rather than the entire data set, to estimate the parameters of the tail distribution. Let $X_1, X_2, \ldots, X_n$ be a sequence of independent and identically distributed (i.i.d.) random variables with common distribution function $F$. The POT approach focuses on estimating the distribution function of values of $x$ above a high threshold $u$. The distribution of excesses over a high threshold $u$ is defined as:
$$F_u(y) = P(X - u \le y \mid X > u) = \frac{F(y + u) - F(u)}{1 - F(u)}, \qquad 0 \le y < x_F - u, \qquad (16)$$
where $x_F \le \infty$ is the right endpoint of $F$.
As in Balkema and de Haan and Pickands, for a large class of underlying distribution functions $F$ the conditional excess distribution function $F_u(y)$, for a large $u$, is well approximated by $G_{\xi,\beta}(y)$ with $\beta > 0$. That is,
$$F_u(y) \approx G_{\xi,\beta}(y), \qquad u \to x_F, \qquad (17)$$
where $G_{\xi,\beta}$, the Generalized Pareto Distribution (GPD), is given by
$$G_{\xi,\beta}(y) = \begin{cases} 1 - \left( 1 + \dfrac{\xi y}{\beta} \right)^{-1/\xi}, & \xi \ne 0, \\[2mm] 1 - \exp\left( -\dfrac{y}{\beta} \right), & \xi = 0, \end{cases} \qquad (18)$$
where $y \ge 0$ for $\xi \ge 0$ and $0 \le y \le -\beta/\xi$ for $\xi < 0$. The distribution has a scale parameter $\beta > 0$ and a shape parameter $\xi$. It subsumes a number of other specific distributions under its parametrization. When $\xi > 0$, $G$ is a parameterized version of the heavy-tailed ordinary Pareto distribution; $\xi = 0$ corresponds to the light-tailed exponential distribution; and when $\xi < 0$ we have a short-tailed Pareto type II distribution. In financial risk management, $\xi > 0$ is generally the most relevant case for analysis purposes, since the GPD then describes heavy tails. Estimates of the parameters $\xi$ and $\beta$ can be obtained using the method of maximum likelihood. For $\xi > -0.5$, Hosking and Wallis present evidence that the maximum likelihood regularity conditions are fulfilled and the maximum likelihood estimates are asymptotically normally distributed.
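The three regimes of the shape parameter can be seen by evaluating the GPD distribution function directly. The sketch below is a plain implementation of the standard GPD c.d.f. with a shape and a scale parameter; the function name `gpd_cdf` and the handling of values outside the support are choices made for this example.

```python
import math


def gpd_cdf(y, xi, beta):
    """Generalized Pareto c.d.f. for an excess y with shape xi and scale beta."""
    if beta <= 0:
        raise ValueError("scale beta must be positive")
    if y < 0:
        return 0.0  # excesses are non-negative by construction
    if xi == 0:
        return 1.0 - math.exp(-y / beta)  # exponential limit
    base = 1.0 + xi * y / beta
    if base <= 0:
        return 1.0  # beyond the finite upper endpoint -beta/xi when xi < 0
    return 1.0 - base ** (-1.0 / xi)


heavy = gpd_cdf(1.0, 1.0, 1.0)    # xi > 0: Pareto-type tail
light = gpd_cdf(1.0, 0.0, 1.0)    # xi = 0: exponential tail
short = gpd_cdf(3.0, -0.5, 1.0)   # xi < 0: bounded support, endpoint at 2
```

For the same scale, a larger shape parameter leaves more probability mass beyond any given excess, which is why positive values are associated with heavy-tailed financial returns.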
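By setting $x = u + y$, an approximation of $F(x)$, for $x > u$, can be obtained from Equation (16):
$$F(x) = \big(1 - F(u)\big)\, G_{\xi,\beta}(x - u) + F(u). \qquad (19)$$
The function $F(u)$ can be estimated non-parametrically using the empirical c.d.f.:
$$\hat{F}(u) = \frac{n - N_u}{n}, \qquad (20)$$
where $N_u$ represents the number of exceedances over the threshold $u$ and $n$ is the sample size. By substituting Equations (18) and (20) into Equation (19), an estimate of $F(x)$ is obtained as follows:
$$\hat{F}(x) = 1 - \frac{N_u}{n} \left( 1 + \hat{\xi}\, \frac{x - u}{\hat{\beta}} \right)^{-1/\hat{\xi}}, \qquad (21)$$
where $\hat{\xi}$ and $\hat{\beta}$ are estimates of $\xi$ and $\beta$, respectively, which can be obtained by the method of maximum likelihood.
For $q > F(u)$, $\mathrm{VaR}_q$ can be obtained from Equation (21) by solving for $x$:
$$\widehat{\mathrm{VaR}}_q = u + \frac{\hat{\beta}}{\hat{\xi}} \left[ \left( \frac{n}{N_u} (1 - q) \right)^{-\hat{\xi}} - 1 \right], \qquad (22)$$
where $u$ is the threshold, $\hat{\beta}$ is the estimated scale parameter and $\hat{\xi}$ is the estimated shape parameter.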
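Once the threshold and the GPD parameter estimates are available, the tail quantile follows in closed form by inverting the tail estimator. The sketch below takes the estimates as given inputs (the maximum likelihood step is not shown); the function name `pot_var` and all numerical values are hypothetical.

```python
import math


def pot_var(q, u, xi, beta, n, n_exceed):
    """Tail quantile from the POT estimator for probability q above F(u)."""
    if not 0 < q < 1:
        raise ValueError("q must lie strictly between 0 and 1")
    if beta <= 0 or not 0 < n_exceed <= n:
        raise ValueError("need beta > 0 and 0 < n_exceed <= n")
    if q < 1.0 - n_exceed / n:
        raise ValueError("q must lie in the tail above the threshold")
    ratio = (n / n_exceed) * (1.0 - q)
    if xi == 0:
        return u - beta * math.log(ratio)  # exponential-tail limit
    return u + (beta / xi) * (ratio ** (-xi) - 1.0)


# Hypothetical inputs: 1000 observations, 100 exceedances of u = 1.5.
q99 = pot_var(0.99, u=1.5, xi=0.2, beta=0.6, n=1000, n_exceed=100)
q95 = pot_var(0.95, u=1.5, xi=0.2, beta=0.6, n=1000, n_exceed=100)
```

The quantile equals the threshold when $1 - q = N_u / n$ and grows as $q$ moves further into the tail, at a rate controlled by the shape estimate.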
One of the challenging problems in the practical application of the POT method is setting an appropriate threshold. Threshold selection involves a bias-variance trade-off. An excessively low threshold may violate the asymptotic theory underlying the GPD approximation and, consequently, increase the bias. Conversely, an excessively high threshold leaves few excesses, leading to high variance in the parameter estimates. It is thus important to set the threshold so as to strike a suitable balance between the variance and the bias of the model. In this paper, a quantile rule is adopted that treats the upper 10% of observations as exceedances, that is, the threshold is set at the 90th percentile. This is a common practice.
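The fixed-quantile rule can be sketched as below. Following a common convention, the threshold is taken as the $(k+1)$th largest observation so that exactly $k$ observations lie above it; this convention, the function name `select_threshold` and the toy data are assumptions for illustration.

```python
def select_threshold(losses, tail_fraction=0.10):
    """Pick the threshold so that the top `tail_fraction` of observations
    are exceedances, and return the threshold with the excesses over it."""
    if not 0 < tail_fraction < 1:
        raise ValueError("tail_fraction must lie strictly between 0 and 1")
    ordered = sorted(losses, reverse=True)
    k = int(round(tail_fraction * len(ordered)))  # number of exceedances
    if k < 1 or k >= len(ordered):
        raise ValueError("sample too small for the requested tail fraction")
    u = ordered[k]  # (k + 1)-th largest observation
    excesses = [x - u for x in ordered[:k]]
    return u, excesses


# Toy data: the integers 1..100, so the 90th percentile threshold is 90.
threshold, excesses = select_threshold(list(range(1, 101)))
```

The excesses returned here are the inputs to which the GPD would be fitted; with tied observations some excesses may be zero.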
2.4. Conditional Extreme Value Theory Model
The GARCH-EVT model introduced by McNeil and Frey is used to estimate Value at Risk by extending the EVT framework to dependent data. An important assumption of EVT is that the data are independently and identically distributed (i.i.d.). EVT is therefore used to model the tails of the standardized residuals obtained from the GARCH-type model. First, the GARCH model is fitted to the financial return series to filter out the serial autocorrelation and obtain standardized residuals that are close to independently and identically distributed. Subsequently, the standardized residuals are fitted using the POT-EVT framework. The GARCH-EVT approach is summarized as follows:
Step 1: Fit a suitable GARCH-type model to the return data by quasi-maximum likelihood. That is, maximize the log-likelihood function of the sample assuming standardized Student’s t-distributed innovations. Estimate $\mu_{t+1}$ and $\sigma_{t+1}$ from the fitted model and extract the standardized residuals $z_t$.
Step 2: Consider the standardized residuals computed in Step 1 to be realizations of a white noise process. Apply EVT to model the tails of the innovations and estimate $\mathrm{VaR}(Z)_q$ for a given probability $q$.
Hence, the standardized residuals can be computed as a white noise process
$$(z_{t-n+1}, \ldots, z_t) = \left( \frac{r_{t-n+1} - \hat{\mu}_{t-n+1}}{\hat{\sigma}_{t-n+1}}, \ldots, \frac{r_t - \hat{\mu}_t}{\hat{\sigma}_t} \right).$$
Given the 1-step forecasts $\hat{\mu}_{t+1}$ and $\hat{\sigma}_{t+1}$ and the standardized residual series, the conditional Value-at-Risk can be estimated as
$$\widehat{\mathrm{VaR}}^{q}_{t+1} = \hat{\mu}_{t+1} + \hat{\sigma}_{t+1}\, \widehat{\mathrm{VaR}}(Z)_q,$$
with $\widehat{\mathrm{VaR}}(Z)_q$ obtained from Equation (22).
$\widehat{\mathrm{VaR}}^{q}_{t+1}$ is computed during the out-of-sample period, along with the parameter estimates, using the previous $n$ in-sample returns. That is, for each day $t$ in the out-of-sample set, $\widehat{\mathrm{VaR}}^{q}_{t+1}$ is calculated from the returns $r_{t-n+1}, \ldots, r_t$. This implementation is rolled forward each day, which effectively captures time-varying characteristics. In addition, the 90th percentile of the return distribution is set as the threshold, so that the number of exceedances $k$ equals 10% of the daily observations in the window. Since the backtesting period is relatively long, the threshold is fixed at the 90th quantile in order to simplify the procedure. The advantage of this combination lies in its ability to capture conditional heteroscedasticity in the data through the GARCH framework, while at the same time modelling the extreme tail behaviour through the EVT method.
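The rolling scheme can be sketched independently of the model that is refitted in each window. In the sketch below, `forecaster` is a hypothetical callable that would wrap the GARCH fit, the GPD fit to the standardized residuals and the combination of the volatility forecast with the residual quantile; here a trivial stand-in is used so that the example stays self-contained.

```python
def rolling_var_forecasts(returns, window, forecaster):
    """Produce one-step-ahead forecasts from a moving window of `window`
    past returns; forecast i targets returns[window + i]."""
    if window < 1 or window >= len(returns):
        raise ValueError("window must satisfy 1 <= window < len(returns)")
    forecasts = []
    for t in range(window, len(returns)):
        # Only observations strictly before day t enter the forecast.
        forecasts.append(forecaster(returns[t - window:t]))
    return forecasts


def combine(mu_next, sigma_next, z_quantile):
    """Conditional VaR from the mean and volatility forecasts and the
    residual quantile (the location-scale step of the two-stage approach)."""
    return mu_next + sigma_next * z_quantile


# Stand-in forecaster (NOT the paper's model): the worst return in the window.
demo = rolling_var_forecasts([1, 2, 3, 4, 5], window=3, forecaster=min)
```

Keeping the window logic separate from the model makes it straightforward to compare several volatility specifications on exactly the same out-of-sample days.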
2.5. Backtesting the VaR Models
The adequacy of the models used for producing VaR forecasts can be statistically tested using a backtesting procedure. This procedure consists of comparing the out-of-sample VaR estimates with the actual realized loss in the next period. An accurate VaR measure implies that the actual return will be worse than the VaR forecast only $p \cdot 100\%$ of the time. Given a time series of past ex-ante VaR forecasts and past ex-post returns, one can define the “hit sequence” (also referred to as the indicator function) of VaR violations as:
$$I_{t+1} = \begin{cases} 1, & \text{if } r_{t+1} < \mathrm{VaR}^{p}_{t+1}, \\ 0, & \text{otherwise.} \end{cases}$$
The hit sequence returns a value of 1 on day $t + 1$ if the ex-post loss on that day exceeds the VaR number predicted in advance for that day, and zero otherwise. When backtesting VaR models, a hit sequence is created across $T$ days indicating when the past violations occurred. For a VaR model to be accurate in its predictions, the average hit ratio, or failure rate, over the full sample should be equal to $p$ for the $p$-quantile VaR (that is, for 95% VaR, $p = 5\%$). Evaluating the violations involves counting the number of actual realized returns that exceed the VaR forecast and comparing this number with the expected number of violations. The closer the hit ratio is to the expected value, the better the forecasts of the risk model. If the hit ratio is greater than the expected value, the model underestimates the risk; if the hit ratio is smaller than the expected value, the model overestimates the risk.
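Constructing the hit sequence and the hit ratio takes only a few lines. The sketch assumes a long position, so that a violation occurs when the realized return falls below the (negative) VaR forecast for that day; the function names and the toy numbers are illustrative.

```python
def hit_sequence(returns, var_forecasts):
    """1 where the realized return is below that day's VaR forecast, else 0."""
    if len(returns) != len(var_forecasts):
        raise ValueError("returns and forecasts must be aligned day by day")
    return [1 if r < v else 0 for r, v in zip(returns, var_forecasts)]


def hit_ratio(hits):
    """Fraction of days with a VaR violation."""
    if not hits:
        raise ValueError("empty hit sequence")
    return sum(hits) / len(hits)


# Toy example: a constant VaR forecast of -2% over five days.
realized = [0.010, -0.030, 0.005, -0.025, 0.000]
hits = hit_sequence(realized, [-0.02] * 5)
ratio = hit_ratio(hits)
```

The resulting ratio is the quantity that the coverage tests compare with the nominal level.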
In this study, two commonly used backtesting procedures are applied: Kupiec’s proportion of failures test (also known as the unconditional coverage test) and Christoffersen’s test (also known as the conditional coverage test).
2.5.1. Unconditional Coverage Test
In the Kupiec test, the interest is in checking whether the proportion of violations obtained from a VaR model, call it $\hat{\pi}$, is significantly different from the expected proportion, $p$. This is called the unconditional coverage hypothesis. Assuming that the probability of obtaining an exceedance is constant, the number of VaR violations by actual returns, $N$, follows a binomial distribution, $N \sim B(T, p)$, where $T$ represents the total number of observations. An accurate measure should produce an unconditional coverage equal to $p$. The unconditional coverage test has the null hypothesis that the probability of failure on each trial ($\pi$) equals $p$, that is, $H_0: \pi = p$.
The likelihood ratio statistic,
$$LR_{uc} = -2 \ln \left[ \frac{(1 - p)^{T - N}\, p^{N}}{(1 - \hat{\pi})^{T - N}\, \hat{\pi}^{N}} \right], \qquad \hat{\pi} = \frac{N}{T},$$
is used to perform this test. When the null hypothesis is true, the statistic has an asymptotic Chi-square distribution with one degree of freedom. The advantage of this test is that it assesses the adequacy of the model by penalizing both too many and too few exceedances. A good model for VaR estimation should also be characterized by independence of the exceedances.
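The unconditional coverage statistic and its asymptotic p-value can be computed with the standard library alone, since the survival function of a Chi-square variable with one degree of freedom has the closed form $\mathrm{erfc}(\sqrt{x/2})$. The function name `kupiec_test` and the example counts are assumptions for illustration.

```python
import math


def _xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def kupiec_test(n_obs, n_viol, p):
    """Proportion-of-failures likelihood ratio statistic and p-value."""
    if n_obs <= 0 or not 0 <= n_viol <= n_obs or not 0 < p < 1:
        raise ValueError("invalid counts or coverage level")
    pi_hat = n_viol / n_obs
    log_l0 = _xlogy(n_obs - n_viol, 1 - p) + _xlogy(n_viol, p)
    log_l1 = _xlogy(n_obs - n_viol, 1 - pi_hat) + _xlogy(n_viol, pi_hat)
    lr_uc = max(0.0, -2.0 * (log_l0 - log_l1))  # guard against rounding
    # Survival function of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(lr_uc / 2.0))
    return lr_uc, p_value


# Hypothetical backtest: 10 violations in 250 days against a 1% VaR.
lr, pval = kupiec_test(250, 10, 0.01)
```

In this hypothetical case the statistic is far above the 5% critical value of about 3.84, so the nominal coverage would be rejected because the model produces too many violations.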
2.5.2. Conditional Coverage Test
Christoffersen (1998) proposed a conditional coverage test procedure that jointly examines correct unconditional coverage and serial independence of violations. The corresponding test statistic is the sum of the individual test statistics for the two properties; that is, $LR_{cc} = LR_{uc} + LR_{ind}$ when conditioned on the first observation. Here $LR_{ind}$ denotes the likelihood ratio statistic that tests whether exceptions are independent, and $LR_{uc}$ is the unconditional coverage statistic defined in the previous subsection. When the model accurately estimates VaR, an exceedance today should not depend on whether or not an exceedance occurred on the previous day.
According to this test, the hit sequence is assumed to be dependent over time and is described as a first-order Markov sequence with a transition probability matrix given by
$$\Pi_1 = \begin{pmatrix} 1-\pi_{01} & \pi_{01} \\ 1-\pi_{11} & \pi_{11} \end{pmatrix},$$
where $\pi_{ij} = P(I_{t+1} = j \mid I_t = i)$ and $I_t$ denotes the hit indicator on day $t$. These transition probabilities simply mean that, conditional on today being a non-violation (that is, $I_t = 0$), the probability of tomorrow being a violation (that is, $I_{t+1} = 1$) is $\pi_{01}$. The probability of tomorrow being a violation given that today is also a violation is $\pi_{11} = P(I_{t+1} = 1 \mid I_t = 1)$. The Markov chain reflects the existence of an order-one memory in the sequence of exceedances. If the hit sequence is independent over time, the probability of a violation tomorrow does not depend on whether today is a violation or not. In this case, the null hypothesis in the independence test is
$$H_0: \pi_{01} = \pi_{11}.$$
Ultimately, one is interested in simultaneously testing whether the VaR violations are independent and whether the average number of violations is correct. The conditional coverage test jointly examines whether the percentage of exceptions is statistically equal to the one expected ($p$) and the serial independence of the exception indicator. In this test, the null hypothesis takes the form:
$$H_0: \pi_{01} = \pi_{11} = p.$$
Thus, under the null hypothesis that the expected proportion of exceptions equals $p$ and the failure process is independent, the appropriate likelihood ratio test statistic is of the form:
$$LR_{cc} = -2\ln\left[\frac{(1-p)^{T-N}\,p^{N}}{(1-\hat{\pi}_{01})^{n_{00}}\,\hat{\pi}_{01}^{n_{01}}\,(1-\hat{\pi}_{11})^{n_{10}}\,\hat{\pi}_{11}^{n_{11}}}\right],$$
where $N$ is the number of violations in $T$ observations, $n_{ij}$, $i, j \in \{0, 1\}$, denotes the number of days on which condition $j$ occurred given that condition $i$ occurred on the previous day (1 if an exceedance occurs, 0 if no exceedance occurs), and $\hat{\pi}_{01} = n_{01}/(n_{00} + n_{01})$ and $\hat{\pi}_{11} = n_{11}/(n_{10} + n_{11})$ are the estimated transition probabilities.
Under the null hypothesis, the likelihood ratio statistic $LR_{cc}$ is asymptotically Chi-square distributed with two degrees of freedom. Christoffersen’s test enables the user to test the coverage and independence hypotheses at the same time, that is, to check whether the VaR model fails a test of both hypotheses combined. The approach also makes it possible to test each hypothesis separately, and therefore to establish where a model failure arises.
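The conditional coverage test can be sketched as follows, assuming a 0/1 hit sequence as input and the first-order Markov alternative described above. The function names are hypothetical. Conditioning on the first observation means that only the $T - 1$ observed transitions enter the likelihoods, which is what makes the decomposition into coverage and independence statistics exact.

```python
import math


def _xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def christoffersen_cc_test(hits, p):
    """Return (LR_uc, LR_ind, LR_cc, p-value of LR_cc) for a 0/1 hit sequence."""
    # Transition counts n[i][j]: state i on day t-1 followed by state j on day t.
    n = [[0, 0], [0, 0]]
    for prev, curr in zip(hits[:-1], hits[1:]):
        n[prev][curr] += 1
    n00, n01, n10, n11 = n[0][0], n[0][1], n[1][0], n[1][1]

    # Totals of non-violations and violations over the T - 1 transitions.
    n0, n1 = n00 + n10, n01 + n11
    pi = n1 / (n0 + n1)
    pi01 = n01 / (n00 + n01) if (n00 + n01) else 0.0
    pi11 = n11 / (n10 + n11) if (n10 + n11) else 0.0

    ll_null = _xlogy(n0, 1 - p) + _xlogy(n1, p)          # coverage p, independent
    ll_pooled = _xlogy(n0, 1 - pi) + _xlogy(n1, pi)      # free coverage, independent
    ll_markov = (_xlogy(n00, 1 - pi01) + _xlogy(n01, pi01)
                 + _xlogy(n10, 1 - pi11) + _xlogy(n11, pi11))

    lr_uc = -2.0 * (ll_null - ll_pooled)
    lr_ind = -2.0 * (ll_pooled - ll_markov)
    lr_cc = lr_uc + lr_ind
    # Survival function of chi-square with 2 df is exp(-x / 2).
    return lr_uc, lr_ind, lr_cc, math.exp(-max(lr_cc, 0.0) / 2.0)


# Hypothetical, strongly clustered hit sequence: ten violations in a row.
clustered = [0] * 90 + [1] * 10
lr_uc, lr_ind, lr_cc, p_value = christoffersen_cc_test(clustered, 0.05)
```

For this artificial sequence the independence component dominates the joint statistic, which shows how reporting the two components separately locates the source of a rejection.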
In this study, twelve major international stock indices are analysed. The set includes: S&P500 (US; SPX), FTSE 100 (UK; FTSE), DAX 30 (Germany; GDAXI), CAC 40 (France; FCHI), Swiss Market Index (Switzerland; SMI), Euro Stoxx 50 (Europe; STOXX 50), S&P/TSX Composite (Canada; GSPTSE), Nikkei 225 (Japan; N225), KOSPI 200 (South Korea; KS11), Hang Seng (Hong Kong; HSI), Shanghai Composite (China; SSE) and Sensex (India; BSESN). The motivation for the selection of these stock indices is to examine the reliability of the proposed VaR forecast models for the major world stock indices in periods of financial distress and turmoil. Therefore, the data covers the period from 1st January 2006 to 31st July 2020, which includes the 2008 global financial crisis, the 2011 European financial crisis and the current COVID-19 pandemic period. Each price series is expressed in the local currency. The daily percentage log-returns on assets over the sample period are used. Daily log-returns are computed as 100 times the difference of the log prices, i.e. $r_t = 100 \times (\ln P_t - \ln P_{t-1})$, where $P_t$ is the adjusted closing price (value) on day $t$. The daily log-returns for the equity markets were calculated from the adjusted daily closing prices downloaded from https://markets.businessinsider.com/indices.
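The percentage log-return transformation can be sketched in a few lines of Python. The function name and the price values are hypothetical and serve only to illustrate the formula.

```python
import math


def pct_log_returns(prices):
    """Daily percentage log-returns: 100 * (ln P_t - ln P_{t-1})."""
    return [100.0 * (math.log(p1) - math.log(p0))
            for p0, p1 in zip(prices[:-1], prices[1:])]


# Hypothetical adjusted closing prices (illustration only).
prices = [100.0, 101.0, 99.0, 99.0]
r = pct_log_returns(prices)
```

A series of $n$ prices yields $n - 1$ returns, and an unchanged price gives a return of exactly zero.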
Figure 1 presents the time plots of the log-return series, which show evidence of volatility clustering and extreme price movements in the returns. From the figure, we can also see the effects of the 2008 global financial crisis, the 2011 European financial crisis and the COVID-19 pandemic shock of March 2020. All twelve stock market indices display similar patterns of volatility clustering over time and extreme price jumps.
Figure 1. Daily log-return plots of the twelve major stock market indices for the period starting from January 1, 2006 to July 31, 2020.
Table 1 shows the summary statistics and statistical test results computed over the in-sample, out-of-sample and full sample periods for all stock market indices considered in this paper. All the stock market indices record a positive mean close to zero, except for CAC40, EURO and N225 in the in-sample period, FTSE in the out-of-sample period and EURO again in the full sample, which have a negative mean. The log-return series for each stock market index are far from being normally distributed, as indicated by their negative skewness and high excess kurtosis. The Jarque-Bera normality test also confirms that the returns of all stock market indices are non-normally distributed. The Augmented Dickey-Fuller (ADF) results further show that all series are stationary. The Ljung-Box Q statistic tests the null hypothesis of no serial correlation and is calculated using up to 5 lags. A significant Q statistic for returns implies that we reject the null hypothesis of no serial correlation in returns, while a significant Q statistic for the squared return series implies that the null hypothesis of homoscedastic returns is rejected. From the results, it is observed that the Ljung-Box Q statistics are significant for most return series as well as squared return series. Thus, the Ljung-Box tests confirm the presence of serial correlation in the squared return series. Again, the null hypothesis of no ARCH effects is rejected by the Lagrange multiplier test for AutoRegressive Conditional Heteroscedasticity (ARCH-LM test), thus confirming the presence of ARCH effects in all return series. The presence of serial correlation in the squared returns supports the need to filter the heteroscedasticity in all series using an appropriate conditional heteroscedastic model.
Table 1. Descriptive summary statistics and statistical tests results for the log-returns (%) of twelve stock market indices for in-sample, out-of-sample and full sample period.
JB is the test statistic of the Jarque-Bera test, ADF is the test statistic of the Augmented Dickey-Fuller test, Q(5) is the test statistic of the Ljung-Box test on the return series, Q2(5) is the test statistic of the Ljung-Box test on the squared return series and LM is the test statistic of the Lagrange multiplier test for autoregressive conditional heteroscedasticity (ARCH-LM test); * denotes significance at 0.01%.
4. Empirical Results
4.1. In-Sample Analysis
First, an in-sample analysis is considered, where the GARCH-type models are fitted to the in-sample data. Approximately 70% of the log-returns (in percent) are used for estimation, and the backtest is run over 1000 out-of-sample log-returns (about 4 years) for the period from 1st August 2016 to 31st July 2020 (the full data set starts on 1st January 2006). Each model is estimated on a rolling window basis and both the density and one-step-ahead log-return forecasts are obtained. The model parameters are updated every 20 observations (approximately monthly). This frequency was selected in order to speed up the computations. Similar results were obtained for a subset of stocks when the parameters were updated every day. This is also in line with earlier observations that, in the context of GARCH models, the performance of VaR forecasts is not affected significantly when moving from a daily updating frequency to a weekly or monthly updating frequency. It is important to note that, while the parameters are updated every 20 observations, the density and downside risk measures are computed every day.
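The updating scheme described above (parameters re-estimated every 20 observations, forecasts produced every day) can be sketched generically. The function `rolling_var_forecasts` and the callables `fit` and `forecast` are hypothetical placeholders for a GARCH-type estimator and its one-day-ahead forecast; the toy functions below only count how often re-estimation happens and are not a volatility model.

```python
def rolling_var_forecasts(returns, n_test, refit_every, fit, forecast):
    """One-step-ahead forecasts over the last n_test observations.

    fit(window) -> params and forecast(params, window) -> value are
    user-supplied callables (hypothetical placeholders for a GARCH-type
    estimator and its one-day-ahead VaR forecast).
    """
    n_train = len(returns) - n_test
    forecasts, params = [], None
    for k in range(n_test):
        window = returns[k:n_train + k]        # fixed-length rolling window
        if k % refit_every == 0:               # re-estimate every refit_every days
            params = fit(window)
        forecasts.append(forecast(params, window))  # forecast every day
    return forecasts


# Toy stand-ins that only illustrate the schedule.
calls = {"fit": 0}


def toy_fit(window):
    calls["fit"] += 1
    return sum(window) / len(window)           # sample mean as the only "parameter"


def toy_forecast(params, window):
    return params


data = [float(i % 7) for i in range(3600)]     # hypothetical series
out = rolling_var_forecasts(data, n_test=1000, refit_every=20,
                            fit=toy_fit, forecast=toy_forecast)
```

With 1000 out-of-sample days and a refit interval of 20, the estimator is called 50 times instead of 1000, which is the source of the computational saving.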
As we are interested in the volatility dynamics, as a first step we de-mean the stock index return series and remove autoregressive effects in the data using a first-order autoregressive filter, AR(1), and estimate the models on the residuals. As noted, the log-returns are skewed and leptokurtic. Thus, to account for the excess kurtosis, skewness and the dynamics of fluctuations typical of financial time series data, we consider different error distributions, including the Student’s t distribution, the skewed Student’s t distribution and the generalized error distribution (GED). The standard AR(1)-GARCH(1, 1) model specification is fitted to all return series with the different error distributions. Table 2 presents the AIC and BIC values for the fitted AR(1)-GARCH(1, 1) model with different error distributions. Overall, the skewed Student’s t distribution and the generalized error distribution fit the majority of the stock market index returns well. The skewed Student’s t distribution accounts for the excess skewness and kurtosis typical of financial time series data.
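As a reminder of how the two information criteria rank candidate error distributions, the sketch below uses the textbook definitions, AIC $= -2\ln L + 2k$ and BIC $= -2\ln L + k\ln n$. The log-likelihood values and parameter counts are hypothetical and are not the values in Table 2. Some software reports these criteria divided by the sample size, which leaves the ranking unchanged for a fixed sample.

```python
import math


def aic(log_lik, n_params):
    """Akaike Information Criterion (smaller is better)."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion (smaller is better)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


# Hypothetical maximised log-likelihoods and parameter counts (illustration only).
candidates = {
    "std": (-4010.0, 6),
    "sstd": (-4001.0, 7),
    "ged": (-4006.0, 6),
}
n_obs = 2600

best_by_aic = min(candidates, key=lambda d: aic(*candidates[d]))
best_by_bic = min(candidates, key=lambda d: bic(*candidates[d], n_obs))
```

BIC penalises the extra skewness parameter more heavily than AIC, so the two criteria can disagree when the gain in log-likelihood is small.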
Table 2. Criterion for selecting the appropriate error distribution for the log returns.
For brevity, the AR(1)-SGARCH(1, 1), AR(1)-EGARCH(1, 1), AR(1)-GJRGARCH(1, 1), AR(1)-APARCH(1, 1) and AR(1)-CSGARCH(1, 1) models are used to filter conditional volatilities in all return series and estimate the out-of-sample VaR forecasts. Table 3 reports the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values of all the GARCH-type models considered in the study with two error distributions: the skewed Student’s t distribution and the generalized error distribution. The AR(1)-EGARCH(1, 1) model with the skewed Student’s t distribution is selected as the optimal GARCH specification, with the smallest AIC and BIC values, for modelling volatility dynamics in all stock index returns. According to the results reported in Table 4, the expected daily index return is not statistically different from zero for all stock return indices. However, the indices FTSE, CAC, SMI and EURO give a positive return. The estimated mean returns for S&P, CAC, N225 and BSESN are statistically significant at the 5% level. Volatility persistence in the return series for each of the stock indices is described by both the ARCH ($\alpha$) and GARCH ($\beta$) terms, as reported in Panel B of Table 4. The parameter $\alpha$ represents the lagged squared residuals, while $\beta$ is the lagged conditional variance term in the EGARCH model. Volatility is said to be persistent if the sum of the two volatility terms is close to unity, less persistent if it is less than unity and explosive if it is greater than unity. The coefficients for all the stock market indices are significant at the 5% level of significance, in favour of the EGARCH(1, 1) model. The implication of these results is that none of the stock index return series shows evidence of long memory. This means that shocks to volatility tend to decay quickly, implying that past volatility does not have strong predictive power for current volatility. These results are, however, conditional on the model specification and the distribution assumption made in the estimation.
A number of empirical studies of stock market return distributions point to a significant leverage effect, where higher volatility tends to follow negative returns. Asymmetries in the distribution of returns may arise as a result of shocks due to systemic risk factors that affect the cross-section of returns, or because of country-specific shocks. The results reveal a significant positive leverage effect for all return series. Similarly, the skew and shape parameters are statistically significant for all the stock index series.
Panel C of Table 4 reports Ljung-Box test results for the standardized residual series and squared standardized residual series, as well as Lagrange multiplier tests for autoregressive conditional heteroscedasticity (ARCH-LM test) on the residuals. The Ljung-Box results on standardized residuals up to 5 lags are significant at the 5% level for all stock indices except SMI and SSE. For the squared residuals, the results are also significant for all stock indices except KOSPI and HSI. The ARCH-LM test confirms that no ARCH effects are present in the standardized residuals of most stock indices, the exceptions being KOSPI, HSI and BSESN. Therefore, the AR(1)-EGARCH(1, 1) model sufficiently filters the serial autocorrelation and conditional volatility dynamics present in the stock index returns, producing standardized residuals that are closer to being independently and identically distributed (i.i.d.) than the original log-return series. However, the fitted GARCH-type models fail to capture the extreme observations experienced in the stock markets.
In the next step, the GARCH-EVT model is utilized in estimating VaR forecasts. The standardized residuals from the fitted AR(1)-EGARCH(1, 1) model are approximately i.i.d., which is a standard requirement for extreme value theory to be applied in modelling extreme observations. The peaks over threshold (POT) approach is used to model the tail behaviour of the standardized residuals of the stock market index returns. The threshold is set at the 90% quantile of the in-sample observations to estimate the parameters of the generalized Pareto distribution (GPD). Table 5 reports the threshold values, number of exceedances and parameter estimates of the fitted GPD, with their corresponding standard errors enclosed in brackets. The shape parameter is positive and significantly different from zero except for S&P500, FTSE, S&P/TSX, KOSPI, HSI and SSE, indicating heavy-tailed distributions and a finite variance. This also implies that the tail distributions of the stock market indices belong to the Fréchet class, which is heavy-tailed. The scale parameter is positive and significant for all the stock market indices.
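Once the GPD has been fitted to the exceedances, the tail quantile of the standardized residuals and the conditional VaR follow in closed form. The sketch below assumes the usual POT tail estimator, applied to negated standardized residuals (losses), and then scales the quantile with the one-day-ahead conditional mean and volatility. The function names and all parameter values are hypothetical; they are not the estimates in Table 5.

```python
import math


def gpd_tail_quantile(q, u, xi, beta, n, n_u):
    """Quantile z_q of the standardised losses from a GPD fitted above u.

    q        : confidence level, e.g. 0.99 (requires 1 - q <= n_u / n)
    u        : threshold
    xi, beta : GPD shape and scale
    n, n_u   : sample size and number of exceedances above u
    """
    tail_ratio = (1.0 - q) / (n_u / n)
    if abs(xi) < 1e-12:                        # exponential-tail limit (xi -> 0)
        return u - beta * math.log(tail_ratio)
    return u + (beta / xi) * (tail_ratio ** (-xi) - 1.0)


def conditional_var(mu_next, sigma_next, z_q):
    """One-day-ahead VaR as a positive loss: -mu + sigma * z_q."""
    return -mu_next + sigma_next * z_q


# Hypothetical inputs: 90% threshold, 100 exceedances out of 1000 residuals.
u, xi, beta, n, n_u = 1.3, 0.1, 0.6, 1000, 100
z_95 = gpd_tail_quantile(0.95, u, xi, beta, n, n_u)
z_99 = gpd_tail_quantile(0.99, u, xi, beta, n, n_u)
var_99 = conditional_var(mu_next=0.05, sigma_next=1.5, z_q=z_99)
```

At the confidence level that corresponds to the threshold itself (here 90%), the estimator returns the threshold, which is a convenient sanity check on an implementation.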
Table 3. AIC and BIC values for the fitted GARCH-type models for stock market indices returns.
Table 4. AR(1)-EGARCH(1, 1)-sstd estimation results for stock market indices returns.
*Significant at 5% level of significance and p-values are shown in parentheses.
Table 5. AR(1)-EGARCH(1, 1)-EVT estimation results for stock market indices returns.
4.2. Out-of-Sample Analysis
We now turn to an out-of-sample analysis where we compare the ability of the conditional EVT and GARCH-type models to correctly forecast the one-day-ahead Value-at-Risk (VaR). We use out-of-sample data for backtesting; thus, the in-sample return observations, which contain the 2008 global financial crisis period, are used for the rolling window estimation procedure, and we run the backtest over 1000 out-of-sample observations for a period starting from 1st June 2017, to 31st July 2020. VaR forecasts are also estimated following a rolling-window approach. The out-of-sample data is further divided into blocks of 500 and 1000 trading days to observe how the models behave for both shorter and longer periods of observation. To test the ability of our models to capture the true VaR, we compare the realized returns with the one-day-ahead VaR forecasts at the 95% and 99% confidence levels. To that aim, we adopt the UC test of Kupiec and the CC test of Christoffersen (1998) to evaluate the accuracy of each of the 5 models considered in terms of predicting accurate VaR forecasts at the 5% and 1% levels for the daily returns on all twelve stock indices. Table 6 presents the out-of-sample VaR violations and p-values of the Unconditional Coverage (UC) test, and Table 7 presents the one-day-ahead VaR backtesting results computed using 1000 and 500 out-of-sample observations at the 5% and 1% risk levels.
Table 6 summarises the violation ratio percentages of the underlying VaR models and the p-values corresponding to the unconditional coverage (UC) tests for the 500 and 1000 blocks at the 5% and 1% risk levels. The expected number of out-of-sample VaR violations for the 1000 window is 50 at the 95% and 10 at the 99% confidence level, while for the 500 window it is 25 at the 95% and 5 at the 99% confidence level. Based on the proximity of the actual violation ratio to the expected violation ratio, the GARCH-EVT VaR model performs better than the standard GARCH(1, 1), EGARCH(1, 1), GJR-GARCH(1, 1) and APARCH(1, 1) VaR models. According to the UC test results, the VaR forecasts based on the GARCH-EVT model produce an accurate out-of-sample proportion of violations the highest number of times. However, for the GARCH-EVT model the null of correct coverage was rejected five times at the 5% risk level for the 1000 window. At the 1% risk level, these differences are mostly significant; we obtain p-values close to 1 with the EVT-based model. Overall, the one-day-ahead backtesting results demonstrate the superiority of the GARCH-EVT models over the GARCH-type models.
Table 6. Out-of-sample VaR Violations and p-values of the Unconditional Coverage (UC) test on the VaR forecasts.
The table reports the violation ratios and p-values of the unconditional coverage (UC) test for the one-day-ahead 5% and 1% VaR. ** indicates that the null hypothesis of Kupiec’s test is rejected at the 5% significance level. In other words, the violation rates are either underestimated or overestimated.
Table 7. Backtesting results of the conditional coverage (CC) test of Christoffersen (1998).
The table reports the p-values of the conditional coverage (CC) test of Christoffersen (1998) for the one-day-ahead 5% and 1% VaR. ** indicates that the null hypothesis of Christoffersen’s test is rejected at the 5% significance level.
Table 7 presents conditional coverage (CC) test statistic values, with p-values in parentheses, for the 500 and 1000 windows at the 95% and 99% confidence levels, which are considered to reflect extreme market conditions. For the conditional coverage test, likewise, the null hypothesis should not be rejected for a good model; that is, the model should produce the correct number of violations and the violations should be independent. The results of the CC test, which checks both correct coverage and the lack of first-order dependence in VaR violations, seem to support the GARCH-EVT model. The conditional EVT model yields the highest success rate for both tests, with higher p-values in most of the cases, demonstrating the superiority of the model over the competing models. The poorest fit corresponds to the 5% risk level, because the null of proper specification had to be rejected for S&P 500, FTSE 100, CAC, SMI and EURO. As in the case of the UC test, the CC test results demonstrate that the GARCH VaR model rendered the worst fit, with the null rejected in three cases at the 5% level of significance for both the 500 and 1000 windows.
In general, we observe that conditional EVT-based models give the best one-day-ahead VaR forecasts according to the UC and CC backtesting results. Moreover, an EGARCH(1, 1) specification leads to a substantial reduction in the rejection frequencies. A heavy-tailed conditional distribution is of fundamental importance for both the GARCH-EVT and GARCH specifications, and delivers excellent results at both risk levels. Thus, we conclude that it is feasible to discriminate between the estimation methods based on an analysis of the VaR forecast accuracy.
In recent times, VaR has become the most common risk measure used by financial institutions to assess the market risk of financial assets. Since VaR models often focus on the behaviour of asset returns in the left tail, it is important that the models are calibrated such that they do not underestimate or overestimate the proportion of outliers, as this will have significant effects on the allocation of economic capital for investments. Stock market indices are normally characterized by high volatility and extreme price shocks, unlike financial assets such as currency exchange rates and securities market prices. The GARCH-EVT approach allows us to model the tails of the time-varying conditional return distribution. Conditional extreme value theory has proved to be one of the most successful approaches to estimating market risk. The implementation of this method in the framework of the POT model requires choosing a threshold return for fitting the generalized Pareto distribution. Threshold choice involves balancing bias and variance. The GARCH-EVT model performs relatively well in estimating the risk for all stock indices. Empirical backtesting results demonstrate that the conditional EVT and the EGARCH skewed Student’s t models are the most appropriate techniques for measuring and forecasting risk, since they outperform the competing conventional methods and are ranked as the top two models in most cases. Backtesting procedures indicate that, regardless of the choice of the tail, approximately the same accuracy of VaR prediction is provided. The GARCH-EVT model provides a significant improvement in forecasting value-at-risk over the widely used conventional GARCH models. This study may be extended by considering more robust models, such as the MSGARCH-EVT-Copula model, that can also capture structural breaks and the dependence structure between stock markets. The measurement of market risk can also be implemented using expected shortfall, which is a coherent risk measure.
Given that financial markets are complex, dynamic and dependent on other markets, the selection of a diversified investment portfolio is another important area for further research.
 Bień-Barkowska, K. (2020) Looking at Extremes without Going to Extremes: A New Self-Exciting Probability Model for Extreme Losses in Financial Markets. Entropy, 22, 789. https://doi.org/10.3390/e22070789
 McNeil, A.J. and Frey, R. (2000) Estimation of Tail-Related Risk Measures for Heteroscedastic Financial Time Series: An Extreme Value Approach. Journal of Empirical Finance, 7, 271-300. https://doi.org/10.1016/S0927-5398(00)00012-8
 Gencay, R. and Selcuk, F. (2004) Extreme Value Theory and Value-at-Risk: Relative Performance in Emerging Markets. International Journal of Forecasting, 20, 287-303.
 Chan, K.F. and Gray, P. (2006) Using Extreme Value Theory to Measure Value-at-Risk for Daily Electricity Spot Prices. International Journal of Forecasting, 22, 283-300.
 Ghorbel, A. and Trabelsi, A. (2008) Predictive Performance of Conditional Extreme Value Theory in Value-at-Risk Estimation. International Journal of Monetary Economics and Finance, 1, 121-148. https://doi.org/10.1504/IJMEF.2008.019218
 Karmakar, M. and Shukla, G.K. (2015) Managing Extreme Risk in Some Major Stock Markets: An Extreme Value Approach. International Review of Economics & Finance, 35, 1-25. https://doi.org/10.1016/j.iref.2014.09.001
 Omari, C.O., Mwita, P.N. and Waititu, A.G. (2017) Using Conditional Extreme Value Theory to Estimate Value-at-Risk for Daily Currency Exchange Rates. Journal of Mathematical Finance, 7, 846-870. https://doi.org/10.4236/jmf.2017.74045
 Tabasi, H., Yousefi, V., Tamosaitiene, J. and Ghasemi, F. (2019) Estimating Conditional Value at Risk in the Tehran Stock Exchange Based on the Extreme Value Theory Using GARCH Models. Administrative Sciences, 9, 40.
 Omari, C.O., Mwita, P.N. and Gichuhi, A.W. (2018) Currency Portfolio Risk Measurement with Generalized Autoregressive Conditional Heteroscedastic-Extreme Value Theory-Copula Model. Journal of Mathematical Finance, 8, 457-477.
 Cifter, A. (2011) Value-at-Risk Estimation with Wavelet-Based Extreme Value Theory: Evidence from Emerging Markets. Physica A: Statistical Mechanics and Its Applications, 390, 2356-2367. https://doi.org/10.1016/j.physa.2011.02.033
 Huang, C.K., North, D. and Zewotir, T. (2017) Exchangeability, Extreme Returns and Value-at-Risk Forecasts. Physica A: Statistical Mechanics and Its Applications, 477, 204-216. https://doi.org/10.1016/j.physa.2017.02.080
 Glosten, L.R., Jagannathan, R. and Runkle, D.E. (1993) On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks. The Journal of Finance, 48, 1779-1801.
 Ding, Z., Granger, C.W. and Engle, R.F. (1993) A Long Memory Property of Stock Market Returns and a New Model. Journal of Empirical Finance, 1, 83-106.
 Engle, R.F. and Lee, G. (1999) A Permanent and Transitory Component Model of Stock Return Volatility. Cointegration, Causality and Forecasting: A Festschrift in Honor of Clive W.J. Granger. Oxford University Press, New York.
 Zhu, D. and Galbraith, J.W. (2010) A Generalized Asymmetric Student-t Distribution with Application to Financial Econometrics. Journal of Econometrics, 157, 297-305. https://doi.org/10.1016/j.jeconom.2010.01.013
 Embrechts, P., Resnick, S.I. and Samorodnitsky, G. (1999) Extreme Value Theory as a Risk Management Tool. North American Actuarial Journal, 3, 30-41.
 Ardia, D., Bluteau, K., Boudt, K. and Catania, L. (2018) Forecasting Risk with Markov-Switching GARCH Models: A Large-Scale Performance Study. International Journal of Forecasting, 34, 733-747. https://doi.org/10.1016/j.ijforecast.2018.05.004