1. Introduction and Background
The problem of selecting an appropriate probability distribution for a dataset, if one exists, is as old as Gauss’s development of the “normal” distribution. Many distributions have been developed since the normal was used to describe the distribution of a sample drawn from an unknown population. Important differences between many distributions are the implied restrictions on the higher order moments, such as skewness and kurtosis.
The modern literature on fitting alternative probability distributions to financial data begins with Mandelbrot and Fama and continues to this day. It concludes that distributions of financial asset prices or the corresponding rate of return (returns) data are usually asymmetric and have thicker tails than the normal pdf. That is, they are skewed and leptokurtic. Therefore, methods that are used to estimate the relationship between asset prices and returns, such as ordinary least squares (OLS) regression or those that are based on an assumed symmetric and non-leptokurtic normal pdf, may generate inefficient parameter estimates when the errors are not normally distributed. The inappropriate assumption of a symmetric distribution for the error can lead to biased estimates of the intercept if the data are skewed. There is a growing literature on robust estimation and outlier resistant methods that addresses the potential inefficiency and bias problems associated with the outliers and non-normality often found in financial data. One approach is to assume a more general pdf that can accommodate asymmetry and thick tails.
Many such pdf’s and outlier resistant methods can be nested using a family of flexible pdf’s whose members are obtained by imposing different restrictions on the distributional parameters with corresponding restrictions on the moments. Some of these generalized pdf’s can be visualized as being at the top of a pyramid of pdf’s obtained by imposing parameter restrictions which imply different restrictions on feasible values of skewness and kurtosis. If a model is over-parameterized (selecting a more general model than necessary), then the estimators will be inefficient. For example, if the data are normally distributed and a five-parameter Skewed Generalized t (SGT) distribution is fit to the data where only two parameters are needed to describe the data, the estimators of the SGT parameters will not be efficient. However, if a symmetric distribution is fit to skewed data, the parameter estimates and implied estimated moments will be biased. By studying the families of flexible distributions and their related characteristics, the researcher will be better able to select a robust estimator having desirable statistical properties. For example, McDonald, Michelfelder, and Theodossiou  showed that if OLS or a robust estimator based on a symmetric error distribution is used to estimate the relationship between asset prices and returns, the estimated intercept will be biased if the errors have a skewed distribution.
This paper presents the relationships between the generalized and the restricted pdf’s associated with several families of pdf’s that are increasingly used in empirical finance. We also report the feasible skewness-kurtosis spaces of the generalized pdf’s and compare them with empirical estimates for US stock returns. These results add to the financial estimation literature by showing the nesting relationships within the flexible pdf’s and the corresponding restrictions on higher order moments. They also demonstrate the performance of the generalized pdf’s in fitting non-normal datasets. It is important to note that the feasible skewness-kurtosis space corresponding to a generalized pdf does not accommodate all possible skewness-kurtosis combinations.
2. Literature Review
Mandelbrot  and Fama  initially found that stock returns regression residuals are skewed and are fat-tail distributed. McDonald and Nelson  found that many stock risk premiums are positively or negatively skewed and most have thick tails. Harvey and Siddique  , Harvey and Siddique  , and Chan and Lakonishok  concluded that stock returns are skewed and fat-tail distributed and applied various robust methods to address the estimation inefficiencies.
Chan and Lakonishok  , Butler, McDonald, Nelson, and White  , and McDonald and Nelson  discuss specifically the inefficiency in estimating the CAPM beta with OLS.
There are many robust estimation methods. Some are outlier adjustment methods, others are based on alternative specifications of pdf’s, and some are both, such as least absolute errors (which corresponds to the Laplace pdf) used in place of least squares for regression estimation. This investigation focuses on those methods that use alternative pdf’s that can accommodate varying levels of skewness and kurtosis and that nest more restrictive pdf’s.
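The contrast between least squares and least absolute errors can be illustrated with a minimal sketch. It assumes a simple one-regressor model and made-up data (nine points on the line y = 1 + 2x plus one large outlier); the function names and the iteratively reweighted least squares routine are illustrative choices, not the estimators used in the papers cited here.

```python
def weighted_line_fit(x, y, w=None):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    if w is None:
        w = [1.0] * len(x)  # equal weights give ordinary least squares
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    b = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    a = (swy - b * swx) / sw
    return a, b


def lad_line_fit(x, y, iters=100, eps=1e-8):
    """Least absolute errors via iteratively reweighted least squares.

    Each pass weights an observation by 1/|residual| (floored at eps),
    so large residuals lose influence, as under a Laplace likelihood.
    """
    a, b = weighted_line_fit(x, y)  # start from the OLS fit
    for _ in range(iters):
        w = [1.0 / max(abs(yi - a - b * xi), eps) for xi, yi in zip(x, y)]
        a, b = weighted_line_fit(x, y, w)
    return a, b


# Hypothetical data: nine points on y = 1 + 2x and one outlier at x = 9.
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
y[-1] = 100.0

a_ols, b_ols = weighted_line_fit(x, y)
a_lad, b_lad = lad_line_fit(x, y)
print(f"OLS slope: {b_ols:.3f}   LAD slope: {b_lad:.3f}")
```

In this constructed example the single outlier pulls the OLS slope far above 2, while the least absolute errors slope stays close to 2.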
Boyer, McDonald, and Newey  bifurcate robust estimation into reweighted least squares or least median squares, and partially adaptive estimators. The robust or partially adaptive estimators considered in this paper can be viewed as quasi-maximum likelihood estimators. They maximize a likelihood function corresponding to an approximating error distribution to yield estimated regression and distributional parameters. The least squares methods address only the choice of regression parameters. Boyer, McDonald, and Newey  use simulations to compare the efficiency of generalized pdf’s and least squares methods. Using one of the generalized pdf’s, they concluded that generalized pdf’s produced more efficient estimators than outlier adjustment methods that cannot change pdf parameters when regression errors have skewness or kurtosis. Therefore, among the myriad of robust estimation methods, this paper focuses on the use of generalized pdf’s.
The generalized probability distribution families considered in this investigation can accommodate a wide range of data characteristics. These generalized probability distribution families are the generalized beta and exponential generalized beta and variants from McDonald and Xu  , the skewed generalized T from Theodossiou  , the inverse hyperbolic sine from Johnson  and the g-and-h from Tukey  and Dutta and Babel  . Some of these distributions have been used in Hansen, McDonald, and Theodossiou  to model various skewed and fat-tail distributed financial time series data in GARCH specifications. The skewed generalized T pdf is starting to be used more frequently and has recently been added to the Stata© econometric software package for regression. The SGT was used by Hansen, McDonald, Theodossiou, and Larsen  to show the differences in regression results for a model of real estate prices with data and errors that are positively skewed and fat-tail distributed. Their results clearly show the improvement in the variance of the estimates.
Mauler and McDonald  apply the generalized beta of the second kind, the inverse hyperbolic sine, the g-and-h, and others to generalize the Black-Scholes option pricing model to explore potential improvements relative to the original log-normal specification of the options model. All alternative flexible pdf’s considered generated improvements in the accuracy of options price estimates relative to the log-normal pdf.
Kerman and McDonald   derive feasible skewness-kurtosis spaces for variants of pdf’s within the exponential generalized beta and the skewed generalized T families. Kerman and McDonald  find that the skewed generalized T and its nested skewed generalized error pdf’s have the most flexibility of many pdf’s that they modeled. McDonald, Sorenson, and Turley  obtain expressions defining the skewness-kurtosis spaces corresponding to the generalized beta of the second kind.
Theodossiou  derives the skewed generalized error distribution nested within the skewed generalized T family and applies it to various asset pricing models estimations and derivations. Theodossiou and Savva  use robust estimation (partially adaptive estimators) based on the skewed generalized T, which accommodates negatively skewed asset returns, to address empirical inconsistencies in the finance literature on the risk-return relation. Next, the families of the generalized pdf’s are discussed.
3. Families of Generalized Probability Distributions for Financial Modeling
We present the following generalized pdf families that accommodate asymmetry and thick tails and the pdf’s that they nest where y is the random variable and the distributional parameters control the moments of the distribution. Whereas the normal has two parameters, the following distributions have 4 to 5 parameters (described in the Appendix for each distribution):
1) The generalized beta (GB),
2) The exponential generalized beta (EGB),
3) The skewed generalized T (SGT),
4) The inverse hyperbolic sine (IHS), and
5) The g-and-h distribution.
The Appendix shows the specifications of the pdf’s and the associated parameter expressions that control their shape (skewness and kurtosis) and location. The GB, EGB, and SGT are five-parameter distributions and the IHS and g-and-h distributions each involve four parameters. All of these families nest the normal or a variant of the normal. For example, the GB nests the half-normal. Gauss’s development seems to have been the catalyst that motivated future generations of mathematicians and statisticians to start with the normal pdf and generalize it, going down different pathways, to better model the diverse distributional characteristics encountered in modelling various data sets.
Figures 1-5 show the many distributions that are nested within the five pdf’s listed above.
Where: a controls peakedness; b is a scale parameter; c defines the domain; p, q are shape parameters.
Figure 1. Generalized beta family of density functions.
Where: m controls location; there is also a scale parameter; c defines the domain; p, q are shape parameters.
Figure 2. Exponential generalized beta family of density functions.
Where: m = mode (location parameter); p, q = shape parameters (tail thickness, moments of order < pq = df).
Figure 3. Skewed generalized T family of density functions.
Figure 4. Inverse hyperbolic sine family of density functions.
Figure 5. The g-and-h family of density functions.
An inspection of Figures 1-5 shows that the nested distributions are obtained by imposing various restrictions on the parameter values of the more flexible pdf. For example, the restrictions on the values of p and q of the GB and EGB control the skewness and kurtosis of those families of pdf’s. Similarly, if c = 1 in the GB distribution (Figure 1), the corresponding pdf is the generalized beta of the second kind (GB2), which is defined for positive valued random variables. For the SGT family in Figure 3, λ describes skewness (negative for negative skewness and vice versa) and p and q determine the shape of the pdf. The IHS and g-and-h pdf’s nest the normal as limiting cases (see Figure 4 and Figure 5) and have the flexibility to accommodate a wide range of skewness and kurtosis values.
The GB nests at least 26 pdf’s. Among those commonly used in economics and finance are the GB2, log-normal (LN), Pareto, truncated or half-student’s T, chi-squared, exponential (EXP) and the truncated or half-normal pdf’s. Its exponential version, the EGB, nests the EGB2, which has been used in recent papers on robust estimation involving the capital asset pricing model and other financial models. The EGB and SGT nest at least ten distributions each. The SGT, among many others, nests the Laplace, uniform, normal, student’s T, and the generalized error distribution (GED). The non-unitary version of the student’s T and the GED are offered as options to the normal pdf by EVIEWS© and Stata© econometric software for GARCH regression error pdf choices to model thick-tailed error distributions. As of this writing, we are not aware of any pre-written, commercially available statistical software for estimation with most of these generalized pdf’s, other than the SGT in a regression specification in Stata© (sgtreg).
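Nesting by parameter restriction can be checked numerically for one small branch of the SGT tree. The sketch below uses a common textbook parameterization of the GED, f(y) = p exp(-(|y - m|/phi)^p) / (2 phi Gamma(1/p)), which is an assumption here and may differ in notation from the Appendix. Under it, p = 2 with phi = sigma * sqrt(2) reproduces the normal pdf and p = 1 reproduces the Laplace pdf.

```python
import math


def ged_pdf(y, m, phi, p):
    """Generalized error distribution density (assumed parameterization)."""
    z = abs(y - m) / phi
    return p * math.exp(-(z ** p)) / (2.0 * phi * math.gamma(1.0 / p))


def normal_pdf(y, m, sigma):
    """Normal density with mean m and standard deviation sigma."""
    z = (y - m) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def laplace_pdf(y, m, phi):
    """Laplace (double exponential) density with location m and scale phi."""
    return math.exp(-abs(y - m) / phi) / (2.0 * phi)


sigma = 1.5
for y in (-2.0, 0.0, 0.7, 3.0):
    # Restriction p = 2, phi = sigma * sqrt(2): the GED collapses to the normal.
    gap_normal = ged_pdf(y, 0.0, sigma * math.sqrt(2.0), 2.0) - normal_pdf(y, 0.0, sigma)
    # Restriction p = 1: the GED collapses to the Laplace.
    gap_laplace = ged_pdf(y, 0.0, sigma, 1.0) - laplace_pdf(y, 0.0, sigma)
    print(y, abs(gap_normal) < 1e-12, abs(gap_laplace) < 1e-12)
```

The same pattern, fixing one or more parameters of the general pdf and comparing with the restricted pdf, applies to the other branches of the family trees.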
The combinations of mathematically admissible skewness-kurtosis values corresponding to the EGB2, SGT, SGED, IHS and g-and-h pdf’s are shown in Figure 6. The g-and-h has the least restrictive set of admissible moments and the EGB2 is the most restrictive, with all combinations having to be on or inside the EGB2 moment space “smile.” The pdf’s also differ in their minimum levels of kurtosis.
Figures 7-12 show how the shapes of the density functions change for varying values of the parameters of each pdf. Note in Figure 12 that for the g-and-h we allowed h < 0 which corresponds to a random variable with bounded support and permits bimodal distributions. Combined with varying skewness values for g, the pdf’s have bounded support, but only for g < 0.
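The roles of g and h can be seen from the transformation that defines a g-and-h random variable. The sketch below assumes the usual form of Tukey's transformation of a standard normal z, Y = a + b ((exp(g z) - 1)/g) exp(h z^2 / 2), with the g = 0 case taken as the limit a + b z exp(h z^2 / 2); the parameter names a and b for location and scale are assumptions and may not match the Appendix.

```python
import math
import random


def g_and_h(z, a=0.0, b=1.0, g=0.0, h=0.0):
    """Tukey g-and-h transform of a standard normal draw z (assumed form)."""
    core = z if g == 0.0 else (math.exp(g * z) - 1.0) / g
    return a + b * core * math.exp(h * z * z / 2.0)


def sample_skewness(values):
    """Moment-based sample skewness m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(1)
z_draws = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# g sets the direction and size of skewness; h > 0 thickens both tails.
for g in (-0.5, 0.0, 0.5):
    ys = [g_and_h(z, g=g, h=0.1) for z in z_draws]
    print(f"g = {g:+.1f}: sample skewness = {sample_skewness(ys):.2f}")
```

With g = 0 and h = 0 the transform returns a + b z, which is the normal case nested by the family.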
4. Empirical Applications
4.1. Distributional Characteristics
First, we consider the distributional characteristics of the total stock returns for the population of stocks included in the Center for Research in Security Prices (CRSP) database. Second, we focus on the distribution of the returns on two stocks, one normally distributed and the other non-normally distributed. We also look at the impact of the distribution on estimated capital asset pricing model betas. Finally, we consider the distributional impacts in an ARCH specification.
The data used are the daily, weekly, and monthly excess stock returns for all continuously traded common stocks in the CRSP database over the five years from January 2, 2002 to December 29, 2006, giving approximately 1250 daily, 260 weekly and 60 monthly returns for each stock. Since asset market speculative bubbles and crashes tend to exacerbate skewness and thick tails of the returns distribution, this investigation chose a time frame that included neither condition. The financial market crisis and the ensuing extreme drop in asset prices in the years after 2006 were purposely avoided so that returns in a typical market regime are modeled. An observation period that included bubbles would exacerbate the difference in results between robust and standard estimation methods.
Figure 6. Skewness and kurtosis ranges for the EGB2, SGT, SGED, IHS and g-and-h distributions.
Where: a controls peakedness; ϕ is a scale parameter; p, q are shape parameters.
Figure 7. GB2 pdf’s evaluated for different parameter values.
Where: m controls location; there is also a scale parameter; p, q are shape parameters.
Figure 8. EGB2 pdf’s evaluated for different parameter values.
Where: m = mode (location parameter); p, q = shape parameters (tail thickness, moments of order < pq = df).
Figure 9. SGT pdf’s evaluated for different parameter values.
Figure 10. IHS pdf’s evaluated for different parameter values.
Figure 11. g-and-h pdf’s evaluated for different parameter values with h > 0.
Figure 12. g-and-h pdf’s evaluated for different parameter values with h < 0.
The excess return is calculated by CRSP as the total holding period rate of return minus the total holding period rate of return on the one-month US Treasury bill (the Fama-French risk free rate of return). This provided data for 4547 stocks traded on the NYSE, NASDAQ, and AMEX exchanges for the time frame. The skewness and kurtosis values were calculated for the daily, weekly and monthly returns for each stock for the time frame.
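As an illustration of the per-stock calculation, the sketch below computes moment-based skewness and kurtosis for one return series. It assumes the simple (biased) moment estimators and reports kurtosis on the scale where the normal equals 3; the text does not say which estimator was used, and the return values are made up.

```python
def sample_skew_kurt(returns):
    """Return (skewness, kurtosis) from central sample moments.

    skewness = m3 / m2**1.5, kurtosis = m4 / m2**2 (normal = 3).
    """
    n = len(returns)
    mean = sum(returns) / n
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical excess returns for one stock (not CRSP data).
excess_returns = [0.012, -0.004, 0.003, -0.051, 0.008, 0.001, -0.002, 0.015]
skew, kurt = sample_skew_kurt(excess_returns)
print(f"skewness = {skew:.3f}, kurtosis = {kurt:.3f}")
```

Applying the same function to the daily, weekly and monthly series of each stock would give one skewness-kurtosis pair per stock and frequency.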
Figure 13 and Figure 14 show the plots of the estimates of the skewness and kurtosis contrasted with the admissible parameter spaces for each pdf. They show how moving from higher to lower frequency returns affects the skewness and kurtosis of stock returns. ARCH processes in returns are more pronounced for higher frequency data. We should therefore expect more leptokurtosis relative to skewness from ARCH effects: intermittent clusters of high and low volatility drive the persistence of the volatility of returns, while the randomness of the algebraic signs of the spikes (+ or −) dampens skewness in either direction.
A comparison of the admissible moment spaces with the estimated moments shows that much of the data does fall within the feasible skewness-kurtosis spaces of the pdf’s. Figure 15 shows the proportion of the estimates that fall within the parameter spaces for the selected generalized pdf’s. It shows that the g-and-h admissible space includes nearly all of the estimates, with 100%, 99.98% and 98.99% of the daily, weekly and monthly estimates, respectively, falling within the space. The EGB2 space contains the fewest estimates, with 15.48%, 43.81% and 50.80% fitting within its admissible space. The SGT and IHS spaces generally fit the estimates well, with roughly 80% to 90% of the estimates within their spaces. This finding is consistent with Kerman and McDonald (2013) that the SGT has the most flexibility of the EGB2, SGT, and IHS families. The “bound” in these figures corresponds to the Klassen bound for unimodal distributions.
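The proportions in Figure 15 amount to counting how many estimated (skewness, kurtosis) pairs satisfy the inequality that defines a moment space. The sketch below shows the counting step only. The family-specific boundaries are not reproduced here, so two general bounds stand in for them as assumptions: the bound that holds for every distribution, kurtosis >= skewness^2 + 1, and a bound for unimodal distributions written as kurtosis >= skewness^2 + 189/125, which is the author's recollection of the published constant 186/125 (stated in terms of excess kurtosis) and should be checked against the original source. The data points are hypothetical.

```python
def inside_general_bound(skew, kurt):
    """Bound satisfied by any distribution: kurtosis >= skewness**2 + 1."""
    return kurt >= skew ** 2 + 1.0


def inside_unimodal_bound(skew, kurt):
    """Assumed unimodal bound: kurtosis >= skewness**2 + 189/125."""
    return kurt >= skew ** 2 + 189.0 / 125.0


def fraction_inside(pairs, region):
    """Share of (skewness, kurtosis) pairs for which region(...) is True."""
    return sum(1 for s, k in pairs if region(s, k)) / len(pairs)


# Hypothetical (skewness, kurtosis) estimates, kurtosis with normal = 3.
estimates = [(-0.3, 4.1), (0.1, 3.2), (1.5, 3.6), (0.0, 1.4)]

print(fraction_inside(estimates, inside_general_bound))
print(fraction_inside(estimates, inside_unimodal_bound))
```

Replacing the stand-in predicates with the boundary of a particular family (for example the EGB2 “smile”) would give the corresponding entry of Figure 15.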
Figure 13. Daily, weekly and monthly excess returns moments and admissible skewness-kurtosis parameter spaces.
Figure 14. Monthly excess returns moments and admissible skewness-kurtosis parameter spaces.
Figure 15. Fraction of stock returns in admissible parameter space.
4.2. Two Examples
We now contrast the pdf’s of two stocks with normally and non-normally distributed returns. Figure 16 and Figure 17 show the empirical pdf’s of the total returns for US Steel and iShares as examples. US Steel was chosen because its returns are approximately normally distributed with almost no skewness or excess kurtosis, as reflected in the statistically insignificant value of the Jarque-Bera (JB) statistic, which is asymptotically distributed as a chi-square with two degrees of freedom. iShares was chosen because it has severe skewness (−29.1), kurtosis (965.1), and a statistically significant JB statistic equal to 48,733,899. Note that the plotted pdf’s, log-likelihood values, sum of squared errors (SSE) and sum of absolute errors (SAE), used as indicators of goodness-of-fit, are very similar across all pdf’s for US Steel returns. This is in sharp contrast to the results for iShares. The log-likelihood values of the flexible pdf’s are orders of magnitude higher than that for the normal. The fitted pdf’s, SSE’s and SAE’s all indicate that the flexible pdf’s provide a much better fit than does the normal. These two examples demonstrate how much better the pdf’s that accommodate skewness and kurtosis can approximate the distribution of the returns relative to the normal.
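The JB statistic can be computed directly from the sample size, skewness and kurtosis as JB = (n/6)(S^2 + (K − 3)^2 / 4). The sketch below applies this standard formula to the rounded iShares values reported above. The sample size of 1259 daily observations is an assumption (the text says only "approximately 1250"), as is the reading of the reported kurtosis as the raw value for which the normal equals 3.

```python
import math


def jarque_bera(n, skew, kurt):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4), with kurtosis K where normal = 3."""
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def jb_p_value(jb):
    """Upper-tail probability of a chi-square with 2 degrees of freedom."""
    return math.exp(-jb / 2.0)


# Rounded values from the text; n = 1259 is an assumed number of trading days.
jb_ishares = jarque_bera(1259, -29.1, 965.1)
print(f"JB = {jb_ishares:,.0f}, p-value = {jb_p_value(jb_ishares):.3g}")

# For comparison, the 5% critical value of a chi-square(2) is about 5.99.
print(f"p-value at JB = 5.99: {jb_p_value(5.99):.3f}")
```

Under these assumptions the formula gives a value of roughly 4.87 × 10^7, in line with the reported statistic.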
4.3. Capital Asset Pricing Model Betas
We have also performed capital asset pricing model (CAPM) regressions for two stocks, one with approximately normally distributed regression errors and the other with skewed and thick-tailed errors. This is the same approach used by McDonald, Michelfelder, and Theodossiou (2010) to compare beta (slope) estimates for public utility stocks with normal and with skewed and thick-tail distributed regression errors. Figure 18 shows the skewness, kurtosis, JB statistics, and beta estimates for the two stocks. United Natural Foods has normally distributed CAPM regression errors, as indicated by the values of skewness, kurtosis and the JB statistic. The OLS beta for United Natural Foods is 0.313 and the estimates from the other regression error pdf’s range from 0.302 to 0.335.
Figure 16. PDF fits for a stock with normally distributed daily excess returns: US Steel.
Figure 17. PDF fits for a stock with leptokurtic and skewed daily excess returns: iShares.
Figure 18. Capital asset pricing model beta estimates for stock examples with normal and non-normally distributed returns regression error terms.
The returns of the 99 Cent Only stock are non-normally distributed, as indicated by the skewness and kurtosis values and the JB statistic. The beta estimated with OLS for the 99 Cent Only stock is subject to more prediction error than that for United Natural Foods: the OLS estimate is 0.184, whereas the estimates from the flexible pdf’s range from 0.106 to 0.125.
4.4. ARCH Specifications
Lastly, we consider the impact of distributional assumptions in an ARCH specification. Figure 19 shows the root mean square errors of the estimated beta from 10,000 replications of 60-month simulations for three data generating processes (DGP): normal-no-ARCH, normal-ARCH, and T-ARCH. [The equation for the data generating process and the parameter values for each case are missing from the source text.]
Not surprisingly, as the shaded highlights show for each of three data generating processes, the correct specification yields the most efficient estimates. For example, consider the normal-no-ARCH DGP. Over-specifying the model (using a more flexible pdf than necessary) increases the variance of the estimates (reduces efficiency). However, in many cases the efficiency loss is modest. This is also true for the ARCH estimations for this data generating process, with additional efficiency losses associated with the inclusion of unnecessary ARCH parameters. The normal-ARCH DGP results also show that over-specifying the model increases the root mean square error whereas correctly including the ARCH component improves estimator efficiency. Neglecting to account for the ARCH component has a significant impact whereas specifying a more flexible pdf has a modest impact on the RMSE in most cases.
Regarding the T-ARCH DGP, again, as expected, the correct specification yields the most efficient estimates. Again, failing to account for the ARCH component has a greater negative impact on efficiency than does over parameterizing the underlying distribution.
Therefore, correctly specifying the data generating process yields the most efficient estimator as measured by RMSE. Over-specifying the error distribution, including adding an unnecessary ARCH component, reduces efficiency, but in many cases the impact is small. Similarly, failure to include an appropriate ARCH component reduces efficiency. Likelihood ratio or Wald test statistics can help detect over-specification of an error data generating process.
Figure 19. Root mean square errors based on simulations of the prediction of excess stock returns.
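The structure of such a simulation can be sketched as follows. This is not the design behind Figure 19: the regression form, the ARCH(1) recursion, and every numerical value (true beta of 1, ARCH coefficient of 0.3, market return mean and standard deviation, 2000 replications) are hypothetical, and only the OLS estimator is evaluated rather than the maximum likelihood estimators compared in the figure.

```python
import math
import random


def simulate_beta_rmse(reps=2000, n=60, intercept=0.0, beta=1.0,
                       omega=0.0025, arch_coef=0.3, seed=0):
    """RMSE of the OLS slope when errors follow a normal ARCH(1) process.

    y_t = intercept + beta * x_t + u_t,  u_t = sqrt(s2_t) * z_t,
    s2_t = omega + arch_coef * u_{t-1}^2,  z_t ~ N(0, 1).
    All parameter values are hypothetical; arch_coef = 0 removes ARCH.
    """
    rng = random.Random(seed)
    squared_error = 0.0
    for _ in range(reps):
        x, y = [], []
        u_prev = 0.0  # no burn-in period, for brevity
        for _ in range(n):
            s2 = omega + arch_coef * u_prev ** 2
            u = math.sqrt(s2) * rng.gauss(0.0, 1.0)
            xt = rng.gauss(0.005, 0.04)  # hypothetical market excess return
            x.append(xt)
            y.append(intercept + beta * xt + u)
            u_prev = u
        x_bar = sum(x) / n
        y_bar = sum(y) / n
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        squared_error += (sxy / sxx - beta) ** 2
    return math.sqrt(squared_error / reps)


print(f"RMSE of OLS beta, ARCH errors:    {simulate_beta_rmse():.3f}")
print(f"RMSE of OLS beta, no ARCH errors: {simulate_beta_rmse(arch_coef=0.0):.3f}")
```

A fuller version would replace the OLS step with maximum likelihood under each candidate error specification and tabulate one RMSE per specification and DGP.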
Robust or partially adaptive estimation is an approach to estimating parameters which are relatively insensitive to mis-specifying the underlying distributional assumptions of the model. We have shown several families of general or flexible distributions that can reduce the impact of model misspecification. It is also important to understand that the more general distributions, while accommodating possible skewness and thick tails, cannot accommodate all possible combinations of skewness and kurtosis parameter values. The wrong choice of an error distribution can reduce efficiency as well as introduce bias to the estimates. This paper shows the family trees, nesting relations, parameter space restrictions and a few asset returns applications of the major flexible pdf’s used in robust estimation in the literature. A researcher must choose very carefully the appropriate distribution. The choice of a more general pdf has an increased likelihood of including a correct specification.
We are grateful to Brad Larsen for his excellent research assistance. Brad Larsen is currently an assistant professor of economics at Stanford University. We also thank participants at various Multinational Financial Society Annual Conferences where some of this material was first presented.
Appendix: Specifications of the General Probability Distributions and Their Parameters
 McDonald, J.B., Michelfelder, R.A. and Theodossiou, P. (2009) Robust Regression Estimation Methods and Intercept Bias: A Capital Asset Pricing Model Application. Multinational Finance Journal, 13, 293-321.
 McDonald, J.B. and Nelson, R.T. (1989) Alternative Beta Estimation for the Market Model using Partially Adaptive Estimation. Communications in Statistics: Theory and Methods, 18, 4039-4058.
 Butler, R.J., McDonald, J.B., Nelson, R.D. and White, S. (1990) Robust and Partially Adaptive Estimation of Regression Models. Review of Economics and Statistics, 72, 321-327.
 Boyer, B.H., McDonald, J.B. and Newey, W.K. (2003) A Comparison of Partially Adaptive and Reweighted Least Squares Estimation. Econometric Reviews, 22, 115-134.
 Dutta, K.K. and Babel, D.F. (2005) Extracting Probabilistic Information from the Prices of Interest Rate Options: Tests of Distributional Assumptions. Journal of Business, 78, 841-870.
 Hansen, J.V., McDonald, J.B., Theodossiou, P. and Larsen, B.J. (2010) Partially Adaptive Econometric Methods for Regression and Classification. Computational Economics, 36, 153-169.
 Kerman, S.C. and McDonald, J.B. (2015) Skewness-Kurtosis Bounds of EGB1, EGB2 and Special Cases. Communications in Statistics: Theory and Methods, 44, 3857-3864.
 Kerman, S.C. and McDonald, J.B. (2013) Skewness-Kurtosis Bounds for the Skewed Generalized T and Related Distributions. Statistics and Probability Letters, 83, 2129-2134.
 McDonald, J.B., Sorenson, J. and Turley, P. (2013) Skewness and Kurtosis Properties of Income Distribution Models. Review of Income and Wealth, 59, 360-374.