Since the global financial crisis of 2008, many studies have pointed out that even in a portfolio whose asset allocation is sufficiently diversified, the risk allocation may still be concentrated in a few assets.
Traditionally, portfolios are constructed using the mean-variance approach. To calculate optimal portfolio weights, this method performs an optimization over expected returns and risks. However, because these quantities are hard to estimate precisely, the calculated weights tend to be biased and often not as diversified as initially intended.
One approach to this problem is the risk parity strategy, which equalizes the risk contribution of each asset. Moreover, it has been reported that, unlike mean-variance, the performance of risk parity does not depend on the estimation accuracy of the covariance matrix. While pension funds generally hold diversified portfolios of stocks and bonds, their portfolios are still said to exhibit a large bias in risk contribution.
As the risk of stocks is much larger than that of bonds, the majority of the portfolio risk comes from stocks. We diversify in the first place because we expect other assets to support overall performance when one asset performs poorly.
When stocks occupy the majority of the portfolio risk, the other assets cannot make up for a stock market slump, and the expected benefits of diversification are not obtained. Risk parity strategies have been proposed as a remedy for this situation.
The concept of risk parity applies naturally to the asset allocation problem, and there are many previous studies. For example, risk parity has been demonstrated to be more efficient, as measured by the Sharpe ratio, than traditional balanced portfolios with a 60:40 equity-to-bond investment ratio.
On the other hand, some caveats have been pointed out for the risk parity strategy as well: even if the risk contributions are made equal, the sources of risk are not necessarily diversified.
Therefore, in this paper, we first group assets with similar movements using a non-hierarchical clustering method. We then propose a non-hierarchical clustering risk parity strategy in which the risk contributions are equal both across clusters and within each cluster. We also propose x-means++, a combination of the x-means algorithm and the k-means++ algorithm, to secure the robustness of clustering.
Assuming that assets with similar movements share common risk sources, our approach constructs a portfolio that equalizes risk across those sources. Empirical analysis using actual price data of various asset classes shows that our proposed method outperforms risk parity strategies and hierarchical clustering risk parity strategies.
The remaining sections of this paper are organized as follows. In Section 2, we briefly review related studies on risk-based portfolios. In Section 3, we introduce the risk parity portfolio and the non-hierarchical clustering risk parity portfolio. In Section 4, we describe x-means++ clustering, and in Section 5 we verify its effectiveness through empirical analysis with actual financial market data. Finally, Section 6 concludes the paper, followed by a Notation list.
2. Related Work
Unlike the mean-variance portfolio, which uses both estimated return and risk, risk-based portfolios use only estimated risk to construct a portfolio. Because predicting future returns is difficult, and because the error-maximization property of the mean-variance optimization approach tends to concentrate the portfolio in a few securities, risk-based portfolios that do not use expected returns have attracted the attention of practitioners.
Typical risk-based portfolios are the minimum variance portfolio, the risk parity portfolio, and the maximum diversification portfolio. Each of these has been shown to outperform market-capitalization-weighted portfolios and mean-variance portfolios. The minimum variance portfolio determines the asset allocation so that the variance of the portfolio is smallest. This portfolio sits at the left end of the efficient frontier in the risk/return plane, and its expected return is also the smallest on the frontier. However, minimum variance portfolios are known to tend to realize a higher return/risk ratio ex post. The maximum diversification portfolio is the portfolio with the largest diversification effect; it is obtained by maximizing the diversification ratio, i.e., the weighted average of asset risks divided by the portfolio risk.
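As a small numerical illustration of the diversification ratio just defined (the covariance figures below are hypothetical, not taken from the paper):

```python
import numpy as np

# Diversification ratio: weighted average of individual asset risks
# divided by the portfolio risk. Toy two-asset example.
Sigma = np.array([[0.04, 0.006],
                  [0.006, 0.01]])
w = np.array([0.5, 0.5])

asset_vols = np.sqrt(np.diag(Sigma))      # individual asset risks
port_vol = np.sqrt(w @ Sigma @ w)         # portfolio risk
div_ratio = (w @ asset_vols) / port_vol   # exceeds 1 when correlations are below 1
print(div_ratio)
```

The ratio exceeds 1 whenever imperfect correlation makes the portfolio risk smaller than the weighted average of the individual risks.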
Furthermore, it is known that these three portfolios can be written as a generalized risk-based portfolio, and extensions of all three have been proposed. As extensions of the minimum variance portfolio, there are 1) those that include higher-order moments, 2) those that devise the method of estimating the covariance matrix (e.g., the Gaussian Process Latent Variable Model, GPLVM, and the t-Process Latent Variable Model, TPLVM), and 3) those that use downside risk measures such as conditional value at risk (CVaR) as an alternative to covariance-based risk measures.
As extensions of risk parity, there are principal component risk parity and complex-valued principal component risk parity, which focus on the sources of risk. Furthermore, there is hierarchical clustering risk parity, which divides assets into clusters and distributes the risk across those clusters. This study proposes a non-hierarchical clustering risk parity portfolio.
3. Non-Hierarchical Clustering Risk Parity Portfolio
3.1. Risk Parity
We consider a portfolio of n risky assets. Let $R$ be the return (random variable) vector of the assets, $\mu$ the vector of expected returns, and $\Sigma$ the covariance matrix of asset returns. We denote the portfolio weight vector by $w$, and the portfolio risk by $\sigma(w) = \sqrt{w^\top \Sigma w}$.
To derive the specific form of the risk parity portfolio, we introduce the marginal risk contribution (MRC), the derivative of portfolio risk with respect to the weight $w_i$:

$$\mathrm{MRC}_i = \frac{\partial \sigma(w)}{\partial w_i} = \frac{(\Sigma w)_i}{\sqrt{w^\top \Sigma w}}.$$

Since $\sigma(w)$ is homogeneous of degree one in $w$, Euler's theorem lets us decompose portfolio risk using the MRC as follows:

$$\sigma(w) = \sum_{i=1}^{n} w_i \, \mathrm{MRC}_i.$$

We additionally define the risk contribution (RC) of asset $i$ as

$$\mathrm{RC}_i = w_i \, \mathrm{MRC}_i = \frac{w_i (\Sigma w)_i}{\sqrt{w^\top \Sigma w}}.$$

Finally, the risk parity portfolio is defined as a portfolio in which the $\mathrm{RC}_i$ from all assets $i$ are equal, i.e., $\mathrm{RC}_i = \sigma(w)/n$ for every $i$.
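The MRC/RC decomposition is easy to verify numerically; a minimal sketch with a hypothetical 3-asset covariance matrix:

```python
import numpy as np

# Toy 3-asset covariance matrix (hypothetical values for illustration).
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.02, 0.00],
                  [0.00, 0.00, 0.01]])
w = np.array([0.2, 0.3, 0.5])        # portfolio weights summing to 1

sigma_p = np.sqrt(w @ Sigma @ w)     # portfolio risk sigma(w)
mrc = (Sigma @ w) / sigma_p          # marginal risk contributions
rc = w * mrc                         # risk contributions RC_i = w_i * MRC_i

# Euler decomposition: the RCs sum back to the portfolio risk.
print(rc / sigma_p)                  # share of total risk from each asset
```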
Restricting short selling and the use of leverage, it has been shown that the portfolio weights can be calculated efficiently by solving the optimization problem (5)-(6):

$$\min_{y} \ \sqrt{y^\top \Sigma y} \qquad (5)$$
$$\text{s.t.} \ \sum_{i=1}^{n} \ln y_i \ge c, \quad y \ge 0, \qquad (6)$$

with the weights obtained by normalization, $w = y / \sum_{i} y_i$. As (5)-(6) is a convex optimization problem, it has a unique solution.
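A sketch of a long-only risk parity solver; rather than the paper's convex program, it minimizes the squared dispersion of the risk contributions under a fully-invested constraint, which yields the same equal-RC portfolio for well-behaved covariance matrices:

```python
import numpy as np
from scipy.optimize import minimize

def risk_parity_weights(Sigma):
    """Long-only, fully-invested weights that equalize risk contributions.
    Sketch: minimizes squared RC dispersion, not the paper's formulation."""
    n = Sigma.shape[0]

    def objective(w):
        sigma = np.sqrt(w @ Sigma @ w)
        rc = w * (Sigma @ w) / sigma            # risk contributions RC_i
        return np.sum((rc - sigma / n) ** 2)    # deviation from equal budgets

    w0 = np.full(n, 1.0 / n)
    res = minimize(objective, w0, method="SLSQP",
                   bounds=[(1e-6, 1.0)] * n,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
                   options={"ftol": 1e-12, "maxiter": 500})
    return res.x

# Uncorrelated toy assets: risk parity weights each asset inversely to its volatility.
Sigma = np.diag([0.04, 0.02, 0.01])
w = risk_parity_weights(Sigma)
print(w)
```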
3.2. Non-Hierarchical Clustering Risk Parity
The essence of the risk parity portfolio is control of the risk allocation. In constructing a risk parity portfolio we choose to allocate the risk contribution equally to each asset, but we can also consider alternative allocations; this is called a risk budgeting strategy.
In this article, we aim to equalize the risk contribution from each cluster and, at the same time, equalize the risk contribution from each asset within every cluster.
To achieve this goal, we first perform non-hierarchical clustering on asset returns to determine the risk clusters.
Using these clusters, we solve the risk-budgeting analogue of (5)-(6) to obtain the weights of the non-hierarchical clustering risk parity portfolio, where $k$ is the number of clusters and $n_j$ is the number of assets in cluster $j$: asset $i$ in cluster $j$ receives the risk budget $b_i = 1/(k\,n_j)$. In this portfolio, the risk contribution from each cluster is equalized at $\sigma(w)/k$, and the risk contributions from the assets within cluster $j$ are equalized at $\sigma(w)/(k\,n_j)$. We call this method the non-hierarchical clustering risk parity strategy.
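The cluster budgeting rule can be sketched as follows; this illustrates the stated allocation (budget $1/(k\,n_j)$ per asset) via a squared-deviation objective rather than the paper's exact program, with toy labels and covariances:

```python
import numpy as np
from scipy.optimize import minimize

def cluster_risk_parity_weights(Sigma, labels):
    """Equalize risk across clusters and across assets within each cluster:
    asset i in cluster j gets risk budget 1/(k * n_j). Illustrative sketch."""
    labels = np.asarray(labels)
    n = Sigma.shape[0]
    k = len(np.unique(labels))
    budgets = np.array([1.0 / (k * np.sum(labels == labels[i]))
                        for i in range(n)])

    def objective(w):
        sigma = np.sqrt(w @ Sigma @ w)
        rc = w * (Sigma @ w) / sigma
        return np.sum((rc - budgets * sigma) ** 2)  # deviation from budgets

    w0 = np.full(n, 1.0 / n)
    res = minimize(objective, w0, method="SLSQP",
                   bounds=[(1e-6, 1.0)] * n,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
                   options={"ftol": 1e-12, "maxiter": 500})
    return res.x, budgets

Sigma = np.diag([0.04, 0.04, 0.01])   # two "equity-like" assets, one "bond-like"
labels = [0, 0, 1]                    # clusters: {asset0, asset1} and {asset2}
w, budgets = cluster_risk_parity_weights(Sigma, labels)
print(w, budgets)                     # budgets are [0.25, 0.25, 0.5]
```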
4. x-means++ Clustering

The k-means is a standard non-hierarchical clustering algorithm that is easy to implement and computationally efficient.
A cluster refers to a collection of data points aggregated together according to a certain distance, and a centroid is the center point of each cluster.
We first define a target number of centroids, k.
The k-means algorithm divides the data into k clusters $C_1, \dots, C_k$ so as to minimize the following evaluation function, in which $d$ is the distance function (typically the squared Euclidean distance) and $\mu_i$ is the centroid of cluster $C_i$:

$$\sum_{i=1}^{k} \sum_{x \in C_i} d(x, \mu_i).$$
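The evaluation function can be computed directly; a minimal sketch using squared Euclidean distance as $d$, with toy data:

```python
import numpy as np

def kmeans_objective(X, labels, centroids):
    """Sum over clusters of squared Euclidean distance from each point
    to its centroid -- the quantity k-means tries to minimize."""
    return sum(np.sum((X[labels == i] - c) ** 2)
               for i, c in enumerate(centroids))

X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])
print(kmeans_objective(X, labels, centroids))  # → 4.0
```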
However, the k-means algorithm has two shortcomings. First, the result may depend on the initial clusters, so the algorithm does not guarantee the optimal clustering. Second, the algorithm requires the number of clusters k to be set in advance.
The initialization method k-means++ was proposed to address the first shortcoming, and x-means to address the second. In this paper, we combine the k-means++ initialization with x-means. The next sections describe the k-means++ and x-means algorithms, which are the components of our proposal. Table 1 compares the algorithms.
The feature of k-means++ is its initialization of the centroids. The k-means++ algorithm decides the k initial centroids as follows:
1) Choose one data point uniformly at random from the data as the initial centroid $c_1$.
2) For each data point $x$, compute $D(x)$, the distance between $x$ and the nearest centroid that has already been chosen.
3) Choose one new data point at random as a new centroid, selecting $x$ with probability

$$\frac{D(x)^2}{\sum_{x'} D(x')^2}.$$

Here, a data point already selected as a centroid has probability 0, because its distance to the nearest centroid is 0.
4) Repeat steps 2 and 3 until k centroids have been chosen.
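The steps above can be sketched as follows (the blob data and variable names are illustrative):

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    """k-means++ seeding: first centroid uniform at random, then each new
    centroid drawn with probability proportional to D(x)^2, the squared
    distance to the nearest centroid chosen so far."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    centroids = [X[rng.integers(n)]]            # step 1: uniform choice
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        probs = d2 / d2.sum()                   # step 3: D(x)^2 weighting
        centroids.append(X[rng.choice(n, p=probs)])  # chosen points have prob 0
    return np.array(centroids)

# Two far-apart blobs: the D^2 weighting forces one centroid per blob.
X = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
C = kmeans_pp_init(X, 2, rng=0)
print(C)
```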
Table 1. Comparison of clustering algorithms.
The x-means algorithm can determine the optimal number of clusters, unlike the k-means algorithm, in which the number of clusters has to be given in advance.
The process of x-means clustering is to apply k-means repeatedly, starting from $k = 2$, until the Bayesian information criterion (BIC) no longer improves.
This study applies the following algorithm:
1) We prepare p-dimensional data $x$ whose sample size is n.
2) We apply k-means ($k = k_0$) to all the data. We name the divided clusters $C_1, \dots, C_{k_0}$.
3) We repeat the following procedure, from Step 4 to Step 9, for each cluster $C_i$ ($i = 1, \dots, k_0$).
4) For a cluster $C_i$, we apply k-means ($k = 2$). We name the divided clusters $C_i^1$ and $C_i^2$.
5) We assume the following p-dimensional normal distribution for the data $x \in C_i$:

$$f(x; \theta_i) = \frac{1}{(2\pi)^{p/2} |\Sigma_i|^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu_i)^\top \Sigma_i^{-1} (x - \mu_i)\right).$$

Then, we calculate the BIC as

$$\mathrm{BIC} = -2 \ln L(\hat{\theta}_i) + q \ln n_i,$$

where $\hat{\theta}_i$ is the maximum likelihood estimate of the p-dimensional normal distribution; $\mu_i$ is the p-dimensional mean vector and $\Sigma_i$ is the $p \times p$ covariance matrix; q is the number of parameters, which becomes $q = p(p+3)/2$; $n_i$ is the number of elements in $C_i$; and $L$ is the likelihood function, $L(\theta_i) = \prod_{x \in C_i} f(x; \theta_i)$.
6) We assume p-dimensional normal distributions with parameters $\theta_1'$ and $\theta_2'$ for $C_i^1$ and $C_i^2$, respectively. The probability density function of this 2-division model becomes

$$g(x) = \alpha_0 \, f(x; \theta_j'), \qquad x \in C_i^j \ (j = 1, 2).$$

Each $x$ will be included in either $C_i^1$ or $C_i^2$; $\alpha_0$ is a constant which lets equation (12) be a probability density function ($\int g(x)\,dx = 1$). We approximate $\alpha_0$ as follows:

$$\alpha_0 \approx \frac{0.5}{\Phi(\beta)},$$

where $\beta$ is a normalized distance between the two clusters, given by

$$\beta = \frac{\|\mu_1' - \mu_2'\|}{\sqrt{\|\Sigma_1'\| + \|\Sigma_2'\|}},$$

and $\Phi$ stands for the lower probability of the standard normal distribution. The BIC for this model is

$$\mathrm{BIC}' = -2 \ln L'(\hat{\theta}_1', \hat{\theta}_2') + q' \ln n_i,$$

where $\hat{\theta}_1', \hat{\theta}_2'$ are the maximum likelihood estimates of the p-dimensional normal distributions; since there are mean and covariance parameters for each of the two clusters, the number of parameters becomes $q' = 2q = p(p+3)$; and $L'$ is the likelihood function, $L'(\theta_1', \theta_2') = \prod_{j=1,2} \prod_{x \in C_i^j} \alpha_0 f(x; \theta_j')$.
7) If $\mathrm{BIC} > \mathrm{BIC}'$, we prefer the 2-division model and decide to continue the division; we set $C_i \leftarrow C_i^1$. As for $C_i^2$, we push its p-dimensional data, cluster center, log likelihood, and BIC onto the stack. We return to Step 4.
8) If $\mathrm{BIC} \le \mathrm{BIC}'$, we prefer not to divide the cluster any further and decide to stop. We extract the stacked data stored in Step 7, set it as the new $C_i$, and return to Step 4. If the stack is empty, we go to Step 9.
9) The 2-division procedure for $C_i$ is completed. We renumber the cluster identifications such that they become unique within $C_i$.
Once the 2-division procedure is completed for all the initial divided clusters, we renumber all cluster identifications such that they become unique.
We output the cluster identifications, the center of each cluster, the log likelihood of each cluster, and the number of elements in each cluster.
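The single-cluster BIC of Step 5 can be sketched as below. The $\alpha_0$-corrected two-division likelihood of Step 6 is omitted; the simple per-cluster comparison implied here is only an approximation of the full procedure:

```python
import numpy as np

def gaussian_bic(X):
    """BIC of a single p-dimensional Gaussian fitted to cluster data X:
    BIC = -2 ln L + q ln n, with q = p(p+3)/2 free parameters
    (p means plus p(p+1)/2 covariance entries)."""
    n, p = X.shape
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False, bias=True) + 1e-9 * np.eye(p)  # ML estimate
    diff = X - mu
    inv = np.linalg.inv(cov)
    # Log-likelihood of the ML-fitted normal distribution.
    ll = -0.5 * (n * p * np.log(2 * np.pi)
                 + n * np.log(np.linalg.det(cov))
                 + np.einsum("ij,jk,ik->", diff, inv, diff))
    q = p * (p + 3) / 2
    return -2.0 * ll + q * np.log(n)

rng = np.random.default_rng(0)
one_blob = rng.normal(0, 1, size=(200, 2))
print(gaussian_bic(one_blob))
```

For well-separated data, the summed BIC of the two sub-clusters falls below the parent cluster's BIC, which is the signal that drives the splitting decision.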
5. Empirical Results
This section describes the empirical study with real market data.
5.1. Data

We perform empirical analysis using equity and bond futures price data. The indices we use in this study are summarized in Table 2. We use 15 equity futures and 12 bond futures from May 2005 to May 2020. The summary statistics of each index are reported in Table 3.
5.2. Parameters Settings
We compare risk parity (RP), hierarchical risk parity (HRP), clustering risk parity (CRP) using k-means++ with a fixed number of clusters, and non-hierarchical risk parity using x-means++ (XRP). We set k of k-means++ from 2 to 8. Our simulation process is given below.
Table 2. Investment assets.
aWords in parentheses denote tickers.
Table 3. Summary of statistics of investment assets.
First, we estimate the covariance matrix and perform the clustering methods using 250 days of asset return data. Then, we construct each portfolio every 20 business days. Our simulation period is from April 2001 to May 2020.
5.3. Performance Measures
For evaluating an investment strategy, we use the following measures, which are widely used in finance. Returns are annualized, risk is calculated as the standard deviation of returns, and R/R stands for the return/risk ratio. In this paper, each portfolio has a different risk level, since we utilize a wide range of assets with various risk levels. We therefore regard R/R, the efficiency of portfolio performance, as a more appropriate evaluation measure for this study than return alone.
Here, $r_t$ denotes the portfolio return at time t, $\bar{r}$ denotes the average of $r_t$, and $W_k$ denotes the wealth of the portfolio at time k.
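Under the standard definitions (the 250-business-day annualization convention is an assumption), these measures can be computed as:

```python
import numpy as np

def performance_stats(returns, periods_per_year=250):
    """Annualized return, risk (std of per-period returns), R/R, and maximum
    drawdown from a series of portfolio returns. Sketch of the usual formulas."""
    returns = np.asarray(returns)
    ann_return = returns.mean() * periods_per_year
    ann_risk = returns.std(ddof=1) * np.sqrt(periods_per_year)
    wealth = np.cumprod(1.0 + returns)        # cumulative wealth W_t
    peak = np.maximum.accumulate(wealth)
    max_dd = np.max(1.0 - wealth / peak)      # largest drop from a running peak
    return {"return": ann_return, "risk": ann_risk,
            "R/R": ann_return / ann_risk, "maxDD": max_dd}

stats = performance_stats([0.01, -0.02, 0.015, 0.005, -0.01])
print(stats)
```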
5.4. Results

Table 4 shows the results of the simulation. The upper rows show the results for the entire period, the middle rows the results for the first half, and the lower rows the results for the second half. We compare risk parity (RP), hierarchical risk parity (HRP), clustering risk parity (CRPx), with x denoting the number of clusters, and non-hierarchical risk parity using our proposed x-means++ (XRP).
Table 4. Performance statistics of portfolios.
In terms of R/R for the entire period, XRP is the most efficient among RP, HRP, and all CRPs. In addition, the return of XRP is higher than that of all methods except CRP7, and the maxDD of XRP is smaller than that of all methods except HRP. Our results show that XRP has the best overall performance. XRP also gives the best R/R and the second-best maxDD in both the first half and the second half.
6. Conclusion

Our study makes the following contributions:
· We propose a non-hierarchical clustering risk parity strategy in which the risk contributions are equal both across clusters and within each cluster.
· We also propose the x-means++ algorithm, which combines the k-means++ algorithm with the x-means algorithm to ensure the robustness of clustering.
· Empirical analysis shows that the portfolio constructed by our proposed approach, which equalizes the risk contribution from each risk source, outperforms risk parity strategies and hierarchical clustering risk parity strategies.
Our future tasks are to perform empirical analysis using larger datasets, such as individual stocks, to verify the robustness of our proposed strategy, and to apply our method to a complex-valued risk diversification strategy.
Notation

$R$: Return (random variable) vector
$\mu$: Vector of expected returns
$\Sigma$: Covariance matrix
$w$: Weight vector of portfolio
$\sigma(w)$: Portfolio risk
MRC: Marginal Risk Contribution
RC: Risk Contribution
$\mu_i$: Center point (centroid) of cluster $i$
$k$: Number of clusters
$n_i$: Number of elements in cluster $C_i$
$d$: Distance function
$f(x; \theta)$: p-dimensional normal distribution for the data $x$ and parameter $\theta$
References

Nakagawa, K., Imamura, M. and Yoshida, K. (2018) Risk-Based Portfolios with Large Dynamic Covariance Matrices. International Journal of Financial Studies, 6, 52.
Pelleg, D. and Moore, A.W. (2000) X-Means: Extending K-Means with Efficient Estimation of the Number of Clusters. Proceedings of the 17th International Conference on Machine Learning, June 2000, San Francisco, 727-734.
Nirwan, R.S. and Bertschinger, N. (2019) Applications of Gaussian Process Latent Variable Models in Finance. In: Bi, Y., Bhatia, R. and Kapoor, S., Eds., Intelligent Systems and Applications, IntelliSys 2019, Advances in Intelligent Systems and Computing, Springer, Cham, 1209-1221.
Nakagawa, K., Noma, S. and Abe, M. (2020) RM-CVaR: Regularized Multiple β-CVaR Portfolio. Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI 2020), Kyoto, January 2020, 4562-4568.
Maillard, S., Roncalli, T. and Teiletche, J. (2010) The Properties of Equally Weighted Risk Contribution Portfolios. The Journal of Portfolio Management, 36, 60-70.