Since the launch of the Belt and Road cooperation in 2013, China has been committed to overseas railway investment and construction, which can not only drive the economic development of China and neighboring countries, but also demonstrate China's economic strength and the development of its high-speed rail technology.
The primary task in the construction of an overseas high-speed railway is route selection. Route selection is the overall design of a railway line, and it directly affects the transportation capacity, the transportation quality and the economic return on investment of the railway. Because the workload of constructing a new railway (especially an overseas railway) is very large, and the technology involved is complex and wide-ranging, in-depth investigation, research, survey and design work must be carried out before a railway is planned, and an optimal solution must finally be selected from several comparable solutions. The Belt and Road Initiative involves many countries, each with different conditions. It is therefore necessary to use a unified standard to conduct risk assessments for all countries. Many methods for risk assessment exist, and choosing an appropriate method plays a key role in the assessment.
In this paper, the principal component analysis method is used for evaluation. Its advantages are as follows:
1) Principal component analysis can eliminate the correlation between evaluation indicators. It transforms the original index variables into principal components that are uncorrelated with each other, and the stronger the correlation between the indicators, the better principal component analysis performs.
2) Principal component analysis can reduce the workload of indicator selection. With other evaluation methods it is difficult to eliminate the correlation between the evaluation indicators, so selecting the indicators takes a lot of effort. Principal component analysis eliminates these correlations, so selecting the indicators is relatively easy.
3) When there are many evaluation indicators, a few comprehensive indicators can be used instead of the original indicators while retaining most of the information. In principal component analysis, the principal components are arranged in order of decreasing variance. When analyzing a problem, some of the principal components can be discarded, and only the leading principal components with the largest variances are used to represent the original variables, thus reducing the computational workload.
4) In the comprehensive evaluation function, the weight of each principal component is its contribution rate, which reflects the proportion of the total information in the original data that the principal component carries. The weights are therefore determined objectively and reasonably, which overcomes the defect of artificially assigned weights in some evaluation methods.
5) The calculation is relatively standardized, so it can be easily implemented on a computer and can be done with specialized software.
2. Macro-Level Risk Indicators
From a macro perspective, the construction of overseas railways is closely related to the political, economic, and social development factors of each country. The risk assessment indicators for overseas railway line selection are therefore chosen according to the principles of data availability and authority. Using data from the World Bank and the National Bureau of Statistics of China, three general categories of indicators are selected, namely political, economic, and social development, each with specific factors that reflect the situation of each country.
In consultation with Dr. Tong Xinhao, Dr. Zeng Hailin and other experts (both professors in the railway industry from Southwest Jiaotong University), and based on the actual situation of the countries along the railway, the macro-level risk assessment indicators are selected from authoritative databases relevant to railway line selection and from Chinese data on foreign project contracting and on imports and exports. The selected indicators are shown in Table 1.
3. Macro-Level Risk Assessment and Route Selection
3.1. Macro-Level Risk Assessment Principle  
Principal component analysis is a multivariate statistical technique that transforms a set of possibly correlated variables into a set of linearly uncorrelated variables by an orthogonal transformation. The transformed variables are called the principal components. The basic idea is to reduce the dimensionality of the original data, replacing a large number of original variables with a few mutually uncorrelated comprehensive variables that carry most of the information in the original variables. The first comprehensive variable selected is denoted F1. Among all candidate linear combinations, F1 has the largest variance Var(F1), which means that F1 contains the largest amount of information, and F1 is called the first principal component. If the first principal component is insufficient to represent the information carried by the original p variables, a second principal component F2 is selected. F2 must be linearly uncorrelated with F1, which in mathematical terms requires Cov(F1, F2) = 0. By analogy, the third, fourth, and further principal components can be constructed, up to the p-th principal component.
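The construction described above can be written compactly as follows. The symbols $a_i$ (coefficient vector of the i-th principal component), $\Sigma$ (covariance or correlation matrix of the original variables) and $\lambda_i$ (its eigenvalues) are notation introduced here for illustration and are not taken from the original text.

```latex
% X = (X_1, ..., X_p)^T are the original variables, \Sigma their covariance (or correlation) matrix.
F_i = a_i^{\top} X = a_{i1} X_1 + a_{i2} X_2 + \dots + a_{ip} X_p,
\qquad a_i^{\top} a_i = 1, \qquad i = 1, \dots, p

% F_1 is the unit-norm combination with the largest variance:
a_1 = \arg\max_{a^{\top} a = 1} \operatorname{Var}(a^{\top} X)
    = \arg\max_{a^{\top} a = 1} a^{\top} \Sigma a

% Each later component maximizes variance subject to being uncorrelated with the earlier ones:
\operatorname{Cov}(F_i, F_j) = a_i^{\top} \Sigma a_j = 0 \quad (i \neq j)

% The solution is given by the eigenvectors of \Sigma, ordered by eigenvalue:
\Sigma a_i = \lambda_i a_i, \qquad
\operatorname{Var}(F_i) = \lambda_i, \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0
```

Under this formulation, the requirement Cov(F1, F2) = 0 follows from the orthogonality of eigenvectors belonging to a symmetric matrix.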
This paper uses software to perform the principal component analysis. The main steps are as follows:
1) Normalizing raw data.
In order to make the indicators comparable, the different dimensions of the indicators are first eliminated by standardization. For the raw data X with elements $x_{ij}$ (object i, indicator j), the standardized data ZX are obtained by the following transformation:
$$ZX_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}$$
where $\bar{x}_j$ and $s_j$ are the mean and the standard deviation of the j-th indicator.
2) Calculating the correlation coefficient matrix of the normalized data matrix.
3) Obtaining the eigenvalues and the corresponding eigenvectors of the correlation coefficient matrix, which can be read directly from the software output.
Table 1. Macro-level risk assessment indicators for overseas railway construction.
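4) Calculating the variance contribution rate and the cumulative contribution rate of each principal component.
The contribution rate of the principal component Fi, where $\lambda_i$ is the i-th largest eigenvalue of the correlation coefficient matrix:
$$w_i = \frac{\lambda_i}{\sum_{k=1}^{p} \lambda_k}$$
Cumulative contribution rate of the first m principal components:
$$\sum_{i=1}^{m} w_i = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{k=1}^{p} \lambda_k}$$
In practical applications of principal component analysis, the first, second, …, m-th principal components, corresponding to the m largest eigenvalues, are retained, where m is chosen so that the cumulative contribution rate is higher than 60%.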
5) Calculating the principal component coefficients and principal component scores.
Let the load matrix be A. The coefficients of the principal component Fi are obtained by dividing the i-th column of the load matrix by the square root of the variance of the corresponding principal component. Denoting these coefficients by the vector $C_i = (c_{1i}, c_{2i}, \dots, c_{pi})^{\top}$, the score of the principal component Fi is:
$$F_i = c_{1i} ZX_1 + c_{2i} ZX_2 + \dots + c_{pi} ZX_p$$
where $ZX_j$ represents the j-th column of the normalized data matrix ZX.
6) Comprehensive score assessment.
The first m principal components, whose cumulative contribution rate exceeds 60%, are selected, and the variance contribution rate $w_i$ of each component (its eigenvalue divided by the sum of all eigenvalues) is taken as its weight to construct a linear combination as the comprehensive evaluation function:
$$R = \sum_{i=1}^{m} w_i F_i$$
From this formula, the comprehensive score R of each evaluation object is computed from its data. The R values are then sorted, which gives a comprehensive evaluation of each object.
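Steps 1) to 6) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the software procedure used in the paper: the function name `pca_composite_scores`, the sample standard deviation (`ddof=1`), and the synthetic demo data are all choices made here for illustration.

```python
import numpy as np

def pca_composite_scores(X, threshold=0.60):
    """Sketch of steps 1)-6) for a raw data matrix X (n objects x p indicators).

    Assumes no indicator is constant (otherwise its standard deviation is zero).
    """
    # 1) Standardize every indicator (column) to zero mean and unit variance.
    ZX = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2) Correlation coefficient matrix of the standardized data.
    R = np.corrcoef(ZX, rowvar=False)
    # 3) Eigenvalues and eigenvectors, sorted by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4) Contribution rate of each component and cumulative contribution rate.
    contrib = eigvals / eigvals.sum()
    cumulative = np.cumsum(contrib)
    # Smallest m whose cumulative contribution rate is higher than the threshold.
    m = min(int(np.searchsorted(cumulative, threshold, side="right")) + 1, len(eigvals))
    # 5) Load matrix (eigenvector times sqrt of eigenvalue), then coefficients:
    #    each load column divided by the square root of its component variance.
    loadings = eigvecs[:, :m] * np.sqrt(eigvals[:m])
    coeffs = loadings / np.sqrt(eigvals[:m])
    F = ZX @ coeffs  # principal component scores, one column per component
    # 6) Composite score: components weighted by their contribution rates.
    scores = F @ contrib[:m]
    return scores, F, contrib[:m], m

if __name__ == "__main__":
    # Synthetic stand-in data (63 objects, 15 indicators); NOT the paper's data.
    rng = np.random.default_rng(0)
    demo = rng.normal(size=(63, 3)) @ rng.normal(size=(3, 15))
    demo += 0.5 * rng.normal(size=(63, 15))
    scores, F, weights, m = pca_composite_scores(demo)
    print("components kept:", m)
    print("ranking (best first):", np.argsort(scores)[::-1][:5])
```

Dividing each load column by the square root of its eigenvalue recovers the unit-length eigenvector, so the component scores in this sketch have variance equal to the corresponding eigenvalue.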
3.2. Instance Application
The overseas railway macro-level risk assessment indicators established in Section 2 comprise 15 indicators in total. The indicator data are taken from the World Bank (https://www.worldbank.org/) and the China National Bureau of Statistics. In order to increase the reliability of the data, this paper selects 63 countries (all the Belt and Road project routes and participating countries) as samples. Because the individual risk indicators are not directly comparable and their dimensions differ greatly, the risk data must be standardized so that they can be judged against a unified standard. The original data are therefore standardized first, and the standardized data are shown in Table 2.
The normalized data are used in the dimension-reduction factor analysis and are first subjected to the KMO and Bartlett tests. If the KMO value is greater than 0.6 and the significance of the Bartlett test is less than 0.01, principal component analysis or factor analysis can be performed. Because the sample is large, factor rotation is applied on the basis of the principal component analysis. Rotating the factors amounts to rotating the factor load matrix (a coordinate rotation), which simplifies the structure of the load matrix: the squared elements of each column or row are polarized toward 0 and 1. After rotation, each original variable is closely related to as few factors as possible, so the practical meaning of the factor solution is easier to explain.
Then, the factor analysis tool is used for dimensionality reduction. Based on the principal component analysis, the maximum variance (varimax) method is used to perform the factor rotation, and the results in Table 3 are obtained.
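The maximum variance rotation mentioned above can be sketched as follows. This is a commonly used SVD-based iteration for the varimax criterion, offered as an illustration and not as the routine of the software used in the paper; the function name `varimax`, its parameters, and the random demo matrix are assumptions made here.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Rotate a load matrix (p indicators x k components) by the varimax criterion.

    Returns the rotated load matrix and the orthogonal rotation matrix T.
    """
    p, k = loadings.shape
    T = np.eye(k)          # start from the unrotated solution
    criterion = 0.0
    for _ in range(max_iter):
        previous = criterion
        rotated = loadings @ T
        # Gradient-like target of the varimax criterion for the current rotation.
        target = rotated ** 3 - (gamma / p) * rotated @ np.diag(np.sum(rotated ** 2, axis=0))
        u, s, vh = np.linalg.svd(loadings.T @ target)
        T = u @ vh         # nearest orthogonal matrix, so T stays a pure rotation
        criterion = s.sum()
        if previous != 0.0 and criterion / previous < 1.0 + tol:
            break
    return loadings @ T, T

if __name__ == "__main__":
    # Random stand-in for a 15 x 5 load matrix; NOT the values of Table 5.
    rng = np.random.default_rng(1)
    demo_loadings = rng.normal(size=(15, 5))
    rotated, T = varimax(demo_loadings)
    # Rotation leaves each indicator's communality (row sum of squares) unchanged.
    print(np.allclose((rotated ** 2).sum(axis=1), (demo_loadings ** 2).sum(axis=1)))
```

Because T is orthogonal, the rotation redistributes variance among the components without changing how much of each indicator the retained components explain in total.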
It can be seen from the test results in Table 3 that the KMO value is greater than 0.6 and the significance is less than 0.01, so the data are suitable for principal component analysis or factor analysis.
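For readers who want to reproduce this kind of suitability check, the sketch below computes the overall KMO measure and the Bartlett sphericity statistic from their standard textbook definitions. It is an illustration under assumptions: the function name `kmo_and_bartlett` and the synthetic data are made up here, and the paper's own values come from its statistical software, not from this code.

```python
import numpy as np

def kmo_and_bartlett(X):
    """Overall KMO measure and Bartlett sphericity statistic for X (n x p)."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    # Partial correlations follow from the inverse of the correlation matrix.
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)
    off = ~np.eye(p, dtype=bool)              # mask selecting off-diagonal entries
    r2 = np.sum(R[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    kmo = r2 / (r2 + q2)
    # Bartlett statistic: approximately chi-squared with p(p-1)/2 degrees of freedom
    # under the null hypothesis that the correlation matrix is the identity.
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return kmo, chi2, dof

if __name__ == "__main__":
    # Synthetic one-factor data; NOT the paper's data.
    rng = np.random.default_rng(2)
    common = rng.normal(size=(200, 1))
    demo = common + 0.5 * rng.normal(size=(200, 6))
    kmo, chi2, dof = kmo_and_bartlett(demo)
    print(kmo, chi2, dof)
    # The significance is the upper tail of the chi-squared distribution,
    # e.g. scipy.stats.chi2.sf(chi2, dof) if SciPy is available.
```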
As can be seen from Table 4, the first five principal components contain nearly 66% of the information, so these five principal components can be considered to carry most of the information in the original indicators.
The load matrix after the rotation of these five principal components is shown in Table 5. An indicator whose coefficient on a component is large, generally greater than 0.5 to 0.6, is attributed to that component.
The above data were processed to obtain Table 6, in which gray shading marks the entries with coefficients greater than 0.58.
Each principal component is named according to the data marked in Table 6.
The fifth item (power coverage rate) and the 12th item (urbanization rate) have large coefficients in F1. These two indicators are related to the level of urban development, so F1 is called “the main component of urban modernization level”;
The 14th item (political stability) and the 15th item (government efficiency) have relatively large coefficients in F2. Both of these indicators are related to the political situation, so F2 is called “the main component of the political environment”;
The third item (the turnover of China’s foreign contracted projects) and the 13th item (the establishment of cooperative relations with China) have large coefficients in F3. These two indicators are related to bilateral cooperation, so F3 is called “the main component of the bilateral partnership with China”;
The coefficient of item 10 (population growth rate) in F4 is relatively large, so F4 is called “the main component of population development trend”;
Table 2. Standardized data.
The coefficient of item 7 (traffic accident rate) in F5 is relatively large, so F5 is called “the main component of traffic safety index”.
In summary, the five main components of national line selection risk are shown in Table 7.
Next, each column of the load matrix is divided by the square root of the variance of the corresponding principal component to obtain the coefficients of each principal component, and the resulting matrix is denoted A. The variances of the principal components are then normalized to give the weights of the principal components, which form matrix B.
Table 3. KMO and Bartlett test.
Table 4. Total variance interpretation.
Table 5. Component matrix after the rotation.
Table 6. Coefficients greater than 0.6 in the load matrix.
Table 7. Five main components of national line selection risk.
Let the standardized data be matrix X. The composite score of the sample countries is then the product of the standardized data, the coefficient matrix and the weight matrix: $R = XAB$.
The resulting scores are on an irregular scale. For example, the score of South Korea is 1.6243, and some scores are negative, which only indicates that they are lower than the average. Since such a score does not allow a country’s risk to be judged intuitively, it is converted into the familiar percentage score.
The score is first mapped into the interval (0, 1) by setting:
The score can then be converted to a percentage score, as shown in Table 8:
Table 8. Risk percentage scores and rankings of each country.
3.3. Line Selection Scheme
For the planned Asia-Europe railway passage, experts have proposed several planning options, of which the two most important schemes in the Middle East (Figure 1, from China Map Network) are:
1) Channel plan one: China-Kyrgyzstan-Tajikistan-Afghanistan-Iran, finally arriving in Germany;
2) Channel plan two: China-Kazakhstan-Kyrgyzstan-Uzbekistan-Turkmenistan-Iran, finally arriving in Germany.
According to the risk score rankings in Table 8, Tajikistan and Afghanistan in plan 1 rank at the bottom. In plan 2, Uzbekistan and Kyrgyzstan are also ranked relatively low, but the advantage over plan 1 is clear: the total score of the countries in plan 2 is much higher than that of plan 1. Plan 2 is therefore chosen as the Middle East connectivity scheme.
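The comparison of the two channel plans amounts to aggregating the per-country percentage scores along each route. The sketch below shows that arithmetic. The score values are placeholders invented for illustration and are NOT the values of Table 8; the function name `rank_plans` and the choice to report both the total and the mean (plan 2 crosses more countries, so the total alone favors it) are also assumptions made here.

```python
def rank_plans(plans, scores):
    """Return (plan name, total score, mean score) tuples, best total first."""
    results = []
    for name, countries in plans.items():
        values = [scores[country] for country in countries]
        results.append((name, sum(values), sum(values) / len(values)))
    return sorted(results, key=lambda item: item[1], reverse=True)

# Transit countries of the two channel plans as listed in the text.
plans = {
    "Plan 1": ["Kyrgyzstan", "Tajikistan", "Afghanistan", "Iran"],
    "Plan 2": ["Kazakhstan", "Kyrgyzstan", "Uzbekistan", "Turkmenistan", "Iran"],
}

# HYPOTHETICAL placeholder scores (higher = lower risk); replace with Table 8 values.
placeholder_scores = {
    "Kazakhstan": 60.0, "Kyrgyzstan": 45.0, "Uzbekistan": 50.0,
    "Turkmenistan": 55.0, "Tajikistan": 35.0, "Afghanistan": 20.0, "Iran": 50.0,
}

if __name__ == "__main__":
    for name, total, mean in rank_plans(plans, placeholder_scores):
        print(f"{name}: total={total:.1f}, mean={mean:.1f}")
```

Reporting the mean alongside the total makes it visible whether a plan scores higher because its countries are lower risk or only because it passes through more of them.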
Figure 1. Eurasian railway passage schematic.
This paper studies the macro-level risk assessment and line selection of overseas railways based on Principal Component Analysis (PCA). Through principal component analysis, the original 15 risk indicators are reduced in dimension, and five principal components that represent the main risk factors are obtained. The scores of each country on the five principal components are calculated and weighted to give a composite risk score, which is then converted into a percentage score. By ranking these scores, the risk level of each candidate line can be determined quickly and clearly, and the corresponding line selection result is obtained.
The principal component analysis method determines the weights from the information contained in the sample data and the indicators themselves, which avoids the arbitrariness of expert scoring and subjective judgment, and the risk ranking of each country can be read directly from the results. These features indicate that it is a practical method for railway risk assessment and route selection.
This paper lacks a control experiment. Subsequent studies will apply other risk assessment methods and compare their results with those of principal component analysis to further verify the accuracy of the method.
This study is supported in part by China Railway Eryuan Engineering Group Co., Ltd Scientific Research Project (NO. KYY2017069 (17-17)); Sichuan Provincial Science and Technology Support Project (NO. 2019JDRC0133); 2017-2019 Young Elite Scientist Sponsorship Program by CAST (YESS); 2018 Sichuan Provincial Ten Thousand Program Project.
The data used to support the findings of this study are available from the corresponding author upon request.
Yang, C.W., Li, Z.H., Guo, X.Y., Yu, W.Y., Jin, J. and Zhu, L. (2019) Application of BP Neural Network Model in Risk Evaluation of Railway Construction. Complexity, 2019, Article ID: 2946158.
Jin, J., Zhu, L., Li, Z.H., Tong, X.H. and Yang, C.W. (2019) Application of Variable Structure of BPNN in Risk Evaluation of Overseas Railway Construction in Target Countries. Journal of the China Railway Society, 40, 7-12.