In recent years, house prices in all provinces, autonomous regions and municipalities across China have shown a clear upward trend, but the pattern of increase varies considerably across regions. For example, the average sales price of residential commercial housing in Shandong Province increased from 2904.14 Yuan/m2 (2007) to 5855 Yuan/m2 (2016), with an overall growth rate of 6.45%; by contrast, the average sales price of residential commercial housing in the Ningxia Hui Autonomous Region increased from 2722.58 Yuan/m2 (2007) to 5485 Yuan/m2 (2016), with an overall growth rate of 14.78%. This paper uses time series analysis, geographic information, cluster analysis, causality testing and other techniques to study the relationship of house prices among all provinces, autonomous regions and municipalities (except Hainan Province) across mainland China.
Previous research in China has demonstrated the practicability of using time series analysis to study house prices. Xiuli Wu and Feng Zhang (2007), taking house prices in Guangzhou as their subject, divided Guangzhou into extremely bustling, averagely bustling and non-bustling regions and built a separate time series model for house price prediction in each. Their research showed that time series analysis is a reasonable way to analyze the trend of house prices. Li He (2014) used time series analysis to predict the commercial residential housing price index in Beijing for 2014. Li Li and Jinwen Wu (2016) applied a time series model to the relationship between land prices and house prices. Xuyi Xie (2006) used time series analysis to examine the view that house prices have a more apparent impact on land prices in the long term. Scholars outside China have also used this method to analyze house prices in other economies. Willcocks (2009) used time series analysis to study house prices in the UK.
Spatial analysis has also gained wide recognition in house price analysis. Introducing spatial variables into economic models makes it possible to account for space-related factors that are otherwise easily ignored. Modern developments in GIS technology have made it much more convenient to study economics across regions. Zhixiong Mei and Xia Li (2007), taking house prices in Dongguan as their subject, plotted the prices of residential houses on a grid and superimposed buffer zones of major roads onto the house price distribution map, in order to discover and explain the spatial variation in house prices in Dongguan. Can (1998) used GIS technology to analyze the spatial relationship between the housing market and the mortgage market.
This paper conducts time series analysis and spatial analysis of house prices in mainland China in the past 10 years.
2. Data Sources
There are two main sources of our data. The first is the house prices of each province from 2007 to 2016, obtained from the National Bureau of Statistics website. According to the System of Statistics Reports on Real Estate Development (2018), the data are collected from all legal entities engaged in real estate development and management. All surveyed entities report their data monthly through the direct network reporting system (Table 1).
The second data source is, for each province, a list of its adjacent provinces (Table 2), compiled from the map of China. Since Table 1 does not contain data for Hong Kong, Macau, and Chinese Taipei, we do not include these three regions in the map analysis that follows.
3. Related Technology
3.1. Geographic Information Technology
Geographic Information System (GIS) is a special-purpose digital database in which a common spatial coordinate system is the primary means of reference. Comprehensive GIS require a means of: data input, from maps, aerial photos, satellites, surveys, and other sources; data storage, retrieval, and query; data transformation, analysis, and modeling, including spatial statistics; data reporting, such as maps, reports, and plans.
Table 1. Average sales price of commercial housing in the province from 2007 to 2016 (yuan/square meter).
Table 2. List of adjacent provinces of mainland China.
This paper uses the map of China as geographical reference to analyze data in house prices of provinces in mainland China.
3.2. Time Series Analysis and Granger Causality Test
A time series is a sequence of values of a variable observed at equal time intervals. Time series analysis serves two purposes: understanding the underlying drivers and structure of the observed data, and fitting suitable models for prediction and monitoring. This article uses the Granger causality test to analyze the changes in housing prices in mainland China over the past decade and the potential factors affecting them. Granger causality is a statistical concept based on predictive causality. According to the Granger causality test, if X1 Granger-causes X2, the past values of X1 should contain information that helps predict future values of X2. The mathematical formulation is based on a linear regression model of the stochastic process.
Analysis tool: This article uses the lmtest package for Granger causality testing. The code is implemented in R version 3.5.
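The regression behind the test can be written out as follows. This is a standard textbook formulation given only as a sketch; the lag order p, the number of usable observations T, and the coefficient symbols are notation introduced here rather than taken from the paper, and the exact statistic reported by a given software package may differ.

```latex
% Unrestricted model: X_2 explained by its own lags and by lags of X_1
X_2(t) = \alpha_0 + \sum_{j=1}^{p} \alpha_j X_2(t-j) + \sum_{j=1}^{p} \beta_j X_1(t-j) + \varepsilon_t

% Restricted model: X_2 explained by its own lags only
X_2(t) = \alpha_0 + \sum_{j=1}^{p} \alpha_j X_2(t-j) + \eta_t

% Null hypothesis (X_1 does not Granger-cause X_2) and the F statistic,
% where RSS_r and RSS_u are the residual sums of squares of the two models
H_0:\ \beta_1 = \beta_2 = \dots = \beta_p = 0,
\qquad
F = \frac{(\mathrm{RSS}_r - \mathrm{RSS}_u)/p}{\mathrm{RSS}_u/(T - 2p - 1)}
```

A small P value for F leads to rejecting the null hypothesis, that is, to concluding that past values of X1 help predict X2.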
3.3. Cluster Analysis
Cluster analysis refers to the process of grouping a collection of physical or abstract objects into multiple classes of similar objects. The goal of cluster analysis is to classify data on the basis of similarity.
Chen Jian (2007) briefly introduced the concept and principle of clustering analysis algorithms. Cluster analysis, also known as group analysis or point group analysis, is a multivariate statistical method for studying classification; its main forms are hierarchical clustering and iterative clustering.
Two clustering methods, K-MEANS clustering and hierarchical clustering, are used in this paper.
The K-MEANS clustering method is a kind of iterative clustering. The algorithm takes as input the number of clusters k and a database containing n data objects, and outputs k clusters that minimize the within-cluster variance. In other words, it divides the n data objects into k clusters such that the similarity between objects in the same cluster is high and the similarity between objects in different clusters is low.
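The iteration described above can be sketched in a few lines of plain Python for one-dimensional data. This is only an illustration of the assign-and-update loop, not the sklearn implementation used in the paper; the function name `kmeans_1d`, the evenly spaced initial centroids, and the sample values are all assumptions made for the example.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalars; returns (centroids, labels)."""
    ordered = sorted(values)
    # Assumption: deterministic start from k evenly spaced sorted values.
    if k == 1:
        centroids = [ordered[0]]
    else:
        centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: (v - centroids[c]) ** 2) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Hypothetical growth-rate ratios, not values from Table 3.
sample = [0.7, 0.8, 0.95, 1.0, 1.2, 2.1, 2.2]
centroids, labels = kmeans_1d(sample, k=3)
```

With real data, the result of k-means generally depends on the initial centroids, which is why library implementations usually run several random restarts.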
Hierarchical clustering is a general term for a class of algorithms that either continuously merge clusters from the bottom up or continuously split clusters from the top down, forming nested clusters. The resulting hierarchy is represented by a "tree". Agglomerative clustering is the bottom-up variant, and its principle is simple: at the beginning, each data point forms its own cluster; then the two clusters closest to each other are merged into one, and this step is repeated until the preset number of clusters is reached.
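The bottom-up merging procedure can be illustrated with a short pure-Python sketch. The paper does not state which linkage criterion it uses, so single linkage (distance between the closest pair of members) is assumed here; the function name `agglomerative` and the sample points are likewise hypothetical.

```python
def agglomerative(points, n_clusters):
    """Bottom-up clustering; returns clusters as sorted lists of point indices."""
    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        # Euclidean distance between points a and b (given by index).
        return sum((x - y) ** 2 for x, y in zip(points[a], points[b])) ** 0.5

    def linkage(c1, c2):
        # Assumption: single linkage, i.e. distance of the closest pair.
        return min(dist(a, b) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        # Find the pair of clusters that are closest to each other.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        # Merge cluster j into cluster i.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical 2-D points: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
result = agglomerative(points, n_clusters=3)
```

Recording the order and distance of each merge, rather than stopping at a fixed number of clusters, is what produces the tree (dendrogram) mentioned above.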
This paper uses KMeans in sklearn.cluster for K-MEANS clustering and hierarchy in scipy.cluster for hierarchical clustering. The code is implemented in Python 3.6. Sklearn is a commonly used third-party Python module for machine learning, and Scipy is a commonly used third-party Python module for data analysis; both can be installed via pip.
4. Monographic Analysis
1) Kmeans clustering of house price growth rate and ratio of surrounding provinces
On the basis of Table 1: a) calculate the annual growth rate of house prices in each province from 2008 to 2016; b) using the neighbor information in Table 2, calculate the average of the nine-year house price growth rates of each province's neighboring provinces; c) calculate the ratio of the growth rate of house prices in each province to the average of its surrounding provinces. The result is Table 3.
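Steps a) to c) can be sketched as follows. The province names, prices and adjacency list are made up for illustration and are not taken from Tables 1 and 2. The sketch also assumes that growth rates are first averaged over the years and the ratio is taken afterwards; the paper does not spell out the order of these two operations.

```python
# Hypothetical prices (yuan per square meter) for three made-up provinces.
prices = {
    "A": [1000.0, 1100.0, 1210.0],
    "B": [2000.0, 2100.0, 2205.0],
    "C": [1500.0, 1800.0, 2160.0],
}
# Hypothetical adjacency list in the spirit of Table 2.
neighbors = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}


def growth_rates(series):
    """Step a): year-over-year growth rate (p_t - p_{t-1}) / p_{t-1}."""
    return [(cur - prev) / prev for prev, cur in zip(series, series[1:])]


def mean(values):
    return sum(values) / len(values)


def neighbor_ratio(prices, neighbors):
    """Steps b) and c): own mean growth divided by the neighbors' mean growth."""
    avg_growth = {p: mean(growth_rates(s)) for p, s in prices.items()}
    return {
        p: avg_growth[p] / mean([avg_growth[n] for n in nbrs])
        for p, nbrs in neighbors.items()
    }


ratios = neighbor_ratio(prices, neighbors)
```

A ratio below 1 means the province's prices grew more slowly than those of its neighbors, and a ratio above 1 means they grew faster.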
Applying K-MEANS clustering to Table 3, with the number of categories set to 5, we obtain Figure 1.
Table 3. Comparison with surrounding provinces (average).
According to Figure 1, all provinces, autonomous regions and municipalities in mainland China, except Hainan Province, are placed into four clusters in the choropleth map. The ratio of a province's average annual growth rate of residential house prices to that of its neighboring provinces averages 0.754 in the first cluster, 0.962 in the second cluster, 1.187 in the third cluster, and 2.146 in the fourth cluster.
In the first-cluster provinces, house prices grow relatively slowly compared with their neighboring provinces. These provinces, for example Hebei Province, Jiangsu Province, and the Guangxi Zhuang Autonomous Region, are mostly neighbors of the fourth-cluster provinces. They have been left behind in the process of urbanization during China's high-speed urban development, and they also show a net outflow of population. People who emigrate from the first-cluster provinces mainly end up in more economically advanced regions, especially the fourth-cluster provinces. This movement of population decreases the rigid demand for housing in the first-cluster provinces while increasing it in the fourth-cluster provinces, so that house price growth speeds up in the fourth-cluster provinces and slows down in the first-cluster provinces.
In the second-cluster provinces, house prices grow at almost the same rate as in their neighboring provinces. These provinces are mainly located in the Northeast, the North, the Northwest, the West, and the Southwest. There is little difference in economic development between a second-cluster province and its neighbors, and population flow among them is not significant.
Figure 1. Cluster display comparing house prices in neighboring provinces.
In the third-cluster provinces, house prices grow slightly faster than in their neighboring provinces, but the differences are small. The third cluster includes Sichuan Province, Shaanxi Province, and Hubei Province in central China, Liaoning Province in the Northeast, and Zhejiang Province in the East.
The fourth-cluster provinces stand out clearly on the map as the three cores of house price growth: Guangdong Province, Shanghai Municipality, and Beijing Municipality. They are also the cores of economic development and the most highly urbanized regions in mainland China.
2) Hierarchical clustering of house price growth rates
On the basis of Table 1, the annual house price growth rate of each province from 2008 to 2016 is calculated. Applying hierarchical clustering to the nine-year growth rate series of each province yields Figure 2.
Referring to the above Figure 2, we can draw the following conclusions.
Shanghai Municipality, Beijing Municipality, Jiangsu Province, and Tianjin Municipality can be sorted into one cluster. All of them have highly developed economies, and their large net migration inflows and active investment in housing property make their patterns of house price growth similar. In addition, when the central government starts controlling house prices, the policies applied to these provinces are similar.
Tibet, as an ethnic minority autonomous region, has a large minority population. Population flows into and out of Tibet are rigid. Its cultural and political differences from other provinces make it a special case.
Other provinces are similar in their pattern of house price fluctuation.
4.1. Analysis of the Highest and Lowest Price Maps in Time Series
The average selling price of residential commercial housing across the provinces of mainland China is highly polarized. The two municipalities of Beijing and Shanghai have had the highest and second highest average selling prices of residential commercial housing since 2007. Gansu Province, Guizhou Province, Henan Province, Hunan Province, Jiangxi Province, the Inner Mongolia Autonomous Region, the Ningxia Hui Autonomous Region, Qinghai Province, the Xinjiang Uygur Autonomous Region, and the Tibet Autonomous Region have each ranked among the three provinces with the lowest average selling price at some point between 2007 and 2016. In 2016, the average selling price of residential commercial housing in Beijing Municipality (28,489 Yuan/m2) was about 7.7 times that in Guizhou Province (3704 Yuan/m2).
Figure 2. Hierarchical clustering result.
That Beijing and Shanghai appear as provinces with very high house prices is partly due to the special status of municipalities. According to the Constitution of the People's Republic of China, municipalities are at the same administrative level as provinces and autonomous regions, and in this paper they are compared directly with provinces and autonomous regions. Thus, the average selling prices of residential commercial housing in Beijing and Shanghai reflect only the house prices within these two cities' administrative areas, whereas the data for larger provinces and autonomous regions average over much wider areas. If average selling prices of residential commercial housing are compared at the city level, the house price in Shenzhen, Guangdong Province, is much higher than that of Beijing and Shanghai (the average selling price of residential commercial housing in Shenzhen was 45,498 Yuan/m2 in 2016).
Coloring the provinces with very high house prices and the provinces with very low house prices on the map shows clearly that none of the provinces with very low house prices is coastal (Figure 3). Most of the provinces with very low house prices are located in the Northwest of China. They lie far from the ocean and at high altitudes, with vast areas of grassland, plateau and desert and an arid climate. Harsh natural environments and economic conditions make these provinces less suitable for habitation and lower the quality of residential housing there.
Figure 3. Highest and lowest price map analysis.
Guizhou Province, Hunan Province, and Jiangxi Province are located in south-central China, at some distance from the major coastal cities, such as Shenzhen in Guangdong Province, Xiamen in Fujian Province, and Hangzhou in Zhejiang Province. With the high-speed economic development of their neighboring provinces, these relatively under-developed inland provinces have large net emigration into the coastal provinces. The resulting low rigid demand for housing in Guizhou Province, Hunan Province, and Jiangxi Province contributes to depressed house prices in these provinces.
4.2. Pulling Analysis with Neighboring Provinces
First, calculate the growth rate of house prices in each province from 2008 to 2016; then, according to Table 2, calculate for each province the growth rate of house prices in its neighboring provinces from 2008 to 2016 (the average value over all neighbors). Finally, the Granger causality test is applied to the two time series of house price growth rates, that of each province and that of its neighboring provinces.
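The final step can be sketched in plain Python as an F statistic comparing two least-squares fits. The paper's own analysis uses the lmtest package in R, so this is only an illustration of the idea and not a reproduction of that code; the function names and the two nine-point series are hypothetical, and the P value (which would come from the F distribution with `order` and `dof` degrees of freedom) is not computed here.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(design, target):
    """Residual sum of squares of an OLS fit via the normal equations."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(design, target)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, t in zip(design, target))


def granger_f(cause, effect, order=1):
    """F statistic for H0: lagged `cause` adds nothing to predicting `effect`."""
    rows = range(order, len(effect))
    target = effect[order:]
    # Restricted model: intercept plus the effect's own lags.
    own = [[1.0] + [effect[t - j] for j in range(1, order + 1)] for t in rows]
    # Unrestricted model: additionally the lags of the candidate cause.
    full = [o + [cause[t - j] for j in range(1, order + 1)] for o, t in zip(own, rows)]
    rss_r, rss_u = rss(own, target), rss(full, target)
    dof = len(target) - (2 * order + 1)
    return ((rss_r - rss_u) / order) / (rss_u / dof)


# Hypothetical growth rates: the province follows its neighbors with a one-year lag.
neighbor_avg = [0.10, 0.20, 0.05, 0.15, 0.25, 0.08, 0.12, 0.22, 0.06]
province = [0.10, 0.11, 0.19, 0.06, 0.14, 0.26, 0.07, 0.13, 0.21]
f_stat = granger_f(neighbor_avg, province, order=1)
```

With only nine annual growth rates per series, an order-2 test leaves seven usable observations and two residual degrees of freedom, so results at higher orders should be read with caution.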
1) Pulled by neighboring provinces
The calculation results show that, with the delay parameter Order = 1, four provinces are significantly pulled by their neighboring provinces, namely Henan, Fujian, Yunnan and Jilin. With the delay parameter Order = 2, there are two, Tianjin and Shanghai, as shown in Table 4 below.
In Table 4, the P values of Henan and Yunnan provinces are less than 0.001, which is highly significant. We draw the actual growth rate curves of house prices, as shown in Figure 4.
From Figure 4, we can see that as the growth rate in the neighboring provinces rose, flattened, declined and rose again, the growth rate of housing prices in Henan Province responded with the same changes after a very short time interval. In Yunnan, as the growth rate in the neighboring provinces declined, rose, flattened, declined, rose and fell, the growth rate of housing prices in Yunnan Province followed the same changes with a lag of about one year.
Table 4. Granger causality test driven by neighboring provinces (only listed with significant P values).
Figure 4. Growth rate curve of house prices.
2) Pulling the surrounding provinces
The calculation results show that, with the delay parameter Order = 1, the P value of the Inner Mongolia Autonomous Region is less than 0.05, which is statistically significant. We draw the actual growth rate curves of house prices (Figure 5): the growth rate of house prices in the neighboring provinces changes along with that of the Inner Mongolia Autonomous Region. With the delay parameter Order = 2, no region with a P value below 0.05 is found.
Tianjin's high house price is partly due to its concentrated economic, political and cultural significance and the high quality of its residential housing. It is also affected by its neighbors, Hebei Province and Beijing Municipality. Beijing, which has had the highest house prices in mainland China throughout the study period, pushes up house prices in the neighboring municipality of Tianjin. Beijing, the ancient and contemporary capital of China, accommodates many historical sites, national agencies, foreign embassies, shopping malls, and corporate headquarters in its limited urban area. A total population of 13.629 million (as of the end of 2016) intensifies the competition for space. In recent years, with the progress of Beijing-Tianjin-Hebei integration, industries previously located in Beijing have been continuously relocated to Hebei Province and Tianjin, causing the high house prices centered on Beijing to spread into its neighboring regions. There are two major reasons why the housing market in Tianjin is affected by Beijing: 1) Some industries in Beijing have moved to Tianjin, aggravating the competition for land among industrial, commercial, and residential uses. This raises development costs for real estate developers, which eventually raises house prices. 2) More Beijing residents are buying or investing in houses in Tianjin to avoid unaffordable prices in Beijing while remaining close to it. This increases demand in the Tianjin housing market. Because the supply of housing is inelastic, this leads to rising prices in Tianjin's real estate market.
Figure 5. Price growth rate curve of Inner Mongolia and surrounding provinces.
Shanghai, the municipality with the second highest house price, interacts even more visibly with the house prices of its neighboring provinces. Compared with the Beijing-Tianjin-Hebei region, the cities inside the Yangtze River Delta Economic Zone are more closely connected in terms of population mobility, economic activity, and so on. The Yangtze River Delta metropolitan area is now one of the six major metropolitan areas in the world. This also means that the house price in Shanghai is interdependent with that of the cities in its neighboring provinces, Jiangsu Province and Zhejiang Province. Rising house prices in Shanghai push local residents to buy houses in nearby Jiangsu Province or Zhejiang Province, instead of buying or renting in Shanghai, while keeping their jobs in Shanghai. The highly developed high-speed railway network has already created a one-hour commuting circle around Shanghai. In addition, the extension of the Shanghai metro system further connects the city with other cities nearby. The convenience of transportation and the integration of the economies explain the large impact of the neighboring provinces on the house price of Shanghai.
This article mainly analyzes data for mainland China, excluding Taiwan, Hong Kong, Macau, and Hainan Province. The shortcomings of this paper are as follows: 1) The house price data cover only 10 years. 2) The spatial scale is not fine enough: data collection and analysis are carried out by province, and no city-level data are analyzed.
Geographic location is found to be an important factor in determining house prices. Regions with harsh environments are typically associated with low house prices, while the coastal regions are mainly high-house-price regions.
The neighboring effect of house prices arises mainly for the following reasons: first, industries move out of regions with high land prices, pushing up land prices in the neighboring regions; second, improved transportation systems increase the demand from residents of high-house-price regions for housing in the neighboring regions.
 Willcocks, G. (2009) UK Housing Market: Time Series Processes with Independent and Identically Distributed Residuals. The Journal of Real Estate Finance and Economics, 39, 403-414.
 Foote, K.E. and Lynch, M. (2018) Geographic Information Systems as an Integrating Technology: Context, Concepts, and Definitions. The Geographer’s Craft Project, Department of Geography, The University of Colorado at Boulder.
 MacQueen, J. (1967) Some Methods for Classification and Analysis of Multivariate Observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, University of California Press, Berkeley, 281-297.