Because financial markets are chaotic, traditional financial engineering, which typically relies on econometric methods with assumptions of linearity and causality, often fails. Most regression models neglect the rich cross-sectional information in financial time series and try to predict a specific price, which is nearly impossible. Building models that reduce this rich cross-sectional information with little loss has become a challenge for researchers.
This paper first examines the hypothesis that the volume-price distribution follows a certain probability distribution, and then the hypothesis that this distribution transitions discretely over time. If both hypotheses hold, it is possible to predict the distribution of future prices and to form investment strategies. The research therefore uses a large amount of micro-data to study the micro-structure of volume-price distributions and its approximation by a normal distribution. On the basis of this scale-free micro-structure, a state transition model is proposed to describe the evolution of the dynamic equilibrium of the financial market. The micro-structure of the financial market and the dynamic transitions between states are explored, and quantitative suggestions and guidance for investment behavior in the financial market are put forward.
2. Literature Review
Clark argues that trading volume is positively related to price volatility and proposes a preliminary model of the mixed distribution hypothesis (MDH), which assumes that information causes simultaneous changes in price volatility and volume. Building on this, Epps and Epps examined the mechanism of intraday trading through theoretical and empirical models. They concluded that market heterogeneity, absolute price change, and volume are positively correlated, and that the change in the logarithm of price follows a mixed distribution with transaction volume as the mixing variable; this mixed distribution has higher kurtosis. Tauchen and Pitts extended the mixed distribution hypothesis, deriving from economic theory the joint probability distribution of price change and volume for any time interval of intraday trading, and determining how this joint distribution changes as more traders enter (or exit) the market. Harris (1982) pointed out that the mixed distribution hypothesis can explain the "thick tail" characteristics of daily price changes, as well as the positive correlation between return volatility and trading volume. These studies assume a probability distribution for price changes, but do not provide a model for how that distribution evolves over time.
The standard mixed distribution hypothesis holds that the arrival of information flow drives the positive correlation between price fluctuations and volume. The model of Lamoureux and Lastrapes assumes that the arrival rate of information is serially correlated. Andersen proposes a modification of the bivariate normal assumption of the original standard model, incorporating a conditional Poisson distribution for the trading process and a component of trading that is insensitive to information. These mixed distribution hypothesis models assume perfect liquidity and ignore the impact of liquidity frictions on volatility and volume. On the other hand, some studies have shown that liquidity shocks affect both the rate of return and volume. Hedge funds, for example, track the price pressure caused by illiquidity and enter the market to provide liquidity for arbitrage. Serge Darolles, Gaëlle Le Fol, and Gulten Mero included both information flow and liquidity shocks in the relationship between volatility and volume. Some studies of developing-country markets also support the mixed distribution hypothesis. How the information flow affects the probability distribution of volume and price therefore requires further analysis.
Wu Chongfeng and Wu Wenfeng integrated the time dimension, transaction price, and trading volume, using a simple piecewise function and a dimensional transformation to reconstruct the stock price series on the basis of volume; a GARCH model was used to demonstrate feasibility. Zhang Yongyi et al. used a regression model to study the dynamic relationship between historical volume-price information and prices. Their empirical results showed that trading volume information is more conducive to price discovery than historical price information. Li Mengxuan and Zhou Yi used the Granger causality test to study the relationship between stock prices and trading volume in high-frequency trading. Chen Qiling and Song Fengming used econometric models to examine the correlation between price changes and trading volume in the Chinese stock market. These empirical results all indicate that the absolute value of daily price changes in the Chinese stock market is positively correlated with trading volume, and that this correlation is asymmetric. Li Shuangcheng used the asymmetric GARCH-M model and found that the mixed distribution hypothesis also holds in the Shanghai and Shenzhen stock markets in China, and that the Shanghai stock market responds more quickly to trading volume as a proxy for information flow.
In summary, the studies above characterize the probability distribution of price changes but do not incorporate volume into that distribution. Moreover, applicable models for the evolution of the probability distribution over time are still lacking.
3. State Transition Model of Volume-Price Distribution
Definition 1. Volume-Price Distribution. The stock market generates an ever-changing time series. For a chosen segment of that series, the prices of all transactions in the segment, together with the total volume traded at each price, constitute a distribution function that describes the relationship between price and volume within the segment.
Definition 2. State of Volume-Price Distribution. The parameters that describe a Volume-Price Distribution (mean and variance, for instance) are considered its state. Time series segments that fall into different clusters belong to different states.
Definition 3. State Transition of Volume-Price Distribution. As time passes, the state of the Volume-Price Distribution changes discretely from one time segment to the next, with a certain probability. This change of state is defined as a state transition, and it can be described by a State Transition Probability Table.
The approach is as follows. First, we propose an algorithm to fit a normal distribution to the Volume-Price Distribution. Second, we use exhaustive search and a genetic algorithm to search the training data for micro-structures that approximate a normal distribution, and calculate the corresponding parameters. Third, we cluster these micro-structures, use the endogenous and exogenous methods to determine the optimal cluster structure, and establish the state transition table from the observed state transition data. Finally, a state transition model based on dynamic equilibrium is established from these parameters, and its correctness is verified with the validation set.
Assumption. Volume-price distribution transits between different states with a certain probability. This assumes the distribution has discrete states, which will be verified by clustering techniques in the following sections.
State Clustering Algorithm. Using the k-means clustering method, the 9000 state time intervals are divided into K clusters, where the number of clusters K is determined by combining the endogenous method and the exogenous method. Based on prior experience, we select sqrt(n/2) as the number of clusters K, so the expected number of elements per cluster is sqrt(2 × n).
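The rule of thumb above can be checked with a few lines of arithmetic. The following is only a minimal sketch: the function names are hypothetical, and rounding sqrt(n/2) to the nearest integer is an assumption, since the text does not say how the value is converted to a whole number of clusters.

```python
import math


def rule_of_thumb_k(n: int) -> int:
    """Starting cluster count k = sqrt(n / 2), rounded to the nearest integer.

    Rounding to nearest is an assumption made for this sketch.
    """
    if n < 2:
        raise ValueError("need at least two observations")
    return max(1, round(math.sqrt(n / 2)))


def expected_cluster_size(n: int) -> float:
    """Expected elements per cluster: n / sqrt(n / 2) = sqrt(2 * n)."""
    return math.sqrt(2 * n)


# Sample sizes quoted for the short-, mid- and long-term models.
for n in (695, 3700, 3500):
    print(n, rule_of_thumb_k(n), round(expected_cluster_size(n), 1))
```

With n = 695, 3700 and 3500 this rounding gives K = 19, 43 and 42, which agrees with the K values reported for the short-, mid- and long-term models.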
State Transition Probability Algorithm. The algorithm sorts the segmented intervals with an approximate normal distribution pattern by time and generates the transition probabilities between states. The state transition matrix is built from the time-ordered statistics of state transitions: for each state, count the total number of transitions leaving that state, then divide the number of transitions into each successor state by that total to obtain the transition probability between the two states.
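The counting procedure just described can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name `transition_matrix` and the state labels in the example are hypothetical, and the input is assumed to be the time-ordered sequence of cluster labels.

```python
from collections import Counter, defaultdict


def transition_matrix(states):
    """Estimate P[a][b] = count(a -> b) / count(a -> any state)
    from a time-ordered sequence of state labels."""
    counts = defaultdict(Counter)
    # Each consecutive pair (current, next) is one observed transition.
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    matrix = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        matrix[current] = {nxt: c / total for nxt, c in followers.items()}
    return matrix


# Hypothetical label sequence, for illustration only.
sequence = ["A", "B", "A", "A", "B", "C", "A", "B"]
P = transition_matrix(sequence)
for state in sorted(P):
    print(state, P[state])
```

Each row of the resulting table sums to 1. A state that appears only as the last element of the sequence has no outgoing transitions and therefore no row.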
Construction of State Transition Models
The construction of the short-, medium-, and long-term state transition models is shown in Table 1.
Short-term state transition model: To establish a short-term state transition model, we take the approximately normal distributions with a time span of no more than 30 days, from the 1st to the 695th microscopic structure. The k-means clustering method is used to analyze the morphology of the approximate normal distribution structures (K = 19), and the state transition table is calculated (Table 2). Based on the short-term state transition table in Table 2, four successive normal distributions are predicted. The expected number of correct predictions is 0.82, and the probability of a correct prediction is 0.20.
Mid-term state transition model: likewise, the time span is greater than 30 days and less than or equal to 90 days, with data from the 1st to the 3700th microscopic structure. The k-means cluster number is K = 43 (Table 3). The expected number of correct predictions is 1.40, and the probability of a correct prediction is 0.35.
Long-term state transition model: likewise, the time span is greater than 90 days, with data from the 1st to the 3500th microscopic structure. The k-means cluster number is K = 42 (Table 4). The expected number of correct predictions is 2, and the probability of a correct prediction is 0.5.
Table 1. Construction of short-, medium-, and long-term state transition models.
Source: composed by authors.
Table 2. Short-term state transition table.
The short-term, medium-term, and long-term state transition models clearly differ greatly in their expectations of future transitions. The reasons may be as follows: 1) the longer the time span, the larger the number of approximate normal distribution morphological structures included in the original transactions, which makes the model more accurate and less affected by a small number of outliers; 2) the longer the time span, the closer the parameters describing the morphological structure of the approximate normal distribution are to their actual values, and the higher the accuracy of the model.
Table 3. Mid-term state transition table.
(Note: the table is shown only in part because of the large amount of data.)
4. Data and Experiments
This experiment takes the distribution relationship between volume and price in stock market trading as its research object, that is, the relationship between transaction prices and the corresponding trading volumes. We consider Chinese A-share stocks that are representative, widely followed, and highly liquid as research targets. Accordingly, the daily transaction data of the China Shanghai Stock Exchange Vanke A (SH000002) stock from June 1, 2007 to April 29, 2010 were selected as the basic research data.
Table 4. Long-term state transition table.
(Note: the table is shown only in part because of the large amount of data.)
The selected data include the stock code, the trading day, the transaction price, and the total transaction volume corresponding to each transaction price. Transaction prices are standardized to scale-free prices to facilitate comparison between segments; the total volume at each price is converted into a frequency by taking its share of the segment's volume; and the overall volume-price distribution is converted into a continuous probability density curve using kernel density smoothing. The resulting probability density curve is fitted to a normal distribution, and the fitted parameters (mean and variance) are taken as the state of the segment.
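The preprocessing chain described above can be sketched as follows. Several details are assumptions made only for illustration, because the text does not specify them: prices are made scale-free by centering on the midpoint of the segment's price range and dividing by half the range, the kernel is Gaussian with a fixed bandwidth, and the normal fit is done by moment matching. All function names and the sample prices and volumes are hypothetical.

```python
import math


def scale_free_prices(prices):
    """Map prices to [-1, 1] around the midpoint of the price range
    (an assumed standardization, not specified in the text)."""
    lo, hi = min(prices), max(prices)
    if hi == lo:
        raise ValueError("segment has a single price level")
    mid, half_range = (lo + hi) / 2, (hi - lo) / 2
    return [(p - mid) / half_range for p in prices]


def volume_frequencies(volumes):
    """Convert the volume at each price into its share of total volume."""
    total = sum(volumes)
    return [v / total for v in volumes]


def kernel_density(xs, weights, bandwidth):
    """Volume-weighted Gaussian kernel density estimate, as a function."""
    norm = 1.0 / (bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return sum(w * norm * math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                   for xi, w in zip(xs, weights))

    return density


def fit_normal(xs, weights, bandwidth):
    """Moment-match a normal distribution to the smoothed density.

    With a Gaussian kernel, the smoothed mean equals the weighted mean and
    the smoothed variance is the weighted variance plus bandwidth**2.
    """
    mean = sum(w * x for x, w in zip(xs, weights))
    var = sum(w * (x - mean) ** 2 for x, w in zip(xs, weights)) + bandwidth ** 2
    return mean, var


# Hypothetical segment: price levels and the volume traded at each level.
prices = [10.0, 10.5, 11.0, 11.5, 12.0]
volumes = [100, 300, 400, 150, 50]
h = 0.2  # assumed bandwidth

xs = scale_free_prices(prices)
ws = volume_frequencies(volumes)
density = kernel_density(xs, ws, h)
mean, var = fit_normal(xs, ws, h)
print("state (mean, variance):", mean, var)
```

Under this assumed standardization, the fitted mean measures how far the volume-weighted center of gravity lies from the midpoint of the segment's price range, and the pair (mean, variance) is what the clustering step would receive as the segment's state.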
This experiment investigates possible state transitions between time segments that exhibit a “normal peak, thick tail” approximate normal distribution pattern. The approximate normal distribution structures are clustered and thereby divided into different states. The frequency and probability of each state transition are then calculated to construct a state transition probability matrix, which makes it possible to predict future transitions.
The idea is to establish models for the short term (time span of no more than one month), the medium term (time span between 1 and 3 months), and the long term (time span greater than 3 months). The experimental steps are as follows:
1) Cluster the approximate normal distribution structures in each time interval. The k-means method is used to cluster the mean and variance of the approximate normal distribution structures, with the number of clusters ranging from 2 to 100.
2) Select the better clustering results (i.e., the corresponding numbers of clusters). Using three intrinsic methods (the empirical method, the elbow method, and the cross-validation method), a set of cluster numbers with better clustering quality is obtained.
3) Use external methods to determine the optimal clustering result (that is, the corresponding number of clusters). For each candidate number of clusters, the expected number of correct predictions is calculated, and the candidate with the highest expected number of correct predictions is chosen as the best.
4) Construct the state transition matrix from the clustering results. Cluster the original training set and the test set, calculate the probabilities of transitions between states, and form a state transition matrix to predict future transitions.
5) Predict the state transitions of the validation set and calculate the prediction results. The established state transition model is applied to the state transitions of the validation set, and the probabilities are calculated.
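Steps 1) and 2) can be illustrated with a small, self-contained k-means sketch that clusters (mean, variance) pairs and records the within-cluster sum of squares for each candidate K, which is the quantity inspected in the elbow method. This is a plain Lloyd's-algorithm sketch, not the authors' code: the function names, the random initialization, and the six sample states are assumptions, and a real run would scan K from 2 to 100 on the fitted segment states.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(axis) / len(members)
                                     for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Hypothetical (mean, variance) states forming two well-separated groups.
states = [(0.0, 1.0), (0.1, 1.1), (-0.1, 0.9),
          (5.0, 2.0), (5.1, 2.1), (4.9, 1.9)]

# Elbow scan: inertia for each candidate number of clusters.
inertia_by_k = {k: kmeans(states, k)[2] for k in range(1, 5)}
labels, centroids, _ = kmeans(states, 2)
print(inertia_by_k)
print(labels)
```

The elbow method looks for the K after which the inertia stops falling sharply. The external check in step 3) would then compare the candidate K values by the expected number of correct predictions each one produces.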
4.1. Empirical Validation
Based on the state transition table, a state transition model is established, which can be used to determine the parameters describing the approximate normal structure of the future volume-price distribution: mean and variance. The mean indicates the center of gravity of future transaction prices, that is, the price range in which most future trading will take place. The variance represents the concentration of future trading volume. The greater the variance, the lower the concentration, and the more the trading volume is spread across the price range. The smaller the variance, the higher the concentration, and the more the trading volume is concentrated near the mean.
Cross-Validation Data Selection. A short-term state transition model was constructed from the approximate normal distribution micro-structures with a time span within 30 days (Table 5). The mid-term and long-term validations are omitted because of space limitations.
The 1st to 694th microscopic structures are chosen as the training set to establish the state transition model, and the 695th to 727th microscopic structures, 33 in total, are used as the validation set (Table 6).
4.2. Experiment Steps and Results
1) A short-term state transition model is established from the training set, and a probability generator is constructed. The next state of the transition is predicted from the input state;
2) For each state in the model, the range of the normal distribution parameters corresponding to that state is calculated (Table 7), and the parameters of the next state are randomly generated within that range;
3) Repeat steps 1) and 2) 200 times, then average the resulting parameter sets to obtain the predicted parameters of the next state;
4) Quantify and compare the predicted normal distribution description with the actual normal distribution description (Table 8).
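Steps 1) to 3) amount to a small Monte Carlo procedure, sketched below. The sketch rests on assumptions that the text does not confirm: parameters are drawn uniformly within each state's range, and the transition probabilities and parameter ranges shown are hypothetical placeholders rather than values from Tables 2 and 7. The function name `predict_next_parameters` is likewise illustrative.

```python
import random


def predict_next_parameters(current_state, transition, param_ranges,
                            draws=200, seed=0):
    """Average (mean, variance) over `draws` simulated transitions.

    transition[s] maps each successor state to its probability;
    param_ranges[s] is ((mean_lo, mean_hi), (var_lo, var_hi)).
    Uniform sampling within the ranges is an assumption of this sketch.
    """
    rng = random.Random(seed)
    successors = list(transition[current_state])
    probs = [transition[current_state][s] for s in successors]
    mean_sum = var_sum = 0.0
    for _ in range(draws):
        # Step 1: draw the next state from the transition probabilities.
        nxt = rng.choices(successors, weights=probs, k=1)[0]
        # Step 2: draw that state's parameters within its range.
        (mean_lo, mean_hi), (var_lo, var_hi) = param_ranges[nxt]
        mean_sum += rng.uniform(mean_lo, mean_hi)
        var_sum += rng.uniform(var_lo, var_hi)
    # Step 3: average the simulated parameter sets.
    return mean_sum / draws, var_sum / draws


# Hypothetical transition row and parameter ranges, for illustration only.
transition = {"S1": {"S1": 0.2, "S2": 0.8}}
param_ranges = {
    "S1": ((-0.2, 0.0), (0.1, 0.2)),
    "S2": ((0.0, 0.3), (0.2, 0.4)),
}
pred_mean, pred_var = predict_next_parameters("S1", transition, param_ranges)
print(pred_mean, pred_var)
```

The averaged pair can then be compared with the parameters actually fitted to the next segment, as in step 4).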
In this paper, the discrete volume-price distribution is converted into a continuous one using the kernel density method. We propose and verify that a large number of scale-free approximate normal distribution morphological structures exist in the volume-price distribution of stock trading. The state transition model constructed here can be applied to forecast the volume-price distribution of stock trading.
The main contribution of the paper is to cluster the scale-free volume-price distribution structures according to the time span of the approximate normal distribution morphological structure, establish state transition models, determine the range of the parameters (mean and variance) of the approximate normal distribution morphological structure, and indicate the differing accuracy and limitations of the state transition models.
The limitations of this paper are as follows: 1) The state transition model designed and constructed in this paper centers the transaction price, which demonstrates its scale-free characteristics and facilitates the comparison and analysis of approximate normal distribution morphological structures. However, the model can only determine the deviation between the center of gravity and the center value, and cannot give a specific price for the center of gravity of future transactions; 2) The model focuses only on the relative shape of the price distribution. This helps in comparing and analyzing the morphological structures of the approximate normal distribution, but it neglects the absolute volume of each structure and therefore cannot predict specific future volumes; 3) The segmentation of the financial time series also affects the prediction of the specific time points and time spans of future states.
Table 5. Short-term state transition model data.
Table 6. Validation sets.
Table 7. Characteristics of validation sets.
Table 8. Results of transition prediction.
This research is supported by Guangdong Province Applied Science and Technology Research and Development Special Fund Project (2016B010124008), Guangdong Province Science and Technology Development Special Fund (2016B040401003), Guangdong Province Science and Technology Development Special Fund (2016A010101016).
 Epps, T.W. and Epps, M.L. (1976) The Stochastic Dependence of Security Price Changes and Transaction Volumes: Implications for the Mixture of Distributions Hypothesis. Econometrica, 44, 305-321.
 Harris, L. (1982) A Theoretical and Empirical Analysis of the Distribution of Speculative Prices and of the Relation between Absolute Price Change and Volume. Unpublished Ph.D. Dissertation, University of Chicago, Chicago.
 Andersen, T.G. (1996) Return Volatility and Trading Volume: An Information Flow Interpretation of Stochastic Volatility. Journal of Finance, 51, 169-204.
 Chong, S.P., Lee, S.Y. and Nam, K. (2000) Volatility and Information Flows in Emerging Equity Market: A Case of the Korean Stock Exchange. International Review of Financial Analysis, 9, 405-420.
 Bohl, M.T. and Henke, H. (2003) Trading Volume and Stock Market Volatility: The Polish Case. International Review of Financial Analysis, 12, 513-525.
 Zhang, Y., Wang, C. and Hua, C. (2013) Is Historical Price Information More Effective in Price Discovery?—Data Analysis Based on China’s Stock Market. Chinese Journal of Management Science, No. s1, 346-354. (In Chinese)
 Li, S., Xing, Z. and Ren, W. (2006) An Empirical Study on the Relationship between Volume and Price in Shanghai, Shenzhen and Shenzhen Stock Market Based on the MDH Hypothesis. Systems Engineering, 24, 77-82. (In Chinese)