Received 9 September 2015; accepted 15 April 2016; published 18 April 2016
Regression analysis is an important and comprehensive approach to analyzing the relationship between a dependent variable and one or more independent variables; it has a very wide range of applications in engineering, the social sciences, economics and finance. Traditional regression analysis methods usually require that both the independent and the dependent variables be crisp data. In practice, however, the data are often fuzzy rather than crisp. For example, observations may be described in linguistic terms, such as "large", "heavy", or "approximately equal to 3". Because of such ambiguous indicators, traditional regression alone cannot give satisfactory and complete results for these problems.
By means of Zadeh’s fuzzy set theory, researchers have established various fuzzy regression models and derived their solutions. After Tanaka et al., Diamond estimated the regression coefficients using a fuzzy least squares (FLS) method, which is similar to the traditional LS estimate. Savic and Pedrycz established a two-step model of fuzzy regression analysis by combining FLS with linear programming. More recently, Chang compared fuzzy regression methods and summed up three approaches: the minimum fuzziness criterion, the least squares fitting criterion and interval regression analysis. The main difference between fuzzy regression and conventional regression is that the residual is a fuzzy variable in fuzzy regression but a random variable in traditional regression. Fuzzy regression models can be divided into several cases: the regression coefficients are fuzzy numbers; some of the variables are fuzzy; or both the input and output variables are fuzzy.
This paper starts from fuzzy input and output variables and transforms them into crisp data by the centroid method. The problem of fuzzy linear regression analysis (with crisp regression coefficients) is thereby transformed into a traditional linear regression analysis, so it can be addressed with the estimation and statistical diagnosis methods of the traditional linear regression model.
2. Fuzzy Regression Model and Parameter Estimation
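Assume that $\{(\tilde{x}_{i1}, \dots, \tilde{x}_{ip}, \tilde{y}_i)\}_{i=1}^{n}$ is an observational data set of fuzzy inputs and fuzzy outputs, with $\tilde{x}_{ij}, \tilde{y}_i \in F(R)$ ($F(R)$ is the set of fuzzy numbers on the real number set R). Then the fuzzy linear regression model can be expressed as
$$\tilde{y}_i = \beta_0 + \beta_1 \tilde{x}_{i1} + \cdots + \beta_p \tilde{x}_{ip} + \tilde{\varepsilon}_i, \quad i = 1, \dots, n, \tag{1}$$
where $\tilde{\varepsilon}_i$ is the i-th observation error.
The fuzzy linear regression model (1) can be expressed in matrix form as
$$\tilde{Y} = \tilde{X}\beta + \tilde{\varepsilon}, \tag{2}$$
where $\tilde{Y} = (\tilde{y}_1, \dots, \tilde{y}_n)^T$, $\tilde{X}$ is the matrix whose i-th row is $(1, \tilde{x}_{i1}, \dots, \tilde{x}_{ip})$, $\beta = (\beta_0, \beta_1, \dots, \beta_p)^T$, $\tilde{\varepsilon} = (\tilde{\varepsilon}_1, \dots, \tilde{\varepsilon}_n)^T$, and the membership functions of $\tilde{x}_{ij}$ and $\tilde{y}_i$ are $\mu_{\tilde{x}_{ij}}$ and $\mu_{\tilde{y}_i}$, respectively.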
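For convenience of discussion, we assume that all observations are triangular fuzzy numbers
$$\tilde{x}_{ij} = (x_{ij}^L, x_{ij}^M, x_{ij}^R), \qquad \tilde{y}_i = (y_i^L, y_i^M, y_i^R),$$
where $x_{ij}^L < x_{ij}^M < x_{ij}^R$ and $y_i^L < y_i^M < y_i^R$ are all real numbers. The membership function of a triangular fuzzy number $\tilde{a} = (a^L, a^M, a^R)$ is
$$\mu_{\tilde{a}}(t) = \begin{cases} \dfrac{t - a^L}{a^M - a^L}, & a^L \le t \le a^M, \\ \dfrac{a^R - t}{a^R - a^M}, & a^M < t \le a^R, \\ 0, & \text{otherwise.} \end{cases}$$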
In order to obtain the estimator of the regression coefficient $\beta$, a natural idea is to turn the fuzzy observation data into crisp data and then use the traditional least squares method to calculate the estimator of $\beta$. There are many ways to transform fuzzy data into crisp data; one of the most common is the centroid method.
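The fuzzy data $\tilde{x}_{ij}$ and $\tilde{y}_i$ are transformed into crisp data $x_{ij}^c$ and $y_i^c$ (usually called the centroids of $\tilde{x}_{ij}$ and $\tilde{y}_i$) with the formula
$$x_{ij}^c = \frac{\int t\,\mu_{\tilde{x}_{ij}}(t)\,dt}{\int \mu_{\tilde{x}_{ij}}(t)\,dt}, \qquad y_i^c = \frac{\int t\,\mu_{\tilde{y}_i}(t)\,dt}{\int \mu_{\tilde{y}_i}(t)\,dt}. \tag{3}$$
Obviously, when an observation is a symmetric triangular fuzzy number, its centroid is the center of symmetry of that number.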
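As a minimal sketch of the centroid transformation, assume a triangular fuzzy number is stored as a hypothetical triple (left, peak, right) giving the endpoints of its support and its peak. This representation and the function name are illustrative choices, not the paper's notation. For a triangular membership function the centroid is the mean of the three vertices:

```python
def triangular_centroid(left, peak, right):
    """Centroid of a triangular fuzzy number with support [left, right] and peak at `peak`."""
    if not (left <= peak <= right):
        raise ValueError("expected left <= peak <= right")
    # The centroid of a triangle's area lies at the mean of its three vertex abscissas.
    return (left + peak + right) / 3.0


# Symmetric triangular number: the centroid coincides with the center of symmetry.
symmetric = triangular_centroid(1.5, 2.0, 2.5)
# Asymmetric triangular number: the centroid is pulled toward the longer side.
asymmetric = triangular_centroid(1.0, 2.0, 6.0)
```

The symmetric case illustrates the remark above: the centroid equals the peak whenever the left and right spreads are equal.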
Lemma 1 For the traditional linear regression model
$$Y = X\beta + \varepsilon,$$
the least squares estimator of $\beta$ is
$$\hat{\beta} = (X^T X)^{-1} X^T Y.$$
According to Lemma 1, it is easy to obtain the estimator of the parameter $\beta$ in model (1) or (2).
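Theorem 2.1 Assume that the fuzzy linear regression model is
$$\tilde{Y} = \tilde{X}\beta + \tilde{\varepsilon},$$
where $\tilde{X}$ and $\tilde{Y}$ are observed triangular fuzzy data and $\beta$ is a crisp (real) parameter vector. Then the estimator of $\beta$ is
$$\hat{\beta} = (X_c^T X_c)^{-1} X_c^T Y_c,$$
where $X_c$ and $Y_c$ are the matrix and vector of centroids of $\tilde{X}$ and $\tilde{Y}$, which can be calculated by (3).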
When the fuzzy data reduce to crisp data, the least squares estimator of $\beta$ is the conventional least squares estimator.
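The estimation procedure described here (defuzzify by centroid, then apply ordinary least squares) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: triangular fuzzy numbers are stored as hypothetical (left, peak, right) triples, the function names are invented for the example, and the data are made up rather than taken from the paper.

```python
import numpy as np


def centroid(tri):
    """Mean of the three vertices (left, peak, right) of a triangular fuzzy number."""
    return sum(tri) / 3.0


def fit_centroid_ls(fuzzy_x, fuzzy_y):
    """Least squares fit on centroids.

    fuzzy_x: list of rows, each row a list of triangular fuzzy regressors.
    fuzzy_y: list of triangular fuzzy responses.
    Returns the coefficient vector (intercept first).
    """
    # Crisp design matrix with an intercept column.
    Xc = np.array([[1.0] + [centroid(t) for t in row] for row in fuzzy_x])
    Yc = np.array([centroid(t) for t in fuzzy_y])
    # Solve the normal equations (Xc^T Xc) beta = Xc^T Yc.
    return np.linalg.solve(Xc.T @ Xc, Xc.T @ Yc)


# Made-up illustrative data (not from the paper): one fuzzy regressor,
# with centroids lying exactly on the line y = 1 + 2x.
fuzzy_x = [[(0.5, 1.0, 1.5)], [(1.5, 2.0, 2.5)], [(2.5, 3.0, 3.5)], [(3.5, 4.0, 4.5)]]
fuzzy_y = [(2.0, 3.0, 4.0), (4.0, 5.0, 6.0), (6.0, 7.0, 8.0), (8.0, 9.0, 10.0)]
beta_hat = fit_centroid_ls(fuzzy_x, fuzzy_y)
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse; for ill-conditioned design matrices `np.linalg.lstsq` would be a more robust choice.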
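Specifically, when the fuzzy linear regression model is $\tilde{y}_i = \beta_0 + \beta_1 \tilde{x}_i + \tilde{\varepsilon}_i$, we have
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i^c - \bar{x}^c)(y_i^c - \bar{y}^c)}{\sum_{i=1}^{n} (x_i^c - \bar{x}^c)^2}, \qquad \hat{\beta}_0 = \bar{y}^c - \hat{\beta}_1 \bar{x}^c,$$
where $x_i^c$ and $y_i^c$ are the centroids of $\tilde{x}_i$ and $\tilde{y}_i$, and $\bar{x}^c$ and $\bar{y}^c$ are their sample means.
Corollary Let $\tilde{x}_i = (x_i^L, x_i^M, x_i^R)$ and $\tilde{y}_i = (y_i^L, y_i^M, y_i^R)$, $i = 1, \dots, n$, be a set of triangular fuzzy data of the fuzzy linear regression model $\tilde{y}_i = \beta_0 + \beta_1 \tilde{x}_i + \tilde{\varepsilon}_i$. Then the estimators above hold with
$$x_i^c = \frac{x_i^L + x_i^M + x_i^R}{3}, \qquad y_i^c = \frac{y_i^L + y_i^M + y_i^R}{3}.$$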
If the fuzzy observation data are not triangular fuzzy numbers, the centroid method can still be applied.
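For non-triangular data the centroid can be approximated numerically from the membership function. The following is a sketch under assumptions: the centroid is taken as the ratio of the integral of t·μ(t) to the integral of μ(t), the integrals are approximated by a simple midpoint rule, and the trapezoidal membership function and all names are hypothetical examples.

```python
def numeric_centroid(mu, lower, upper, steps=10000):
    """Approximate integral(t * mu(t)) / integral(mu(t)) over [lower, upper] by the midpoint rule."""
    h = (upper - lower) / steps
    num = 0.0
    den = 0.0
    for k in range(steps):
        t = lower + (k + 0.5) * h
        m = mu(t)
        num += t * m
        den += m
    if den == 0.0:
        raise ValueError("membership function is zero on the whole interval")
    return num / den


def trapezoid_membership(a, b, c, d):
    """Trapezoidal membership: rises on (a, b), equals 1 on [b, c], falls on (c, d)."""
    def mu(t):
        if a < t < b:
            return (t - a) / (b - a)
        if b <= t <= c:
            return 1.0
        if c < t < d:
            return (d - t) / (d - c)
        return 0.0
    return mu


# Symmetric trapezoid on [0, 4] with plateau [1, 3]; by symmetry its centroid is 2.
c_trap = numeric_centroid(trapezoid_membership(0.0, 1.0, 3.0, 4.0), 0.0, 4.0)
# Degenerate plateau (b == c) gives a triangle; the result should be close to (1 + 2 + 6) / 3.
c_tri = numeric_centroid(trapezoid_membership(1.0, 2.0, 2.0, 6.0), 1.0, 6.0)
```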
3. The Evaluation Performance of Fuzzy Linear Regression Model
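In order to evaluate the performance of a fuzzy regression model, Kim and Bishu introduced the absolute difference between the membership functions of the observed fuzzy dependent variable $\tilde{y}_i$ and the estimated one $\hat{\tilde{y}}_i$ as
$$E_i = \int_{S_{\tilde{y}_i} \cup S_{\hat{\tilde{y}}_i}} \left| \mu_{\tilde{y}_i}(t) - \mu_{\hat{\tilde{y}}_i}(t) \right| dt,$$
where $S_{\tilde{y}_i}$ and $S_{\hat{\tilde{y}}_i}$ are the supports of $\tilde{y}_i$ and $\hat{\tilde{y}}_i$, respectively.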
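A numerical sketch of this error measure is given below. It assumes the measure is the integral, over the union of the two supports, of the absolute difference between the observed and estimated membership functions; that both fuzzy numbers are non-degenerate triangles stored as hypothetical (left, peak, right) triples; and that a midpoint rule is accurate enough. The function names are illustrative.

```python
def triangular_membership(left, peak, right):
    """Membership function of a triangular fuzzy number with left < peak < right."""
    def mu(t):
        if left < t <= peak:
            return (t - left) / (peak - left)
        if peak < t < right:
            return (right - t) / (right - peak)
        return 0.0
    return mu


def fit_error(observed, estimated, steps=20000):
    """Midpoint-rule approximation of the integral of |mu_obs - mu_est|."""
    # Integrating over the enclosing interval covers the union of the supports;
    # any gap between the supports contributes zero.
    lower = min(observed[0], estimated[0])
    upper = max(observed[2], estimated[2])
    mu_obs = triangular_membership(*observed)
    mu_est = triangular_membership(*estimated)
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        t = lower + (k + 0.5) * h
        total += abs(mu_obs(t) - mu_est(t))
    return total * h


# Identical fuzzy numbers: no error.
e_same = fit_error((0.0, 1.0, 2.0), (0.0, 1.0, 2.0))
# Disjoint supports: the error is the sum of the two triangle areas (1 + 1).
e_disjoint = fit_error((0.0, 1.0, 2.0), (3.0, 4.0, 5.0))
```

The two examples bracket the behaviour of the measure: it is zero for a perfect fit and equals the total area under both membership functions when the estimate does not overlap the observation at all.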
Essentially, $E_i$ is the estimated error term of the i-th observation: the smaller the value of $E_i$, the better the fit of the fuzzy linear regression model. Nasrabadi and Nasrabadi showed the general calculation steps. Kao and Chyu showed that, for the fuzzy linear regression model, the formula for $E_i$ is
When putting in (5) into the above formula, it has
where. The value of can be determined by using the following method
4. Parameter Estimation and Impact Analysis of the Data Deleted Fuzzy Linear Regression Model
4.1. The Fuzzy Linear Regression Model Based on Data Deletion
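For the fuzzy linear regression model (1) or (2), in order to evaluate the role and impact of the i-th data point in the regression analysis, we can compare the inference results before and after deleting the i-th data point, and thereby test whether this point is an outlier. The fuzzy linear regression model with the i-th point deleted is called a case deletion fuzzy linear regression model (FCDM), and its component form and matrix form are, respectively,
$$\tilde{y}_j = \beta_0 + \beta_1 \tilde{x}_{j1} + \cdots + \beta_p \tilde{x}_{jp} + \tilde{\varepsilon}_j, \quad j = 1, \dots, n, \ j \neq i, \tag{6}$$
$$\tilde{Y}(i) = \tilde{X}(i)\beta + \tilde{\varepsilon}(i), \tag{7}$$
where $\tilde{Y}(i)$, $\tilde{X}(i)$ and $\tilde{\varepsilon}(i)$ are the vectors or matrix obtained by deleting the i-th data of $\tilde{Y}$, $\tilde{X}$ and $\tilde{\varepsilon}$, respectively, and $\hat{\beta}(i)$ denotes the least squares estimator of $\beta$ in model (7).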
4.2. The Parameter Estimate of Case Deletion Fuzzy Linear Regression Model
According to Lemma 1 and Theorem 2.1, we can obtain the least squares estimator.
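Theorem 4.1 For the case deletion fuzzy linear regression model (6) or (7), the least squares estimator of $\beta$ is
$$\hat{\beta}(i) = \hat{\beta} - \frac{(X_c^T X_c)^{-1} x_i \hat{e}_i}{1 - p_{ii}},$$
where $X_c$ and $Y_c$ are the centroid matrix and vector obtained by (3), $\hat{\beta} = (X_c^T X_c)^{-1} X_c^T Y_c$, $\hat{e}_i = y_i^c - x_i^T \hat{\beta}$, $x_i$ is the vector composed of the elements of the i-th row of the matrix $X_c$, and $p_{ii}$ is the i-th main diagonal element of the matrix $P = X_c (X_c^T X_c)^{-1} X_c^T$. The proof of formula (10) can be found in Wei et al.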
Theorem 4.1 gives a formula for the regression coefficient after the i-th data point is deleted and also shows the relationship between the regression coefficients before and after the deletion. It is the basis on which we evaluate whether this point is an outlier or a strong impact point. If the i-th data point is a normal point, $\hat{\beta}$ and $\hat{\beta}(i)$ should differ little. If they differ greatly, the i-th data point seriously affects the estimation of $\beta$, and this data point may be an outlier or a strong impact point.
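The relationship between the estimates before and after deletion can be checked numerically. The sketch below assumes the standard case-deletion update for ordinary least squares, in which the full-data estimate is corrected by (XᵀX)⁻¹xᵢêᵢ/(1 − pᵢᵢ), and compares it with an explicit refit on the reduced data. The crisp data stand in for centroids and are made up for illustration; the function names are hypothetical.

```python
import numpy as np


def ls_fit(X, y):
    """Ordinary least squares via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def deleted_estimate(X, y, i):
    """Estimate with the i-th row deleted, computed from the update formula without refitting."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    x_i = X[i]
    resid_i = y[i] - x_i @ beta_hat      # residual of the i-th point
    p_ii = x_i @ XtX_inv @ x_i           # i-th diagonal element of the hat matrix
    return beta_hat - (XtX_inv @ x_i) * resid_i / (1.0 - p_ii)


# Made-up crisp (centroid) data, not from the paper; the last point is deliberately off-trend.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1, 20.0])
X = np.column_stack([np.ones_like(x), x])

i = 5
via_formula = deleted_estimate(X, y, i)
via_refit = ls_fit(np.delete(X, i, axis=0), np.delete(y, i))
```

The update formula needs only quantities from the full-data fit, so all n deleted estimates can be obtained without n separate refits.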
4.3. The Impact Analysis of the Case Deletion Fuzzy Linear Regression Model
At present, existing methods for fuzzy regression models do not consider that actual data often contain outliers or strong impact points. However, because of gross errors, rounding errors and other interfering factors, it is difficult to prevent actual data from being mixed with a certain proportion of outliers or strong impact points. Once outliers are present, these methods face serious challenges and may even lead to wrong conclusions. Studying the impact of the data on the model is an important part of statistical diagnosis, and one of the most straightforward ways is to delete data. Since we transform fuzzy data into crisp data, the problem of fuzzy linear regression analysis is transformed into a traditional linear regression analysis problem. Therefore, impact analysis for the case deletion fuzzy regression model can be carried out as impact analysis for a traditional case deletion linear regression model.
Although we can get $\hat{\beta} - \hat{\beta}(i)$ by formula (8), it is a vector, which is difficult to compare across data points. In practice, Cook’s distance is often used to measure the difference between $\hat{\beta}$ and $\hat{\beta}(i)$. Cook’s distance is one of the most important diagnostic statistics; it was originally proposed by Cook in 1977, based on the statistical meaning of the confidence region of the parameter.
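Cook’s distance is defined as
$$D_i = \frac{(\hat{\beta}(i) - \hat{\beta})^T X_c^T X_c (\hat{\beta}(i) - \hat{\beta})}{k \hat{\sigma}^2},$$
where $X_c$ is the centroid design matrix, $k$ is the number of regression coefficients (the number of columns of $X_c$), and $\hat{\sigma}^2$ is the residual variance estimate of the full-data fit.
A large $D_i$ shows that the estimate $\hat{\beta}(i)$ is far from $\hat{\beta}$, that is, $\hat{\beta}(i)$ and $\hat{\beta}$ differ greatly.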
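The following theorem gives a simple formula for calculating Cook’s distance.
Theorem 4.2 For the fuzzy linear regression model (2) or (7), Cook’s distance can be expressed as
$$D_i = \frac{\| \hat{Y} - \hat{Y}(i) \|^2}{k \hat{\sigma}^2} = \frac{p_{ii}}{1 - p_{ii}} \cdot \frac{r_i^2}{k},$$
where $r_i = \hat{e}_i / (\hat{\sigma} \sqrt{1 - p_{ii}})$, $\hat{e}_i$ is the i-th residual, $p_{ii}$ is the i-th main diagonal element of $P = X_c (X_c^T X_c)^{-1} X_c^T$, $k$ is the number of regression coefficients, and $\hat{Y} = X_c \hat{\beta}$ and $\hat{Y}(i) = X_c \hat{\beta}(i)$ are the fitted values before and after deleting the i-th data point.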
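The step from the definition of Cook’s distance to a formula that needs no refitting can be sketched as follows. The notation is an assumption where the extracted text has lost its symbols: $X_c$ is the centroid design matrix with i-th row $x_i^T$, $\hat{e}_i$ is the i-th residual, $p_{ii} = x_i^T (X_c^T X_c)^{-1} x_i$, $k$ is the number of regression coefficients, and $\hat{\sigma}^2$ is the residual variance estimate.

```latex
\begin{aligned}
\hat{\beta} - \hat{\beta}(i)
  &= \frac{(X_c^{T} X_c)^{-1} x_i \, \hat{e}_i}{1 - p_{ii}}, \\
D_i
  &= \frac{(\hat{\beta} - \hat{\beta}(i))^{T} X_c^{T} X_c \, (\hat{\beta} - \hat{\beta}(i))}{k \hat{\sigma}^{2}}
   = \frac{\hat{e}_i^{2} \, x_i^{T} (X_c^{T} X_c)^{-1} x_i}{k \hat{\sigma}^{2} (1 - p_{ii})^{2}} \\
  &= \frac{p_{ii}}{1 - p_{ii}} \cdot \frac{r_i^{2}}{k},
  \qquad r_i = \frac{\hat{e}_i}{\hat{\sigma} \sqrt{1 - p_{ii}}}.
\end{aligned}
```

The second line substitutes the case-deletion update into the quadratic form, and the last line groups the factors into the leverage ratio and the squared studentized residual.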
In a specific data analysis, we first calculate Cook’s distance point by point, written as $D_1, \dots, D_n$, and then identify one or more comparatively large $D_i$ from a list or a figure (the values need not be large in absolute terms). Such data points have a big effect on the parameter estimate and may be outliers or strong impact points.
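The point-by-point procedure can be sketched as below, assuming the usual closed form of Cook's distance for ordinary least squares (leverage ratio times squared studentized residual, divided by the number of coefficients). The crisp data stand in for centroids and are made up for illustration; they are not the data of the example in this paper, and the function name is hypothetical.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance for every observation of an ordinary least squares fit."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    # Diagonal of the hat matrix X (X^T X)^{-1} X^T, computed row by row.
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    sigma2 = resid @ resid / (n - k)
    # Squared internally studentized residuals.
    r2 = resid**2 / (sigma2 * (1.0 - leverage))
    return leverage / (1.0 - leverage) * r2 / k


# Made-up crisp (centroid) data; the last point is deliberately off-trend.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1, 20.0])
X = np.column_stack([np.ones_like(x), x])

D = cooks_distance(X, y)
suspect = int(np.argmax(D))  # zero-based index of the point with the largest distance
```

In practice the values in `D` would be listed or plotted against the observation index, and the points that stand out from the rest would be examined as candidate outliers or strong impact points.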
5. Analysis of Practical Example
The following study shows an application of the centroid method and compares the method proposed in this paper with the methods of Diamond and of Sakawa and Yano.
The data in Table 1 are triangular fuzzy numbers. We establish the fuzzy linear regression model and discuss the model’s error term and the outlying data point.
By Theorem 2.1 (centroid), we can get the fuzzy linear regression equation
Using Diamond’s method  , we can obtain the fuzzy linear regression equation
Also when using Sakawa-Yano’s method  , we can obtain:
By formula (7), we calculate the model’s error term and its sum using the centroid method, Diamond’s method and Sakawa-Yano’s method; the results are listed in Table 1. From Table 1, we can see that the sum of the model’s error terms under the centroid method is smaller than under Diamond’s method and Sakawa-Yano’s method. Thus, for these data, the fuzzy linear regression model fitted by the centroid method performs better than those fitted by Diamond’s method and Sakawa-Yano’s method.
Figure 1 was obtained by Matlab programming.
Table 2 and Figure 1 show the Cook’s distances under the centroid method and their scatter plot, respectively. Data point No. 7 has the largest Cook’s distance, which indicates that it is an outlier or a strong impact point.
Figure 1. The scatter plot of Cook’s distance under centroid method.
Table 1. Data from Sakawa and Yano  .
Table 2. Cook’s distance under centroid method.
By transforming fuzzy data into crisp data, the fuzzy linear regression model is transformed into a traditional linear regression model. We study the parameter estimation and impact analysis of the case deletion fuzzy linear regression model. Comparison with other methods on a practical example shows that the method proposed in this paper is easy to use and has good fitting performance.
This research is supported by the National Natural Science Foundation Grant No. 11171065 and the National Statistical Scientific Foundation Grant No. 2014LY059.