Statistical Analysis of Fuzzy Linear Regression Model Based on Centroid Method


Received 9 September 2015; accepted 15 April 2016; published 18 April 2016

1. Introduction

Regression analysis is an important and comprehensive approach to analyzing the relationship between a dependent variable and one or more independent variables; it has a very wide range of applications in engineering, the social sciences, economics, and finance. Traditional regression analysis methods often require that both the independent and the dependent variables be crisp data. In practical problems, however, the data are often fuzzy rather than crisp. For example, observations may be described in linguistic terms, such as "large", "heavy", or "approximately equal to 3". Because of such ambiguous indicators, analyzing these problems by traditional regression alone cannot give satisfactory and complete results.

By means of Zadeh's [1] fuzzy set theory, researchers have established various fuzzy regression models and derived their solutions. After Tanaka et al. [2], Diamond [3] estimated regression coefficients using the fuzzy least squares (FLS) method, which is similar to the traditional LS estimate. Savic and Pedrycz [4] established a two-step model of fuzzy regression analysis by combining FLS with linear programming. More recently, Chang and Ayyub [5] compared fuzzy regression methods and summarized three approaches: the minimum fuzziness criterion, the least squares fitting criterion, and interval regression analysis. The main difference between fuzzy regression and conventional regression is that the residual is a fuzzy variable in fuzzy regression but a random variable in traditional regression. Fuzzy regression models can be divided into several cases: the regression coefficients are fuzzy numbers; some of the variables are fuzzy; or both the input and output variables are fuzzy [6] - [10].

This paper starts from fuzzy input and output variables, transforms them into crisp data by the centroid method [11], and thereby reduces fuzzy linear regression analysis (with crisp regression coefficients) to traditional linear regression analysis. Thus, the problems of fuzzy linear regression analysis can be addressed with the estimation and statistical diagnosis methods of the traditional linear regression model.

2. Fuzzy Regression Model and Parameter Estimation

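Assume that $(\tilde{X}_{i1}, \tilde{X}_{i2}, \ldots, \tilde{X}_{ip}, \tilde{Y}_i)$, $i = 1, 2, \ldots, n$, is an observational data set of fuzzy inputs and fuzzy outputs, with $\tilde{X}_{ij}, \tilde{Y}_i \in F(R)$ ($F(R)$ is the fuzzy number set of the real number set R). Then the fuzzy linear regression model can be expressed as

$$\tilde{Y}_i = \beta_0 + \beta_1 \tilde{X}_{i1} + \cdots + \beta_p \tilde{X}_{ip} + \tilde{\varepsilon}_i, \quad i = 1, 2, \ldots, n \qquad (1)$$

where $\tilde{\varepsilon}_i$ is the i-th observation error.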

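Assume

$$\tilde{Y} = (\tilde{Y}_1, \ldots, \tilde{Y}_n)^T, \quad \tilde{X} = \begin{pmatrix} 1 & \tilde{X}_{11} & \cdots & \tilde{X}_{1p} \\ \vdots & \vdots & & \vdots \\ 1 & \tilde{X}_{n1} & \cdots & \tilde{X}_{np} \end{pmatrix}, \quad \beta = (\beta_0, \beta_1, \ldots, \beta_p)^T, \quad \tilde{\varepsilon} = (\tilde{\varepsilon}_1, \ldots, \tilde{\varepsilon}_n)^T.$$

Then the fuzzy linear regression model (1) can be expressed in matrix form as

$$\tilde{Y} = \tilde{X}\beta + \tilde{\varepsilon} \qquad (2)$$

where the membership functions of $\tilde{X}_{ij}$ and $\tilde{Y}_i$ are $\mu_{\tilde{X}_{ij}}(x)$ and $\mu_{\tilde{Y}_i}(y)$, respectively.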

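For convenience of discussion, we assume that all observations are triangular fuzzy numbers

$$\tilde{X}_{ij} = (x_{ij}^L, x_{ij}^M, x_{ij}^R), \quad \tilde{Y}_i = (y_i^L, y_i^M, y_i^R)$$

where $x_{ij}^L < x_{ij}^M < x_{ij}^R$ and $y_i^L < y_i^M < y_i^R$ are all real numbers. Their membership functions are

$$\mu_{\tilde{X}_{ij}}(x) = \begin{cases} \dfrac{x - x_{ij}^L}{x_{ij}^M - x_{ij}^L}, & x_{ij}^L \le x \le x_{ij}^M, \\ \dfrac{x_{ij}^R - x}{x_{ij}^R - x_{ij}^M}, & x_{ij}^M < x \le x_{ij}^R, \\ 0, & \text{otherwise,} \end{cases}$$

and analogously for $\mu_{\tilde{Y}_i}(y)$.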

In order to obtain the estimator of the regression coefficient $\beta$, a natural idea is to turn the fuzzy observation data into crisp data and then use the traditional least squares method to calculate the estimator of $\beta$ [12]. There are many ways to transform fuzzy data into crisp data; one of the most common is the centroid method [7].

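The fuzzy data $\tilde{X}_{ij}$ and $\tilde{Y}_i$ are transformed into crisp data $x_{ij}^c$ and $y_i^c$ (usually called the centroids of $\tilde{X}_{ij}$ and $\tilde{Y}_i$) with the formula

$$x_{ij}^c = \frac{\int x\, \mu_{\tilde{X}_{ij}}(x)\, dx}{\int \mu_{\tilde{X}_{ij}}(x)\, dx}, \quad y_i^c = \frac{\int y\, \mu_{\tilde{Y}_i}(y)\, dy}{\int \mu_{\tilde{Y}_i}(y)\, dy} \qquad (3)$$

Obviously, when an observation is a symmetric triangular fuzzy number, its centroid is the center of symmetry of that number.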
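As a worked check of the centroid computation, the derivation below takes a triangular fuzzy number with left endpoint $a$, peak $b$ and right endpoint $c$ (symbols introduced here only for illustration) and assumes the usual centroid definition, the ratio of $\int x\mu(x)dx$ to $\int \mu(x)dx$. Under that assumption the centroid is the mean of the three defining points, which reduces to the peak in the symmetric case.

```latex
% Triangular fuzzy number (a, b, c), a < b < c, with piecewise-linear membership mu.
\begin{align*}
\int_a^c \mu(x)\,dx &= \frac{c-a}{2}, \\
\int_a^c x\,\mu(x)\,dx
  &= \int_a^b x\,\frac{x-a}{b-a}\,dx + \int_b^c x\,\frac{c-x}{c-b}\,dx \\
  &= \frac{(b-a)(a+2b)}{6} + \frac{(c-b)(2b+c)}{6}
   = \frac{(c-a)(a+b+c)}{6}, \\
x^c &= \frac{(c-a)(a+b+c)/6}{(c-a)/2} = \frac{a+b+c}{3}.
\end{align*}
% Symmetric case: b - a = c - b implies a + c = 2b, hence x^c = b.
```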

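Lemma 1 [13] Consider the traditional linear regression model

$$Y = X\beta + \varepsilon$$

where

$Y = (y_1, \ldots, y_n)^T$ is the response vector, $X$ is the $n \times (p+1)$ design matrix of full column rank whose i-th row is $(1, x_{i1}, \ldots, x_{ip})$, $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^T$, and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^T$ is the error vector.

Then the least squares estimator of $\beta$ is

$$\hat{\beta} = (X^T X)^{-1} X^T Y.$$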

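According to Lemma 1, it is easy to get the estimator for the parameter $\beta$ in model (1) or (2).

Theorem 2.1 Assume that the fuzzy linear regression model is

$$\tilde{Y} = \tilde{X}\beta + \tilde{\varepsilon}$$

where $\tilde{X}_{ij}$ and $\tilde{Y}_i$ are observed triangular fuzzy data and $\beta$ is a crisp coefficient vector. Then the estimator for $\beta$ is

$$\hat{\beta} = (X_c^T X_c)^{-1} X_c^T Y_c \qquad (4)$$

where $Y_c = (y_1^c, \ldots, y_n^c)^T$ and $X_c$ is the matrix whose i-th row is $(1, x_{i1}^c, \ldots, x_{ip}^c)$; their entries can be calculated by (3).

Once the fuzzy data are reduced to crisp data, the least squares estimation of $\beta$ is a conventional least squares estimation.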
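The two-step procedure of Theorem 2.1 (defuzzify by centroid, then apply ordinary least squares) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `centroid` and `fit_centroid_ols`, the array layout, and the data values are all assumptions made for the example, and triangular fuzzy numbers are assumed to be stored as (left, peak, right) triples whose centroid is the mean of the three values.

```python
import numpy as np


def centroid(tri):
    """Centroid of triangular fuzzy numbers stored as (..., 3) triples."""
    return np.asarray(tri, dtype=float).mean(axis=-1)


def fit_centroid_ols(x_fuzzy, y_fuzzy):
    """Least squares fit on centroid-defuzzified data.

    x_fuzzy has shape (n, p, 3) and y_fuzzy has shape (n, 3).
    Returns the coefficient vector with the intercept first.
    """
    x_c = centroid(x_fuzzy)                              # shape (n, p)
    y_c = centroid(y_fuzzy)                              # shape (n,)
    design = np.column_stack([np.ones(len(y_c)), x_c])   # add intercept column
    beta, *_ = np.linalg.lstsq(design, y_c, rcond=None)
    return beta


# Hypothetical data: centroids of x are 1, 2, 3, 4 and of y are 3, 5, 7, 9.
x_fuzzy = [[(0, 1, 2)], [(1, 2, 3)], [(1, 3, 5)], [(3, 3.5, 5.5)]]
y_fuzzy = [(2, 3, 4), (4, 5, 6), (5, 7, 9), (8, 9, 10)]

beta_hat = fit_centroid_ols(x_fuzzy, y_fuzzy)
print(beta_hat)
```

Using `np.linalg.lstsq` rather than forming the inverse of the cross-product matrix explicitly is a numerical convenience; both give the same estimator when the design matrix has full column rank.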

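Specifically, when the fuzzy linear regression model is $\tilde{Y}_i = \beta_0 + \beta_1 \tilde{X}_i + \tilde{\varepsilon}_i$, we have

Corollary Let $(\tilde{X}_i, \tilde{Y}_i)$, $i = 1, 2, \ldots, n$, be a set of triangular fuzzy data of the fuzzy linear regression model. Then

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i^c - \bar{x}_c)(y_i^c - \bar{y}_c)}{\sum_{i=1}^{n} (x_i^c - \bar{x}_c)^2}, \quad \hat{\beta}_0 = \bar{y}_c - \hat{\beta}_1 \bar{x}_c \qquad (5)$$

where

$$\bar{x}_c = \frac{1}{n}\sum_{i=1}^{n} x_i^c, \quad \bar{y}_c = \frac{1}{n}\sum_{i=1}^{n} y_i^c.$$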

If the fuzzy observations are not triangular fuzzy data, the centroid method can still be applied.
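To illustrate the non-triangular case, the sketch below approximates the centroid of an arbitrary membership function by numerical integration. It assumes the usual centroid definition (the ratio of the integral of x times the membership to the integral of the membership); the trapezoidal shape, the function names, and the numbers are hypothetical choices for the example.

```python
def trapezoidal_membership(x, a, b, c, d):
    """Membership of x in a trapezoidal fuzzy number with support [a, d] and core [b, c]."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def centroid(membership, lower, upper, steps=20000):
    """Midpoint-rule approximation of the centroid of `membership` on [lower, upper]."""
    width = (upper - lower) / steps
    weighted = total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * width
        mu = membership(x)
        weighted += x * mu   # accumulates the numerator integrand
        total += mu          # accumulates the denominator integrand
    return weighted / total  # the common factor `width` cancels


def example_trapezoid(x):
    # Hypothetical asymmetric trapezoidal fuzzy number (0, 1, 2, 5).
    return trapezoidal_membership(x, 0.0, 1.0, 2.0, 5.0)


x_c = centroid(example_trapezoid, 0.0, 5.0)
print(round(x_c, 4))
```

For this particular trapezoid the exact centroid can be worked out by hand as 19/9, so the printed value should be close to 2.1111.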

3. The Evaluation Performance of Fuzzy Linear Regression Model

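In order to evaluate the performance of a fuzzy regression model, Kim and Bishu [6] introduced the absolute difference between the membership functions of the observed fuzzy dependent variable and the estimated one,

$$E_i = \int_{S_{\tilde{Y}_i} \cup S_{\hat{\tilde{Y}}_i}} \left| \mu_{\tilde{Y}_i}(x) - \mu_{\hat{\tilde{Y}}_i}(x) \right| dx \qquad (6)$$

where $S_{\tilde{Y}_i}$ and $S_{\hat{\tilde{Y}}_i}$ are the supports of $\tilde{Y}_i$ and $\hat{\tilde{Y}}_i$, respectively.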
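The evaluation criterion can be approximated numerically. The sketch below assumes the criterion is the integral, over the union of the two supports, of the absolute difference between the observed and estimated membership functions; the helper names `tri` and `membership_difference` and the two fuzzy numbers are hypothetical.

```python
def tri(left, peak, right):
    """Return the membership function of the triangular fuzzy number (left, peak, right)."""
    def mu(x):
        if left < x <= peak:
            return (x - left) / (peak - left)
        if peak < x < right:
            return (right - x) / (right - peak)
        return 1.0 if x == peak else 0.0
    return mu


def membership_difference(mu_obs, mu_est, lower, upper, steps=20000):
    """Midpoint-rule approximation of the integral of |mu_obs - mu_est| on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * width
        total += abs(mu_obs(x) - mu_est(x))
    return total * width


# Hypothetical observed and estimated fuzzy responses with the same peak.
observed = tri(2.0, 4.0, 6.0)
estimated = tri(3.0, 4.0, 5.0)

# The union of the supports is [2, 6].
error = membership_difference(observed, estimated, 2.0, 6.0)
print(round(error, 4))
```

Here the estimated number lies entirely under the observed one, so the difference equals the gap between the two triangle areas (2 minus 1); identical fuzzy numbers give a difference of zero.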

Essentially, this difference is the estimated error term: the smaller its value, the better the fit of the fuzzy linear regression model. Nasrabadi and Nasrabadi [7] showed the general calculation steps. Kao and Chyu [8] showed that, for the fuzzy linear regression model, the formula of the error term is

Substituting the estimators in (5) into the above formula gives

(7)

where. The value of can be determined by using the following method

4. Parameter Estimation and Impact Analysis of the Data Deleted Fuzzy Linear Regression Model

4.1. The Fuzzy Linear Regression Model Based on Data Deletion

For the fuzzy linear regression model (1) or (2), in order to evaluate the role and impact of the i-th data point in the regression analysis, we can compare the inference results before and after deleting the i-th data point, and thereby test whether this point is an outlier. The fuzzy linear regression model with the i-th point deleted is called a case deletion fuzzy linear regression model (FCDM); its component form and matrix form are, respectively,

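$$\tilde{Y}_j = \beta_0 + \beta_1 \tilde{X}_{j1} + \cdots + \beta_p \tilde{X}_{jp} + \tilde{\varepsilon}_j, \quad j = 1, 2, \ldots, n, \; j \ne i \qquad (8)$$

$$\tilde{Y}_{(i)} = \tilde{X}_{(i)}\beta + \tilde{\varepsilon}_{(i)} \qquad (9)$$

where $\tilde{Y}_{(i)}$, $\tilde{X}_{(i)}$ and $\tilde{\varepsilon}_{(i)}$ are the vectors or matrix obtained by deleting the i-th data point from $\tilde{Y}$, $\tilde{X}$ and $\tilde{\varepsilon}$, respectively, and $\hat{\beta}_{(i)}$ denotes the least squares estimator of $\beta$ in model (9).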

4.2. The Parameter Estimate of Case Deletion Fuzzy Linear Regression Model

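According to Lemma 1 and Theorem 2.1, we can obtain the least squares estimator $\hat{\beta}_{(i)}$.

Theorem 4.1 For the case deletion fuzzy linear regression model (8) or (9), the least squares estimator of $\beta$ is

$$\hat{\beta}_{(i)} = (X_{c(i)}^T X_{c(i)})^{-1} X_{c(i)}^T Y_{c(i)}$$

and

$$\hat{\beta}_{(i)} = \hat{\beta} - \frac{(X_c^T X_c)^{-1} x_i \hat{e}_i}{1 - p_{ii}} \qquad (10)$$

where $X_{c(i)}$ and $Y_{c(i)}$ are the centroid data with the i-th data point deleted, $\hat{e}_i = y_i^c - x_i^T \hat{\beta}$, $x_i$ is the vector composed of the elements of the i-th row of the matrix $X_c$, and $p_{ii}$ is the i-th main diagonal element of the matrix $P = X_c (X_c^T X_c)^{-1} X_c^T$. The proof of formula (10) can be found in Wei et al. [13].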

Theorem 4.1 gives a formula for the regression coefficient after the i-th data point is deleted and also shows the relationship between the regression coefficients before and after the deletion. It is the basis on which we evaluate whether this point is an outlier or a strong impact point. If the i-th data point is a normal point, $\hat{\beta}$ and $\hat{\beta}_{(i)}$ should differ little. If they differ greatly, the i-th data point seriously affects the estimation of $\beta$, and this data point may be an outlier or a strong impact point.
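The relationship between the estimates before and after deletion can be checked numerically. The sketch below assumes the standard one-step deletion identity for least squares (the full-data estimate minus the inverse cross-product matrix times the i-th row times the i-th residual, divided by one minus the i-th leverage) and compares it with an explicit refit. The function names and the centroid values are made up for illustration.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares coefficients for design matrix X and response y."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def deleted_estimates(X, y):
    """Coefficients after deleting each row in turn, via the one-step update.

    Row i of the result is the estimate obtained without observation i.
    """
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = ols(X, y)
    resid = y - X @ beta
    # Diagonal of the hat matrix X (X'X)^{-1} X'.
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    # Row i holds (X'X)^{-1} x_i scaled by e_i / (1 - p_ii).
    shift = (X @ xtx_inv) * (resid / (1.0 - leverage))[:, None]
    return beta - shift


# Hypothetical centroid data with an intercept column.
x_c = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 9.0])
X = np.column_stack([np.ones_like(x_c), x_c])

deleted = deleted_estimates(X, y)
refit_last = ols(X[:-1], y[:-1])   # explicit refit without the last point
print(np.allclose(deleted[-1], refit_last))
```

The update needs only one matrix inversion for all n deletions, which is why it is preferred over refitting the model n times.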

4.3. The Impact Analysis of the Case Deletion Fuzzy Linear Regression Model

Existing methods for fuzzy regression models do not consider that actual data often contain outliers or strong impact points. However, because of gross errors, rounding errors, and other interfering factors, it is difficult to prevent actual data from being mixed with a certain proportion of outliers or strong impact points. Once the data are contaminated in this way, these methods face serious challenges and may even lead to wrong conclusions. Studying the impact of the data on the model is an important part of statistical diagnosis, and one of the most straightforward ways to do so is to delete data [6]. Since we transform fuzzy data into crisp data, the fuzzy linear regression analysis problem becomes a traditional linear regression analysis problem. Therefore, impact analysis for the data-deleted fuzzy regression model can be carried out on a traditional data-deleted linear regression model.

Although we can get $\hat{\beta} - \hat{\beta}_{(i)}$ by formula (10), it is a vector, which is difficult to compare across data points. In practice, Cook's distance is often used to measure this difference. Cook's distance is one of the most important diagnostic statistics; it was originally proposed by Cook in 1977 [14], based on the statistical meaning of the confidence region of the parameters.

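Cook's distance is defined as

$$D_i = \frac{(\hat{\beta}_{(i)} - \hat{\beta})^T X_c^T X_c (\hat{\beta}_{(i)} - \hat{\beta})}{(p+1)\hat{\sigma}^2}$$

here

$$\hat{\sigma}^2 = \frac{\hat{e}^T \hat{e}}{n - p - 1}, \quad \hat{e} = Y_c - X_c \hat{\beta}.$$

A large $D_i$ shows that $\hat{\beta}_{(i)}$ is far from $\hat{\beta}$, that is, the estimates with and without the i-th data point differ greatly.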

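The following theorem gives a simple formula for calculating Cook's distance.

Theorem 4.2 [13] For the fuzzy linear regression model (2) or (9), Cook's distance can be expressed as

$$D_i = \frac{1}{p+1} \cdot \frac{p_{ii}}{1 - p_{ii}} \, r_i^2 = \frac{\| \hat{Y} - \hat{Y}_{(i)} \|^2}{(p+1)\hat{\sigma}^2}$$

where $r_i = \hat{e}_i / (\hat{\sigma}\sqrt{1 - p_{ii}})$ and $p_{ii}$ is the i-th diagonal element of the hat matrix $X_c (X_c^T X_c)^{-1} X_c^T$. $\hat{Y} = X_c\hat{\beta}$ and $\hat{Y}_{(i)} = X_c\hat{\beta}_{(i)}$ are the fitted values before and after deleting the i-th data point.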

In a specific data analysis, we first calculate Cook's distance point by point, written as $D_1, D_2, \ldots, D_n$, and then use a list or a figure to find the one or more values that are large relative to the others (even if they are not particularly large in absolute terms). A data point with a large effect on the parameter estimate may be an outlier or a strong impact point.
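The point-by-point screening described above can be sketched as follows. The code assumes the standard closed form of Cook's distance for a linear model (squared residual times leverage, divided by the number of coefficients, the residual variance estimate, and the squared complement of the leverage); the function name and the centroid values are hypothetical, and the last response is deliberately shifted so that one point stands out.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance of every observation in the linear model y = X beta + e.

    X must already contain the intercept column; its column count is taken
    as the number of regression coefficients.
    """
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)   # hat-matrix diagonal
    sigma2 = resid @ resid / (n - k)                     # residual variance estimate
    return resid**2 * leverage / (k * sigma2 * (1.0 - leverage) ** 2)


# Hypothetical centroid data; the last response is deliberately shifted.
x_c = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y_c = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 20.0])
X = np.column_stack([np.ones_like(x_c), x_c])

distances = cooks_distance(X, y_c)
most_influential = int(np.argmax(distances))   # zero-based index
print(most_influential, np.round(distances, 3))
```

In practice the distances would be listed or plotted against the observation index, and the points that stand clearly above the rest would be examined further.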

5. Analysis of Practical Example

The following study shows an application of the centroid method and compares the method proposed in this paper with the methods of Diamond and of Sakawa and Yano.

The data in Table 1 are triangular fuzzy numbers. We establish the fuzzy linear regression model and discuss the model's error term and the outlying data points.

By Theorem 2.1 (centroid), we can get the fuzzy linear regression equation

Using Diamond’s method [3] , we can obtain the fuzzy linear regression equation

Similarly, using Sakawa-Yano's method [15], we can obtain

By formula (7), we calculate the model's error term and its sum using the centroid method, Diamond's method and Sakawa-Yano's method; the results are listed in Table 1. From Table 1, we can see that the sum of the error terms under the centroid method is smaller than under Diamond's method and Sakawa-Yano's method. Thus, by this criterion, the fuzzy linear regression model fitted by the centroid method performs better than those fitted by Diamond's method and Sakawa-Yano's method.

Figure 1 was obtained by Matlab programming.

Table 2 and Figure 1 show the Cook's distances under the centroid method and their scatter plot, respectively. Data point No. 7 has the largest Cook's distance, which indicates that it is an outlier or strong impact point.

Figure 1. The scatter plot of Cook’s distance under centroid method.

Table 1. Data from Sakawa and Yano [15] .

Table 2. Cook’s distance under centroid method.

6. Conclusion

By transforming fuzzy data into crisp data, the fuzzy linear regression model is transformed into a traditional linear regression model. We study the parameter estimation and impact analysis of the case-deletion fuzzy linear regression model. By comparing with other methods through a practical example, we conclude that the method proposed in this paper is easy to use and has good fitting performance.

Acknowledgements

This research is supported by the National Natural Science Foundation Grant No. 11171065 and the National Statistical Scientific Foundation Grant No. 2014LY059.

References

[1] Zadeh, L.A. (1975) The Concept of Linguistic Variable and Its Application to Approximate Reasoning. Information Sciences, 8, 99-244, 301-357.

[2] Tanaka, H., Uejima, S. and Asai, K. (1982) Linear Regression Analysis with Fuzzy Model. IEEE Transactions on Systems, Man, and Cybernetics, 12, 903-907.

[3] Diamond, P. (1988) Fuzzy Least Squares. Information Sciences, 46, 141-157.

http://dx.doi.org/10.1016/0020-0255(88)90047-3

[4] Savic, D.A. and Pedrycz, W. (1991) Evaluation of Fuzzy Linear Regression Models. Fuzzy Sets and Systems, 39, 51-63.

[5] Chang, Y.-H.O. and Ayyub, B.M. (2001) Fuzzy Regression Methods—A Comparative Assessment. Fuzzy Sets and Systems, 119, 225-246.

http://dx.doi.org/10.1016/S0165-0114(99)00092-5

[6] Kim, B. and Bishu, R.R. (1998) Evaluation of Fuzzy Linear Regression Models by Comparing Membership Functions. Fuzzy Sets and Systems, 100, 343-352.

http://dx.doi.org/10.1016/S0165-0114(97)00100-0

[7] Nasrabadi, M.M. and Nasrabadi, E. (2004) A Mathematical-Programming Approach to Fuzzy Linear Regression Analysis. Applied Mathematics and Computation, 155, 873-881.

http://dx.doi.org/10.1016/j.amc.2003.07.031

[8] Kao, C. and Chyu, C.L. (2002) A Fuzzy Linear Regression Model with Better Explanatory Power. Fuzzy Sets and Systems, 126, 401-409.

http://dx.doi.org/10.1016/S0165-0114(01)00069-0

[9] Yeh, C.-T. (2011) A Formula for Fuzzy Linear Regression Analysis. 2011 IEEE International Conference on Fuzzy Systems, Taipei, 27-30 June 2011, 2845-2850.

[10] Azadeh, A., Neshat, N. and Rafiee, K. (2015) An Adaptive Neural Network-Fuzzy Linear Regression Approach for Improved Car Ownership Estimation and Forecasting in Complex and Uncertain Environments: The Case of Iran. Transportation Planning and Technology, 35, 221.

http://dx.doi.org/10.1080/03081060.2011.651887

[11] Yager, R.R. (1980) On a General Class of Fuzzy Connectives. Fuzzy Sets and Systems, 4, 235-242.

http://dx.doi.org/10.1016/0165-0114(80)90013-5

[12] Chen, S.J. and Hwang, C.L. (1992) Fuzzy Multiple Attribute Decision Making. Springer, NY.

http://dx.doi.org/10.1007/978-3-642-46768-4

[13] Wei, B.C., Lin, J.G. and Xie, F.C. (2009) Diagnostic Statistics. Higher Education Press, Beijing, 19-44.

[14] Cook, R.D. (1977) Detection of Influential Observations in Linear Regression. Technometrics, 19, 15-18.

http://dx.doi.org/10.1080/00401706.1977.10489493

[15] Sakawa, M. and Yano, H. (1992) Multiobjective Fuzzy Linear Regression Analysis for Fuzzy Input-Output Data. Fuzzy Sets and Systems, 47, 173-181.

http://dx.doi.org/10.1016/0165-0114(92)90175-4