Supervised Fuzzy Mixture of Local Feature Models

ABSTRACT

This paper addresses an important issue in model combination: model locality. Because a global linear model usually cannot reflect nonlinearity or characterize local features, especially in a complex system, we propose a mixture of local feature models to overcome these weaknesses. The basic idea is to split the entire input space into operating regimes and to build a local model for each regime with a recently developed feature-based model combination method. Realizing this idea requires three steps (clustering, local modeling and model combination), all governed by a single objective function. An adaptive fuzzy parametric clustering algorithm divides the input space into operating regimes, local feature models are built within each regime, and the local models are finally combined into a single mixture model. Correspondingly, a three-stage procedure, in effect a hybrid Genetic Algorithm (GA), is designed to optimize the complete objective function. Our simulation results show that the adaptive fuzzy mixture of local feature models is superior to global models.
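The three steps named above (clustering, local modeling, model combination) can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes fixed, hand-picked region centers instead of adaptive fuzzy parametric clustering, uses the standard fuzzy c-means membership formula, fits plain weighted least-squares lines instead of PCA/ICA feature models, and involves no genetic algorithm. All function names and the toy target y = |x| are hypothetical choices made for illustration.

```python
import numpy as np

def fuzzy_memberships(X, centers, m=2.0):
    """Fuzzy c-means style memberships of each sample in each region.

    Assumption: centers are given, not learned. Each row sums to 1.
    """
    # Squared distance from every sample to every center, shape (n, k).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)  # guard against a sample sitting on a center
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def add_intercept(X):
    """Append a column of ones so each local model has a bias term."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_local_models(X, y, U):
    """Fit one membership-weighted linear model per region."""
    A = add_intercept(X)
    coefs = []
    for j in range(U.shape[1]):
        w = np.sqrt(U[:, j])  # square-root weights give weighted least squares
        beta, *_ = np.linalg.lstsq(A * w[:, None], y * w, rcond=None)
        coefs.append(beta)
    return np.array(coefs)  # shape (k, d + 1)

def predict_mixture(X, centers, coefs, m=2.0):
    """Combine local predictions, weighted by fuzzy membership."""
    U = fuzzy_memberships(X, centers, m)
    local = add_intercept(X) @ coefs.T  # shape (n, k)
    return (U * local).sum(axis=1)

# Toy problem: y = |x| has two linear regimes that one global line cannot capture.
X = np.linspace(-1.0, 1.0, 201).reshape(-1, 1)
y = np.abs(X[:, 0])
centers = np.array([[-0.5], [0.5]])  # assumed region centers

U = fuzzy_memberships(X, centers)
coefs = fit_local_models(X, y, U)
y_mix = predict_mixture(X, centers, coefs)

# Baseline: a single global linear model fitted to all samples.
beta_global, *_ = np.linalg.lstsq(add_intercept(X), y, rcond=None)
y_global = add_intercept(X) @ beta_global

mse_mix = float(np.mean((y - y_mix) ** 2))
mse_global = float(np.mean((y - y_global) ** 2))
print(f"global MSE: {mse_global:.4f}, mixture MSE: {mse_mix:.4f}")
```

On this toy target the membership-weighted mixture should fit better than the single global line, which is the qualitative effect the abstract describes. Because each local model is fitted separately here, the sketch does not reproduce the joint optimization of a single objective function that the paper proposes.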

KEYWORDS

Adaptive Fuzzy Mixture, Supervised Clustering, Local Feature Model, PCA, ICA, Phase Transition, Fuzzy Parametric Clustering, Real-Coded Genetic Algorithm

Cite this paper

M. Xu and M. Golay, "Supervised Fuzzy Mixture of Local Feature Models," *Intelligent Information Management*, Vol. 3, No. 3, 2011, pp. 87-103. doi: 10.4236/iim.2011.33011.
