Using Non-Additive Measure for Optimization-Based Nonlinear Classification

Affiliation(s)

College of Information Science and Technology, University of Nebraska at Omaha, Omaha, NE USA.

Department of Mathematics University of Nebraska at Omaha, Omaha, NE USA 68182.

Guilin University of Electronic Technology, Guilin, China.

ABSTRACT

Over the past few decades, numerous optimization-based methods have been proposed for solving the classification problem in data mining. Classic optimization-based methods do not consider attribute interactions toward classification. Thus, a novel learning machine is needed to provide a better understanding of the nature of classification when the interactions among the contributions from various attributes cannot be ignored. These interactions can be described by a non-additive measure, while the Choquet integral serves as the mathematical tool to aggregate the values of attributes with the corresponding values of the non-additive measure. As the main part of this research, a new nonlinear classification method with non-additive measures is proposed. Experimental results show that applying non-additive measures to the classic optimization-based models improves classification robustness and accuracy compared with some popular classification methods. In addition, motivated by the well-known Support Vector Machine (SVM) approach, we transform the primal optimization-based nonlinear classification model with the signed non-additive measure into its dual form by applying Lagrangian optimization theory and Wolfe's dual programming theory. As a result, the 2^n − 1 parameters of the signed non-additive measure can now be approximated with m (the number of records) Lagrangian multipliers by applying the necessary conditions for the primal classification problem to be optimal. This method of parameter approximation is a breakthrough for determining a non-additive measure in practice when only a relatively small number of training cases is available (m << 2^n − 1). Furthermore, the kernel-based learning method enables the nonlinear classifiers to achieve better classification accuracy. This research produces practically deliverable nonlinear models with the non-additive measure for the classification problem in data mining when interactions among attributes are considered.
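The aggregation step described above — combining attribute values with the values of a non-additive measure via the (discrete) Choquet integral — can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the function name `choquet_integral` and the representation of the measure as a dict keyed by frozensets of attribute indices are assumptions made here. The sketch assumes nonnegative attribute values and mu(empty set) = 0.

```python
def choquet_integral(values, mu):
    """Discrete Choquet integral of attribute values with respect to a
    (possibly signed) set function mu, given as a dict mapping frozensets
    of attribute indices to measure values (mu of the empty set is 0)."""
    n = len(values)
    # Sort attribute indices by their values, ascending.
    order = sorted(range(n), key=lambda i: values[i])
    total, prev = 0.0, 0.0
    for k, i in enumerate(order):
        # Coalition of attributes whose values are >= the current one.
        coalition = frozenset(order[k:])
        total += (values[i] - prev) * mu[coalition]
        prev = values[i]
    return total

# Toy example with two attributes; mu is defined on every non-empty subset.
mu = {frozenset({0}): 0.3, frozenset({1}): 0.5, frozenset({0, 1}): 1.0}
print(round(choquet_integral([0.2, 0.6], mu), 6))  # → 0.4
```

When mu is additive (here, mu({0, 1}) = mu({0}) + mu({1})), the Choquet integral reduces to an ordinary weighted sum, recovering a linear model; it is the non-additive case that captures the attribute interactions the abstract refers to.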

Cite this paper

N. Yan, Z. Chen, Y. Shi, Z. Wang and G. Huang, "Using Non-Additive Measure for Optimization-Based Nonlinear Classification," *American Journal of Operations Research*, Vol. 2, No. 3, 2012, pp. 364-373. doi: 10.4236/ajor.2012.23044.

References

[1] N. Freed and F. Glover, “Simple but Powerful Goal Programming Models for Discriminant Problems,” European Journal of Operational Research, Vol. 7, No. 1, 1981, pp. 44-60. doi:10.1016/0377-2217(81)90048-5

[2] N. Freed and F. Glover, “Evaluating Alternative Linear Programming Models to Solve the Two-Group Discriminant Problem,” Decision Sciences, Vol. 17, No. 2, 1986, pp. 151-162. doi:10.1111/j.1540-5915.1986.tb00218.x

[3] Y. Shi, “Multiple Criteria and Multiple Constraint Levels Linear Programming: Concepts, Techniques and Applications,” World Scientific Pub Co Inc., New Jersey, 2001.

[4] G. Kou, Y. Peng, Z. Chen and Y. Shi, “Multiple Criteria Mathematical Programming for Multi-Class Classification and Application in Network Intrusion Detection,” Information Sciences, Vol. 179, No. 4, 2009, pp. 371-381. doi:10.1016/j.ins.2008.10.025

[5] Y. Peng, G. Kou, Y. Shi and Z. Chen, “A Multi-Criteria Convex Quadratic Programming Model for Credit Data Analysis,” Decision Support Systems, Vol. 44, No. 4, 2008, pp. 1016-1030. doi:10.1016/j.dss.2007.12.001

[6] V. Vapnik, “The Nature of Statistical Learning Theory,” Springer-Verlag, New York, 1995.

[7] G. Choquet, “Theory of Capacities,” Annales de l’Institut Fourier, Vol. 5, 1954, pp. 131-295. doi:10.5802/aif.53

[8] Z. Wang and G. J. Klir, “Fuzzy Measure Theory,” Plenum, New York, 1992.

[9] Z. Wang and G. J. Klir, “Generalized Measure Theory,” Springer, New York, 2008.

[10] Z. Wang, K.-S. Leung and G. J. Klir, “Applying Fuzzy Measures and Nonlinear Integrals in Data Mining,” Fuzzy Sets and Systems, Vol. 156, No. 3, 2005, pp. 371-380. doi:10.1016/j.fss.2005.05.034

[11] Z. Wang and H. Guo, “A New Genetic Algorithm for Nonlinear Multiregressions Based on Generalized Choquet Integrals,” The 12th IEEE International Conference on Fuzzy Systems (FUZZ’03), Vol. 2, 25-28 May 2003, pp. 819-821.

[12] M. Grabisch and M. Sugeno, “Multi-Attribute Classification Using Fuzzy Integral,” IEEE International Conference on Fuzzy System, San Diego, 8-12 March 1992, pp. 47-54.

[13] M. Grabisch and J.-M. Nicolas, “Classification by Fuzzy Integral: Performance and Tests,” Fuzzy Sets System, Vol. 65, No. 2-3, 1994, pp. 255-271. doi:10.1016/0165-0114(94)90023-X

[14] L. Mikenina and H. J. Zimmermann, “Improved Feature Selection and Classification by the 2-Additive Fuzzy Measure,” Fuzzy Sets and Systems, Vol. 107, No. 2, 1999, pp. 197-218. doi:10.1016/S0165-0114(98)00429-1

[15] K. Xu, Z. Wang, P. Heng and K. Leung, “Classification by Nonlinear Integral Projections,” IEEE Transactions on Fuzzy Systems, Vol. 11, No. 2, 2003, pp. 187-201. doi:10.1109/TFUZZ.2003.809891

[16] H. Fang, M. Rizzo, H. Wang, K. Espy and Z. Wang, “A New Nonlinear Classifier with a Penalized Signed Fuzzy Measure Using Effective Genetic Algorithm,” Pattern Recognition, Vol. 43, No. 4, 2010, pp. 1393-1401. doi:10.1016/j.patcog.2009.10.006

[17] J. Chu, Z. Wang and Y. Shi, “Analysis to the Contributions from Feature Attributes in Nonlinear Classification Based on the Choquet Integral,” 2010 IEEE International Conference on Granular Computing (GrC), San Jose, 14-16 August 2010, pp. 677-682.

[18] T. Murofushi, M. Sugeno and K. Fujimoto, “Separated Hierarchical Decomposition of the Choquet Integral,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, Vol. 5, No. 5, 1997, pp. 563-585. doi:10.1142/S0218488597000439

[19] H. Kuhn and A. Tucker, “Nonlinear Programming,” Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 1951, pp. 481-491.

[20] N. Cristianini and J. Shawe-Taylor, “An Introduction to Support Vector Machines,” Cambridge University Press, Cambridge, 2000.

[21] B. Boser, I. Guyon and V. N. Vapnik, “A Training Algorithm for Optimal Margin Classifiers,” Fifth Annual Workshop on Computational Learning Theory, 1992, pp. 144-152. doi:10.1145/130385.130401

[22] J. Platt, “Fast Training of Support Vector Machines Using Sequential Minimal Optimization,” Technical Report, Microsoft Research, 1998.

[23] E. Osuna, R. Freund and F. Girosi, “An Improved Training Algorithm for Support Vector Machines,” Proceedings of the 1997 IEEE Workshop on Neural Networks for Signal Processing VII, Amelia Island, 24-26 September 1997, pp. 276-285.
