Support Vector Machines for Regression: A Succinct Review of Large-Scale and Linear Programming Formulations

Author(s)
Pablo Rivas-Perea^{*},
Juan Cota-Ruiz^{*},
David Garcia Chaparro^{*},
Jorge Arturo Perez Venzor^{*},
Abel Quezada Carreón^{*},
Jose Gerardo Rosiles^{*}

Affiliation(s)

Department of Computer Science, School of Engineering & Computer Science, Baylor University, Waco, USA.

Department of Electrical & Computer Engineering, Autonomous University of Ciudad Juárez, Ciudad Juárez, México.

Rosiles Consulting, El Paso, USA.

ABSTRACT

Support vector-based learning methods are an important part of computational intelligence techniques. Recent efforts have addressed the problem of learning from very large datasets. This paper reviews the most commonly used formulations of support vector machines for regression (SVRs), with emphasis on their usability in large-scale applications. We review the general concept of support vector machines (SVMs), address the state of the art in training methods for SVMs, and explain the fundamental principle of SVRs. The most common learning methods for SVRs are introduced, and linear programming-based SVR formulations are explained, emphasizing their suitability for large-scale learning. Finally, this paper discusses some open problems and current trends.

KEYWORDS

Support Vector Machines; Support Vector Regression; Linear Programming Support Vector Regression

Cite this paper

P. Rivas-Perea, J. Cota-Ruiz, D. Chaparro, J. Venzor, A. Carreón and J. Rosiles, "Support Vector Machines for Regression: A Succinct Review of Large-Scale and Linear Programming Formulations," *International Journal of Intelligence Science*, Vol. 3, No. 1, 2013, pp. 5-14. doi: 10.4236/ijis.2013.31002.
