AM Vol.11 No.12, December 2020
Credit Card Fraud Detection Using Weighted Support Vector Machine
Abstract: Credit card transaction data is highly imbalanced: nonfraudulent transactions make up an overwhelmingly large portion of the data, and fraudulent transactions only a small portion. The measures used to judge detection algorithms are therefore critical to deploying a model that accurately scores fraudulent transactions, taking into account both the class imbalance and the cost of identifying a case as genuine when it is, in fact, a fraudulent transaction. In this paper, a new criterion for judging classification algorithms, which considers the cost of misclassification, is proposed, and several undersampling techniques are compared using this criterion. In addition, a weighted support vector machine (SVM) algorithm that considers the financial cost of misclassification is introduced and proves to be more practical for credit card fraud detection than traditional methodologies. This weighted SVM uses transaction balances as weights for fraudulent transactions and a uniform weight for nonfraudulent transactions. The results show that this strategy greatly improves the performance of credit card fraud detection.
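The weighting scheme and the cost-aware view of evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy data, and the fixed `false_alarm_cost` are assumptions made for illustration, and the abstract does not give the exact form of the proposed criterion. Here a missed fraud is assumed to cost its transaction balance, and a false alarm is assumed to cost a flat fee.

```python
from typing import List, Sequence


def build_sample_weights(
    labels: Sequence[int],
    balances: Sequence[float],
    uniform_weight: float = 1.0,
) -> List[float]:
    """Weight each fraud case (label 1) by its transaction balance and
    give every nonfraud case (label 0) the same uniform weight."""
    return [bal if y == 1 else uniform_weight for y, bal in zip(labels, balances)]


def misclassification_cost(
    labels: Sequence[int],
    predictions: Sequence[int],
    balances: Sequence[float],
    false_alarm_cost: float = 1.0,  # hypothetical flat cost, not from the paper
) -> float:
    """Sum an assumed financial cost over all misclassified transactions."""
    cost = 0.0
    for y, pred, bal in zip(labels, predictions, balances):
        if y == 1 and pred == 0:
            cost += bal  # missed fraud: the balance is lost
        elif y == 0 and pred == 1:
            cost += false_alarm_cost  # genuine transaction flagged as fraud
    return cost


# Toy data (invented for illustration): 1 = fraud, 0 = nonfraud.
labels = [0, 0, 1, 1, 0]
balances = [20.0, 35.5, 900.0, 15.0, 60.0]
predictions = [0, 1, 0, 1, 0]

weights = build_sample_weights(labels, balances)
total_cost = misclassification_cost(labels, predictions, balances)
print(weights)     # per-sample weights for a weighted classifier
print(total_cost)  # assumed financial cost of the predictions
```

Per-sample weights of this kind can be passed to an SVM implementation that accepts them, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`. Under this sketch, a classifier that misses one large fraud scores worse than one that raises several false alarms, which is the trade-off a plain accuracy measure hides on imbalanced data.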
Cite this paper: Zhang, D. , Bhandari, B. and Black, D. (2020) Credit Card Fraud Detection Using Weighted Support Vector Machine. Applied Mathematics, 11, 1275-1291. doi: 10.4236/am.2020.1112087.