Version 1
Received: 9 September 2024 / Approved: 9 September 2024 / Online: 10 September 2024 (09:56:57 CEST)
How to cite:
Bhandary, R.; Ghosh, B. Credit Card Default Prediction: An Empirical Analysis on Prediction Performance Between Statistical and Machine Learning Methods. Preprints 2024, 2024090729. https://doi.org/10.20944/preprints202409.0729.v1
APA Style
Bhandary, R., & Ghosh, B. (2024). Credit Card Default Prediction: An Empirical Analysis on Prediction Performance Between Statistical and Machine Learning Methods. Preprints. https://doi.org/10.20944/preprints202409.0729.v1
Chicago/Turabian Style
Bhandary, R. and Bidyut Ghosh. 2024. "Credit Card Default Prediction: An Empirical Analysis on Prediction Performance Between Statistical and Machine Learning Methods." Preprints. https://doi.org/10.20944/preprints202409.0729.v1
Abstract
This article compares the predictive capabilities of six models, namely Linear Discriminant Analysis (LDA), Logistic Regression (LR), Support Vector Machine (SVM), XGBoost, Random Forest (RF) and Deep Neural Network (DNN), in predicting the default behaviour of credit card holders in Taiwan, using data from the UCI Machine Learning Repository. The Python programming language was used for data analysis. Statistical methods were compared with machine learning algorithms on metrics derived from the confusion matrix (prediction accuracy, sensitivity, specificity, precision, G-mean and F1 score), together with the ROC curve and the area under it (AUC). The dataset contains information on 30,000 credit card users, with 6,636 default observations and 23,364 non-default cases. The results show that modern machine learning methods outperformed traditional statistical methods in predictive performance as measured by F1 score, G-mean and AUC. Among the traditional methods, logistic regression was marginally better than linear discriminant analysis and support vector machines as measured by AUC. Among the modern machine learning methods, deep neural networks outperformed XGBoost and Random Forest on most of the predictive performance metrics.
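The confusion-matrix metrics named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `confusion_metrics` and the all-non-default baseline are assumptions introduced here, and only the class counts (6,636 defaults, 23,364 non-defaults) come from the abstract. It suggests why accuracy alone can mislead on this imbalanced dataset, and why G-mean and F1 score are reported alongside it.

```python
import math


def confusion_metrics(tp, fn, fp, tn):
    """Derive threshold-based metrics from confusion-matrix counts.

    The positive class is "default". Ratios with a zero denominator
    are reported as 0.0 rather than raising an error.
    """
    total = tp + fn + fp + tn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on defaults
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on non-defaults
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if (precision + sensitivity) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "g_mean": math.sqrt(sensitivity * specificity),
        "f1": f1,
    }


# Class counts reported in the abstract.
N_DEFAULT, N_NON_DEFAULT = 6636, 23364

# Hypothetical baseline (an assumption for illustration only):
# a classifier that labels every card holder as non-default.
baseline = confusion_metrics(tp=0, fn=N_DEFAULT, fp=0, tn=N_NON_DEFAULT)

print(round(baseline["accuracy"], 4))  # 0.7788
print(baseline["g_mean"], baseline["f1"])  # 0.0 0.0
```

Under these assumptions, the trivial baseline reaches about 78% accuracy while detecting no defaults at all, so its G-mean and F1 score are both zero. ROC and AUC are not covered here because they require predicted scores rather than a single confusion matrix.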
Keywords
Credit card default; Confusion matrix; Deep Neural Network; Default prediction; Linear Discriminant Analysis; Logistic regression; Machine learning; Random Forest; Support Vector Machine; XGBoost
Subject
Business, Economics and Management, Econometrics and Statistics
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.