Bank Customer Churn Prediction Based on Multi-stage Machine Learning: Integrating Data Quality Enhancement, SHAP Feature Insight, and LightGBM Optimization
DOI:
https://doi.org/10.54097/055y1366
Keywords:
Bank customer churn prediction; Multi-stage machine learning; Data quality enhancement; SHAP; LightGBM optimization; Financial institutions.
Abstract
This paper presents a multi-stage machine learning framework for predicting bank customer churn. The framework combines data quality enhancement, interpretation based on SHapley Additive exPlanations (SHAP), and tuning of the Light Gradient Boosting Machine (LightGBM) algorithm. It draws on methods for refining financial datasets documented in prior work, and it uses SHAP analysis both to explain the model and to identify optimization avenues for LightGBM. Empirically, the framework outperforms naive baselines by 12 to 18 percent. The result is a reliable yet interpretable tool for financial institutions that want to reduce churn and strengthen customer retention. Calibrated on a million-scale bank customer dataset, the framework also closes the business loop from churn prediction to the derivation of interventions, which improves its applicability to retail banking.
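The SHAP step named in the abstract rests on Shapley values, which split a single prediction into per-feature contributions. The following is a minimal sketch of that idea, not the authors' pipeline: it computes exact Shapley values for a toy churn score by enumerating feature coalitions. The feature names, baseline values, and the `churn_score` function are hypothetical stand-ins for a trained LightGBM model; in practice a tree-specific approximation would be used rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial

# Hypothetical features and baseline ("average customer") values.
# None of these names or numbers come from the paper.
BASELINE = {"balance": 0.5, "tenure": 0.5, "is_active": 1.0}


def churn_score(x):
    """Toy churn score with one interaction term (stand-in for a trained model)."""
    inactive = 1.0 - x["is_active"]
    return (0.2 + 0.3 * inactive + 0.2 * (1.0 - x["tenure"])
            + 0.1 * x["balance"] * inactive)


def shapley_values(model, instance, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        mixed = {k: instance[k] if k in coalition else baseline[k] for k in names}
        return model(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {name})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        phi[name] = total
    return phi


customer = {"balance": 0.9, "tenure": 0.1, "is_active": 0.0}
phi = shapley_values(churn_score, customer, BASELINE)
for feature, contribution in phi.items():
    print(f"{feature}: {contribution:+.3f}")
```

The contributions sum to the difference between the customer's score and the baseline score, which is the property that makes per-customer explanations usable for choosing an intervention.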
License
Copyright (c) 2025 Highlights in Business, Economics and Management

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







