Journal of Industrial and Systems Engineering


Investigating the impact of missing value imputation methods on the prediction of diabetes using machine learning

Document Type: Research Paper

Authors
1 School of Industrial Engineering, College of Engineering, University of Tehran, Tehran, Iran
2 National Health Insurance Center, Tehran, Iran
3 Tehran University
4 School of Industrial Engineering, K. N. Toosi University of Technology (KNTU), Tehran, Iran
Abstract
Diabetes poses significant challenges because of its prevalence and the potential consequences of inaccurate or delayed diagnosis. This study focuses on improving the reliability of diabetes prediction to mitigate these risks. First, diabetes-related factors are identified through correlation analysis with the target variable, and models are implemented to handle missing data. Several imputation methods, including Classification and Regression Trees (CART), Gaussian Mixture Models (GMM), and Random Forest Regression (RFR), are then applied to fill missing values in these factors, and the results of each imputation scenario guide the selection of the most effective method. Next, ensemble algorithms, namely AdaBoost, Bagging, Gradient Boosting, and Random Forest (RF), are employed to improve classification accuracy, and their hyperparameters are further tuned through grid search. Finally, the models' predictions are compared with those of medical professionals to assess accuracy. The findings show that the optimized machine learning models outperform the human predictions, indicating potential for more accurate diagnosis and fewer medical errors. This research contributes to advancing predictive modeling for diabetes diagnosis, with prospects for improved community health and reduced socioeconomic burden.
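The workflow summarized in the abstract (correlation screening, imputation comparison, tuned ensemble classifiers) can be sketched with scikit-learn as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the data are synthetic (`make_classification`), keeping the top 5 correlated features, hiding 10% of cells at random, and choosing the imputer by RMSE on those hidden cells are all assumptions, and the grid values are arbitrary. CART and RFR imputation are approximated by plugging a `DecisionTreeRegressor` or `RandomForestRegressor` into `IterativeImputer`; GMM-based imputation is omitted because scikit-learn has no built-in GMM imputer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (activates IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier, RandomForestClassifier,
                              RandomForestRegressor)
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for a clinical table (NOT the study's data).
X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y,
                                          random_state=0)

# 1) Correlation screening on the training split only: keep the k features with
#    the largest |Pearson r| against the target (k = 5 is an arbitrary choice).
r = np.array([abs(np.corrcoef(X_tr[:, j], y_tr)[0, 1]) for j in range(X_tr.shape[1])])
keep = np.sort(np.argsort(-r)[:5])
X_tr, X_te = X_tr[:, keep], X_te[:, keep]

# 2) Hide ~10% of entries completely at random so the imputers have work to do.
def with_missing(A, frac=0.1):
    mask = rng.random(A.shape) < frac
    B = A.copy()
    B[mask] = np.nan
    return B, mask

X_tr_miss, _ = with_missing(X_tr)
X_te_miss, te_mask = with_missing(X_te)

# 3) Regression-based imputers: a CART-style tree and a random-forest regressor,
#    each used as the per-feature model inside the round-robin IterativeImputer.
imputers = {
    "CART": IterativeImputer(estimator=DecisionTreeRegressor(max_depth=5, random_state=0),
                             max_iter=5, random_state=0),
    "RFR": IterativeImputer(estimator=RandomForestRegressor(n_estimators=30, random_state=0),
                            max_iter=5, random_state=0),
}
imputed, rmse = {}, {}
for name, imp in imputers.items():
    imp.fit(X_tr_miss)  # fit on training rows only
    tr_hat, te_hat = imp.transform(X_tr_miss), imp.transform(X_te_miss)
    imputed[name] = (tr_hat, te_hat)
    # Error on the test cells whose true values were hidden in step 2.
    rmse[name] = float(np.sqrt(np.mean((te_hat[te_mask] - X_te[te_mask]) ** 2)))
best = min(rmse, key=rmse.get)
X_tr_imp, X_te_imp = imputed[best]

# 4) Ensemble classifiers, each tuned with a small, illustrative grid search.
models = {
    "AdaBoost": (AdaBoostClassifier(random_state=0), {"n_estimators": [50, 100]}),
    "Bagging": (BaggingClassifier(random_state=0), {"n_estimators": [10, 30]}),
    "GradientBoosting": (GradientBoostingClassifier(random_state=0),
                         {"learning_rate": [0.05, 0.1], "max_depth": [2, 3]}),
    "RF": (RandomForestClassifier(random_state=0), {"max_depth": [None, 5]}),
}
test_acc = {}
for name, (est, grid) in models.items():
    search = GridSearchCV(est, grid, cv=5, scoring="accuracy")
    search.fit(X_tr_imp, y_tr)
    test_acc[name] = search.score(X_te_imp, y_te)  # accuracy of the refit best model

print("imputation RMSE:", rmse, "-> chosen:", best)
print("test accuracy:", test_acc)
```

Feature screening, imputer fitting, and hyperparameter search all use only the training split, so the held-out accuracy is not inflated by information from the test rows. The study may instead have selected the imputer by downstream classification performance; swapping the RMSE criterion for cross-validated accuracy would be a one-line change in step 3.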


  • Received: 28 January 2024
  • Revised: 02 March 2024
  • Accepted: 26 June 2024