Investigating the impact of nutrition and lifestyle on breast cancer: A data mining approach.

Document Type: Research Paper

Authors

1 Faculty of Industrial and Systems Engineering, Tarbiat Modares University, Tehran, Iran

2 Center of Excellence in Healthcare Systems Engineering, Tarbiat Modares University, Tehran, Iran

Abstract

Background: Breast cancer (BC) is the most common cancer among women and one of their leading causes of death. This study investigated the relationship between BC and nutrition and lifestyle, and compared machine learning models for predicting the disease.
Methods: We designed a nutrition and lifestyle questionnaire with a nutritionist's guidance and administered it to 569 patients. After data collection, we trained several machine learning classifiers, including logistic regression (LR), k-nearest neighbors (KNN), decision tree (DT), and support vector machine (SVM). To improve model accuracy, we used an oversampling method to keep the imbalance between the target classes from skewing the models, a grid search to tune each model's hyperparameters, and a random forest to estimate each variable's importance.
Results: The DT model reached an accuracy of 0.95, SVM and LR each reached 0.93, and KNN reached 0.86. DT therefore performed best among the models compared.
Conclusions: Our findings show that the type of cancerous tumor can be predicted with relatively high accuracy without using specific information about the tumor itself. In our study, the decision tree was more accurate than the other models.
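
The oversampling and grid search steps named in the Methods can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the abstract does not say which oversampling method or software was used, so plain random oversampling and a small hand-written KNN stand in here as assumptions. All function names (random_oversample, knn_predict, grid_search_k, make_toy_data) are hypothetical, and the data are synthetic, not the questionnaire responses.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def knn_predict(train_X, train_y, row, k):
    """Majority vote among the k training rows closest to `row` (squared Euclidean distance)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(tr, row)), lab)
        for tr, lab in zip(train_X, train_y)
    )[:k]
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(
        knn_predict(train_X, train_y, row, k) == lab
        for row, lab in zip(test_X, test_y)
    )
    return hits / len(test_y)


def grid_search_k(train_X, train_y, val_X, val_y, k_grid):
    """Score every candidate k on the validation split and return the best one."""
    scores = {k: accuracy(train_X, train_y, val_X, val_y, k) for k in k_grid}
    best_k = max(scores, key=scores.get)
    return best_k, scores


def make_toy_data(n_major=80, n_minor=20, seed=1):
    """Synthetic, imbalanced two-feature data; NOT the study's questionnaire data."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_major)]
    X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(n_minor)]
    y = [0] * n_major + [1] * n_minor
    rows = list(zip(X, y))
    rng.shuffle(rows)
    X, y = zip(*rows)
    return list(X), list(y)


X, y = make_toy_data()
split = int(0.75 * len(X))
train_X, train_y = X[:split], y[:split]
val_X, val_y = X[split:], y[split:]

# Oversample the training split only, so no validation row is copied into training.
bal_X, bal_y = random_oversample(train_X, train_y)
best_k, scores = grid_search_k(bal_X, bal_y, val_X, val_y, k_grid=[1, 3, 5, 7])

print("class counts after oversampling:", Counter(bal_y))
print("validation accuracy per k:", scores)
print("selected k:", best_k)
```

Balancing only the training split is a deliberate choice in this sketch: if duplicated rows leaked into the validation data, the reported accuracy would be optimistic. The same pattern extends to a grid over several hyperparameters and to other classifiers.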




Articles in Press, Accepted Manuscript
Available Online from 18 November 2023
  • Receive Date: 18 November 2023
  • Accept Date: 18 November 2023