A Comparative Study of Machine Learning with Statistical Feature Selection for Risk Detection of Diabetic

Isnen Hadi Al Ghozali, Muhammad Askar Fathin, Andy Rio Handoko

Abstract


Diabetes is a chronic medical condition characterized by elevated blood glucose levels. Prolonged, poorly regulated blood glucose carries a significant risk of severe complications, including renal failure, myocardial infarction, and lower-limb amputation. The objective of this study is to compare Support Vector Machine (SVM), Naive Bayes, XGBoost, Random Forest, and Artificial Neural Network (ANN) models for predicting the occurrence of diabetes. The research methodology comprises seven main stages: (1) literature review, (2) data collection, (3) exploratory data analysis (EDA), (4) data preprocessing, (5) feature selection, (6) model development, and (7) model evaluation and comparison. The evaluation results indicate that XGBoost is the most suitable model, achieving a precision of 0.88, a recall of 0.87, and an accuracy of 0.8690, with a mean squared error (MSE) of 0.1310 and a root mean squared error (RMSE) of 0.3620.
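The abstract reports classification metrics (precision, recall, accuracy) together with error metrics (MSE, RMSE). The following is a minimal pure-Python sketch of how these quantities relate. It assumes binary labels encoded as 0 (non-diabetic) and 1 (diabetic) and hard class predictions. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the study. Under these assumptions, each squared error is either 0 or 1, so the MSE equals the misclassification rate, 1 − accuracy.

```python
import math


def binary_metrics(y_true, y_pred):
    """Precision, recall, accuracy, MSE and RMSE for hard 0/1 predictions.

    Assumption (not from the study): class 1 is the positive (diabetic) class.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    n = len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / n
    # With 0/1 labels each squared error is 0 or 1,
    # so the MSE is the fraction of misclassified samples.
    mse = sum((t - p) ** 2 for t, p in pairs) / n
    rmse = math.sqrt(mse)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "mse": mse, "rmse": rmse}


# Arithmetic check against the reported XGBoost accuracy; this is valid
# only if MSE was computed on hard 0/1 predictions (an assumption).
reported_accuracy = 0.8690
implied_mse = 1 - reported_accuracy      # 0.1310
implied_rmse = math.sqrt(implied_mse)    # approx. 0.362
```

Under that assumption, the reported figures are internally consistent: 1 − 0.8690 = 0.1310, and √0.1310 ≈ 0.362. If the study reports precision and recall as averages over both classes, this single-positive-class sketch will not reproduce those two values exactly.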

Keywords


SVM; XGBoost; Random Forest; Naive Bayes; ANN


DOI: http://dx.doi.org/10.22441/fifo.2025.v17i2.001



Jurnal Ilmiah FIFO

Faculty of Computer Science, Universitas Mercu Buana
Jl. Raya Meruya Selatan, Kembangan, Jakarta 11650
Tel./Fax: +62215871335
p-ISSN: 2085-4315
e-ISSN: 2502-8332
http://publikasi.mercubuana.ac.id/index.php/fifo

e-mail:[email protected]

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Web Analytics Made Easy - StatCounter

View My Stats