A Data Science Approach to Cancer Patient Classification Using Support Vector Machine and Random Forest

Devi Dwi Anggraini, Mutiara Rizky Salsabila, Keisya Rizkia Kamila, Yunita Sartika Sari

Abstract


The increasing availability of healthcare data has encouraged the application of data science and machine learning techniques in medical research. Cancer patient datasets contain numerical demographic and clinical attributes that can be utilized for classification tasks; however, complex feature relationships and limited feature relevance remain key challenges. This study aims to analyze cancer patient data and compare the performance of Support Vector Machine and Random Forest algorithms for gender classification. The dataset used in this study consists of numerical features, including patient age, tumor size, number of examined lymph nodes, number of positive lymph nodes, body mass index, and survival duration measured in months. The research methodology includes data preprocessing, exploratory data analysis, model development, and performance evaluation. Feature normalization and data splitting are applied to ensure a fair comparison between models, while exploratory analysis is conducted to examine data distribution and relationships among variables. Both classification models are trained under identical experimental settings and evaluated using accuracy as the primary performance metric. The results indicate that both algorithms can classify cancer patients with satisfactory accuracy. Support Vector Machine demonstrates slightly better performance compared to Random Forest, suggesting its effectiveness in handling numerical data with complex decision boundaries. The findings highlight the importance of appropriate algorithm selection and feature utilization in healthcare data analysis.
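The workflow summarized above (normalization, data splitting, identical training settings, accuracy as the metric) can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation. It assumes scikit-learn, synthetic stand-in data with random labels, an 80/20 stratified split, `StandardScaler` as the normalization step, an RBF kernel, and 200 trees, none of which are stated in the abstract. The feature names and the helpers `make_synthetic_patients` and `compare_models` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical column names mirroring the six numerical features in the abstract.
FEATURES = ["age", "tumor_size", "nodes_examined", "nodes_positive",
            "bmi", "survival_months"]


def make_synthetic_patients(n=400, seed=0):
    """Generate stand-in data; values and labels are random, not real patients."""
    rng = np.random.default_rng(seed)
    examined = rng.integers(1, 40, n)
    # Positive nodes cannot exceed the number of examined nodes.
    positive = np.minimum(rng.integers(0, 10, n), examined)
    X = np.column_stack([
        rng.normal(60, 12, n),     # age in years
        rng.gamma(2.0, 15.0, n),   # tumor size
        examined,
        positive,
        rng.normal(26, 4, n),      # body mass index
        rng.integers(1, 120, n),   # survival duration in months
    ]).astype(float)
    y = rng.integers(0, 2, n)      # binary label (assumed encoding of the target)
    return X, y


def compare_models(X, y, seed=0):
    """Train both classifiers on the same split and return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Both pipelines share the same scaling step so the comparison stays fair;
    # the scaler is fitted on the training portion only.
    models = {
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "random_forest": make_pipeline(
            StandardScaler(),
            RandomForestClassifier(n_estimators=200, random_state=seed),
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    X, y = make_synthetic_patients()
    for name, acc in compare_models(X, y).items():
        print(f"{name}: accuracy = {acc:.3f}")
```

Because the labels in this sketch are random, both models should score near chance level. The point is only the shape of the comparison: one split, one preprocessing step, and one metric applied to both classifiers.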

Keywords


Cancer Patient Data; Support Vector Machine; Random Forest; Data Science






DOI: http://dx.doi.org/10.22441/collabits.v3i1.37642



Journal Collabits
Print ISSN: 3062-8601
Online ISSN: 3046-6709

Secretariat
Fakultas Ilmu Komputer
Universitas Mercu Buana
Jl. Raya Meruya Selatan, Kembangan, Jakarta 11650
Tel./Fax: +62215871335

http://publikasi.mercubuana.ac.id/index.php/collabits


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.