A Data Science Approach to Cancer Patient Classification Using Support Vector Machine and Random Forest

Devi Dwi Anggraini, Mutiara Rizky Salsabila, Keisya Rizkia Kamila

Abstract


The increasing availability of healthcare data has encouraged the application of data science and machine learning techniques in medical research. Cancer patient datasets contain numerical demographic and clinical attributes that can be used for classification tasks; however, complex feature relationships and limited feature relevance remain key challenges. This study analyzes cancer patient data and compares the performance of the Support Vector Machine and Random Forest algorithms for gender classification. The dataset consists of numerical features: patient age, tumor size, number of examined lymph nodes, number of positive lymph nodes, body mass index, and survival duration in months. The research methodology comprises data preprocessing, exploratory data analysis, model development, and performance evaluation. Feature normalization and data splitting are applied to ensure a fair comparison between models, while exploratory analysis examines the data distribution and the relationships among variables. Both classification models are trained under identical experimental settings and evaluated using accuracy as the primary performance metric. The results indicate that both algorithms can classify cancer patient gender with satisfactory accuracy. Support Vector Machine performs slightly better than Random Forest, suggesting that it is effective at handling numerical data with complex decision boundaries. The findings highlight the importance of appropriate algorithm selection and feature utilization in healthcare data analysis.
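The workflow summarized in the abstract (normalization, data splitting, training both classifiers under identical settings, and comparing accuracy) could be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the feature names, the synthetic data generator, the 80/20 split, the RBF kernel, and the number of trees are all assumptions made for the example. Because the synthetic labels are random, the reported accuracies should hover around chance and say nothing about the study's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical column names mirroring the six numerical features
# listed in the abstract; the real dataset's names are not given.
FEATURES = ["age", "tumor_size", "nodes_examined",
            "nodes_positive", "bmi", "survival_months"]


def make_synthetic_patients(n=400, seed=0):
    """Generate placeholder data (assumption: NOT the study's dataset)."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        rng.normal(55, 12, n),    # age in years
        rng.normal(30, 15, n),    # tumor size
        rng.integers(1, 40, n),   # examined lymph nodes
        rng.integers(0, 10, n),   # positive lymph nodes
        rng.normal(26, 4, n),     # body mass index
        rng.integers(1, 120, n),  # survival in months
    ]).astype(float)
    y = rng.integers(0, 2, n)     # random binary gender label
    return X, y


def compare_models(X, y, seed=0):
    """Train SVM and Random Forest on the same split; return accuracies."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    # Both models share the same normalization step so that the
    # comparison is made under identical preprocessing.
    models = {
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "random_forest": make_pipeline(
            StandardScaler(),
            RandomForestClassifier(n_estimators=100, random_state=seed),
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


scores = compare_models(*make_synthetic_patients())
print(scores)
```

Wrapping the scaler and the classifier in one pipeline means the normalization statistics are fitted on the training split only, which avoids leaking test-set information into either model.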

Keywords


Cancer patient data; Support Vector Machine; Random Forest; Data science


DOI: http://dx.doi.org/10.22441/collabits.v3i1.37642


Journal Collabits
Print ISSN: 3062-8601
Online ISSN: 3046-6709

Secretariat
Faculty of Computer Science
Universitas Mercu Buana
Jl. Raya Meruya Selatan, Kembangan, Jakarta 11650
Tel./Fax: +62215871335

http://publikasi.mercubuana.ac.id/index.php/collabits

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.