Comparison of the LDA and LSA Topic Modelling Algorithms on Detikcom News

Ahmad Kemal Al Izzi

Abstract


This research applies topic modeling to news tweets collected from the Detikcom account and compares two models: Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). The process begins by crawling data over a one-year period, from December 9, 2022 to December 9, 2023, which yields 958 rows of data. Pre-processing consists of case folding, tokenization, stopword removal, and stemming. After pre-processing, a bag-of-words step counts how often each word occurs in each document, and these word frequencies serve as the input for building the LSA and LDA models. Each model is configured with 8 topics, 10 iterations, and a random state of 42. Topics are then determined from the keywords that appear in the modeling results. The two models are evaluated by measuring topic coherence using the c_v value. The LSA model achieves a coherence value of 0.5, while the LDA model achieves 0.45. In this case, therefore, the LSA model performs better than the LDA model in terms of topic coherence. For further research, the authors suggest applying topic modeling to other cases and exploring other topic modeling tools such as OCTIS, which could broaden the understanding of how topic modeling algorithms perform on news data from X.
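The pre-processing and bag-of-words steps described above can be illustrated with a minimal sketch using only the Python standard library. This is not the authors' code: the stopword list, the sample documents, and the function names `preprocess` and `bag_of_words` are assumptions made for illustration, and stemming is omitted because it would need an Indonesian stemmer that the abstract does not name.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only; the abstract does not
# say which Indonesian stopword list was used.
STOPWORDS = {"yang", "di", "dan", "ke", "dari", "ini", "itu", "untuk"}


def preprocess(text):
    """Apply case folding, tokenization, and stopword removal.

    Stemming is left out of this sketch.
    """
    tokens = re.findall(r"[a-z]+", text.lower())  # case folding + tokenization
    return [t for t in tokens if t not in STOPWORDS]


def bag_of_words(docs):
    """Return a sorted vocabulary and one row of term counts per document."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = []
    for doc in tokenized:
        row = [0] * len(vocab)
        for term, n in Counter(doc).items():
            row[index[term]] = n
        matrix.append(row)
    return vocab, matrix


# Made-up example headlines, not taken from the study's dataset.
docs = [
    "Harga beras naik di pasar dan harga cabai naik",
    "Timnas menang di laga uji coba",
]
vocab, matrix = bag_of_words(docs)
```

Each row of `matrix` is the word-frequency vector of one document. A document-term matrix of this kind is what a topic model such as LSA or LDA would take as input, with the number of topics, iterations, and the random state set as the abstract describes.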


Keywords


Topic Modelling, Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA), Detikcom, Topic Coherence



DOI: http://dx.doi.org/10.22441/format.2024.v13.i1.005



Copyright (c) 2024 Format : Jurnal Ilmiah Teknik Informatika

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Format : Jurnal Ilmiah Teknik Informatika
Fakultas Ilmu Komputer Universitas Mercu Buana
Jl. Raya Meruya Selatan, Kembangan, Jakarta 11650
Tlp./Fax: +62215840816
http://publikasi.mercubuana.ac.id/index.php/format

p-ISSN: 2089-5615
e-ISSN: 2722-7162

