Performance of speech enhancement models in video conferences: DeepFilterNet3 and RNNoise
Abstract
As remote work and online education continue to gain prominence, clear audio communication becomes increasingly important. Deep learning-based speech enhancement has emerged as a promising solution for processing speech in noisy environments. In this study, we conducted an in-depth analysis of two speech enhancement models, RNNoise and DeepFilterNet3, selected for their respective strengths. DeepFilterNet3 leverages time-frequency masking with a Complex Mask filter, while RNNoise employs recurrent neural networks with lower complexity. Evaluation during training revealed that RNNoise demonstrated impressive denoising capabilities, achieving low loss values, while DeepFilterNet3 showed superior generalization. Specifically, "DeepFilterNet3 (Pre-Trained)" exhibited the best overall performance, excelling in intelligibility and speech quality. RNNoise also performed well on subjective quality measures. Furthermore, we assessed the real-time processing efficiency of both models: both RNNoise variants processed speech signals almost in real time, whereas DeepFilterNet3, though slightly slower, remained efficient. The findings demonstrate significant improvements in speech quality, with "DeepFilterNet3 (Pre-Trained)" emerging as the top-performing model. These results have the potential to enhance video conference experiences and contribute to the improvement of remote work and online education.
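To make the enhancement and real-time efficiency measurement concrete, the sketch below shows how a noisy recording could be denoised with the pre-trained DeepFilterNet3 model and how a real-time factor (processing time divided by audio duration) could be estimated. It follows the documented usage of the DeepFilterNet Python package; the file names are hypothetical, and the timing loop is an illustrative measure of real-time efficiency rather than the paper's exact evaluation protocol.

```python
import time

from df.enhance import enhance, init_df, load_audio, save_audio

# Load the pre-trained DeepFilterNet3 model and its processing state.
model, df_state, _ = init_df()

# Read a noisy recording, resampled to the model's sample rate (48 kHz).
# "noisy_conference.wav" is a placeholder file name.
noisy, _ = load_audio("noisy_conference.wav", sr=df_state.sr())

# Enhance the signal and time the call to estimate the real-time factor (RTF):
# RTF < 1 means the model processes audio faster than real time.
start = time.perf_counter()
enhanced = enhance(model, df_state, noisy)
elapsed = time.perf_counter() - start

duration_s = noisy.shape[-1] / df_state.sr()
print(f"Real-time factor: {elapsed / duration_s:.3f}")

save_audio("enhanced_conference.wav", enhanced, df_state.sr())
```

RNNoise, by contrast, is typically run as a lightweight C library that processes fixed 10 ms frames, which is why it approaches real-time operation even on modest hardware.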