An effective and efficient vehicle detection using ER-EMA-YOLOv10n

Imanuel Kutika; Vicky Nolant Setyanto Lahimade; Tomi Heri Julius Todingan; Hebron Prasetya; Steven Ray Sentinuwo; Muhamad Dwisnanto Putro

doi:10.22441/sinergi.2026.1.017

Authors

Imanuel Kutika, Master Program of Informatics, Postgraduate Program, Sam Ratulangi University, Indonesia
Vicky Nolant Setyanto Lahimade, Master Program of Informatics, Postgraduate Program, Sam Ratulangi University, Indonesia
Tomi Heri Julius Todingan, Master Program of Informatics, Postgraduate Program, Sam Ratulangi University, Indonesia
Hebron Prasetya, Master Program of Informatics, Postgraduate Program, Sam Ratulangi University, Indonesia
Steven Ray Sentinuwo, Master Program of Informatics, Postgraduate Program, Sam Ratulangi University, Indonesia
Muhamad Dwisnanto Putro, Master Program of Informatics, Postgraduate Program, Sam Ratulangi University, Indonesia

DOI:

https://doi.org/10.22441/sinergi.2026.1.017

Keywords:

Deep Learning, Efficient Model, Vehicle Detection, YOLOv10

Abstract

Vehicle detection plays a key role in automating traffic analysis, a field that continues to advance rapidly. Vision-based systems identify vehicle types and sizes, but achieving both high accuracy and efficiency remains a challenge. Reliable real-world deployment requires optimized models that balance detection performance and computational cost. YOLOv10n, the most efficient version of the YOLO family, offers a solid foundation for lightweight feature extraction. To improve its detection performance, this study proposes an enhanced version of YOLOv10n that incorporates a scale-aware attention mechanism. Specifically, we introduce the Expanded Refinement Efficient Multi-Scale Attention (ER-EMA) module, which enhances feature encoding by capturing vehicle characteristics across multiple receptive fields. ER-EMA consists of two core components: the Expanded Converted Inverted Block (ECIB) and the Convolutional Refinement Block (CRB). These components use diverse convolutional kernels to extract and refine multi-frequency spatial features. Integrating ER-EMA into the YOLOv10n framework produces a more compact and accurate detection model. Experimental results on the Vehicle-COCO dataset show that the proposed model increases mAP@50 by 1% while reducing the number of parameters by 0.1M and the computation by 0.1 GFLOPs. On the UA-DETRAC benchmark, it improves mAP@50:95 by 4% while reducing parameters by 0.2M and computation by 0.4 GFLOPs, outperforming the original YOLOv10n and prior methods in both accuracy and computational efficiency.
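
The two accuracy metrics quoted above differ in how strictly a predicted box must overlap the ground truth. Under the COCO convention, mAP@50 counts a detection as correct when its intersection over union (IoU) with the ground truth is at least 0.50, while mAP@50:95 averages over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The following is a minimal illustrative sketch, not the authors' evaluation code; the function names and the example boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so that non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95.
COCO_THRESHOLDS = [(50 + 5 * i) / 100 for i in range(10)]


def thresholds_passed(pred, truth):
    """Return the IoU thresholds at which `pred` counts as a match for `truth`."""
    overlap = iou(pred, truth)
    return [t for t in COCO_THRESHOLDS if overlap >= t]


# Hypothetical example: a prediction shifted 10 pixels to the right.
truth = (0, 0, 100, 100)
pred = (10, 0, 110, 100)
passed = thresholds_passed(pred, truth)
print(f"IoU = {iou(pred, truth):.3f}, matched at {len(passed)} of 10 thresholds")
```

In this example the prediction counts as correct for mAP@50 but fails the three strictest thresholds, which is why mAP@50:95 rewards tighter localization than mAP@50.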