Benchmarking YOLOv8 and vision transformers for intelligent fish monitoring in aquaponics and controlled aquarium environments

Tresna Dewi; Yurni Oktarina; Sri Rezki Artini; Gita Ayu Julianka; Jhoni Satria

doi:10.22441/sinergi.2026.2.022

Authors

Tresna Dewi, Department of Electrical Engineering, Politeknik Negeri Sriwijaya, Indonesia
Yurni Oktarina, Department of Electrical Engineering, Politeknik Negeri Sriwijaya, Indonesia
Sri Rezki Artini, Department of Civil Engineering, Politeknik Negeri Sriwijaya, Indonesia
Gita Ayu Julianka, Department of Electrical Engineering, Politeknik Negeri Sriwijaya, Indonesia
Jhoni Satria, Koi Agro Farm, Indonesia

Keywords:

Aquaponics, Deep Learning, Fish Detection, Vision Transformer, YOLOv8

Abstract

Sustainable aquaculture requires reliable and accurate fish monitoring systems capable of operating across heterogeneous environmental conditions. Conventional monitoring approaches are labor-intensive and prone to human error, while recent advances in deep learning have enabled vision-based automation for aquatic environments. Convolutional object detectors such as YOLO and emerging Vision Transformer (ViT) models have demonstrated promising performance; however, most existing studies are limited to single-environment evaluations and rarely address energy-constrained, real-world deployment. To bridge this gap, this study presents a systematic benchmark of YOLOv8 and ViT across two complementary settings: a controlled aquarium environment and a solar-powered, off-grid aquaponics system. The proposed framework integrates 1080p CCTV video acquisition, dataset annotation and augmentation, and standardized training and evaluation using COCO metrics. Experimental results show that ViT consistently outperforms YOLOv8 in detection accuracy and prediction stability across both environments. ViT achieves 99.73% accuracy in the controlled aquarium and 99.68–99.73% accuracy in aquaponics, while YOLOv8 records 87.90% accuracy in the aquarium and 93.92–97.92% across aquaponics fish classes, exhibiting higher sensitivity to background clutter. McNemar’s test indicates that these differences are statistically significant (p < 0.001). Beyond accuracy, the results reveal a trade-off between robustness and computational efficiency: ViT provides superior resilience under occlusion and glare, whereas YOLOv8 offers faster inference suitable for real-time operation on resource-limited edge devices. End-to-end deployment on a solar-powered NVIDIA Jetson Xavier NX demonstrates the feasibility of continuous, off-grid aquaculture monitoring and provides practical guidance for context-aware model selection in intelligent aquaculture systems.
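
For readers unfamiliar with the statistical validation mentioned above, the following is a minimal sketch of how McNemar's test can compare two models evaluated on the same samples. The abstract does not state which variant of the test was used, so the continuity-corrected chi-square form shown here is an assumption. The function name `mcnemar_test` and the toy outcome lists are hypothetical and are not data from the study.

```python
import math


def mcnemar_test(correct_a, correct_b):
    """Continuity-corrected McNemar's test on paired per-sample outcomes.

    correct_a, correct_b: equal-length sequences of booleans, True where the
    respective model classified that sample correctly.
    Returns (chi-square statistic, p-value).
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be evaluated on the same samples")

    # Only discordant pairs carry information about a difference between models.
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    discordant = only_a + only_b
    if discordant == 0:
        return 0.0, 1.0  # the models never disagree, so there is no evidence

    # Chi-square statistic with continuity correction, clamped at zero.
    statistic = max(abs(only_a - only_b) - 1, 0) ** 2 / discordant

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Synthetic illustration only (not results from the paper):
# model A is correct on all 40 samples, model B on the first 25.
toy_a = [True] * 40
toy_b = [True] * 25 + [False] * 15
stat, p = mcnemar_test(toy_a, toy_b)
print(f"statistic={stat:.3f}, significant at 0.001: {p < 0.001}")
```

The test uses only the samples on which the two models disagree, which is why it suits paired comparisons on a shared test set. For small numbers of discordant pairs, an exact binomial variant is usually preferred over the chi-square approximation.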