Road Damage Detection Using Vision Transformer Based on Digital Images
Keywords:
Vision Transformer, Image Classification, Road Damage, Deep Learning, Computer Vision
Abstract
Road damage, such as potholes and cracks, is an infrastructure problem that can increase the risk of accidents and reduce road user comfort. Conventional road inspection is still manual, subjective, and inefficient. This study implements and deploys a Vision Transformer (ViT) as an automatic image-based method for classifying road damage. The dataset consists of 5,444 road condition images in three classes (no damage, potholes, and cracks) with an imbalanced class distribution. All images were preprocessed by resizing them to a uniform 128x128 pixels and normalizing pixel values. A lightweight version of the Vision Transformer was built and tested on Google Colab under limited computing resources. Test results show that the model achieved an accuracy of approximately 89.7%, with the best performance on the no damage and pothole classes. Performance on the crack class remained relatively low, owing to the limited number of crack images and the fine, small-scale visual features of cracks. The results indicate that the Vision Transformer has good potential as an automated solution for monitoring road conditions, although further development is needed to improve performance on minority classes.
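The preprocessing described in the abstract (resizing to 128x128 pixels and normalizing pixel values) can be illustrated with a minimal sketch. This is not the study's code. It assumes grayscale images stored as nested lists, nearest-neighbour resizing, and normalization to the range [0, 1], none of which the abstract specifies. The helper names `resize_nearest`, `normalize`, and `num_patches` are hypothetical, and the 16x16 patch size is likewise an assumption used only to show how many patches a ViT would receive from a 128x128 input.

```python
from typing import List

# Assumption: a grayscale image is a list of rows of 8-bit values (0-255).
Image = List[List[int]]


def resize_nearest(image: Image, size: int = 128) -> Image:
    """Resize an image to size x size using nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // size][x * src_w // size] for x in range(size)]
        for y in range(size)
    ]


def normalize(image: Image) -> List[List[float]]:
    """Scale 8-bit pixel values to [0, 1] (the range is an assumption)."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def num_patches(image_size: int = 128, patch_size: int = 16) -> int:
    """Count the non-overlapping square patches a ViT would split an image into.

    The patch size of 16 is a hypothetical choice, not taken from the study.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    return (image_size // patch_size) ** 2


if __name__ == "__main__":
    # Synthetic 200x300 test image, standing in for a road photograph.
    raw = [[(x + y) % 256 for x in range(300)] for y in range(200)]
    prepared = normalize(resize_nearest(raw))
    print(len(prepared), len(prepared[0]))  # height and width after resizing
    print(num_patches())                    # patches for a 128x128 input
```

Under these assumptions, each 128x128 image becomes a sequence of 64 patch tokens. A smaller patch size would give the model a finer view of thin structures such as cracks, at the cost of a longer sequence and more computation.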