References

  1. Geddes, N. B. (1940). Magic Motorways. Random House.
  2. Brown, M. (2011). Racecar: Searching for the Limit in Formula SAE. Seven Car Publishing.
  3. Morgado, D. (2021). A Perception Pipeline for an Autonomous Formula Student Vehicle [MSc Thesis in Mechanical Engineering]. Universidade de Lisboa - Instituto Superior Técnico.
  4. Gomes, D. R., Botto, M. A., & Lima, P. U. (2024). Learning-based Model Predictive Control for an Autonomous Formula Student Racing Car. 2024 IEEE International Conference on Robotics and Automation (ICRA), 12556–12562. https://ieeexplore.ieee.org/abstract/document/10611285
  5. Jose, C. P. (2016). A review on the trends and developments in hybrid electric vehicle. Innovative Design and Development Practices in Aerospace and Automotive Engineering: I-DAD, 211–229. https://link.springer.com/chapter/10.1007/978-981-10-1771-1_25
  6. Stanchev, P., & Geske, J. (2016). Autonomous Cars. History. State of Art. Research Problems. Distributed Computer and Communication Networks (DCCN 2015), 1–10. https://link.springer.com/chapter/10.1007/978-3-319-30843-2_1
  7. Aggarwal, I. (2022). Rise of Autonomous Vehicles. International Journal of Social Science and Economic Research, 7(10). https://ijsser.org/2022files/ijsser_07__229.pdf
  8. Van Brummelen, J., O’Brien, M., Gruyer, D., & Najjaran, H. (2018). Autonomous vehicle perception: The technology of today and tomorrow. Transportation Research Part C: Emerging Technologies, 89, 384–406. https://www.sciencedirect.com/science/article/pii/S0968090X18302134
  9. Society of Automotive Engineers (SAE). (2021). Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles (J3016_202104). https://www.sae.org/standards/content/j3016_202104/
  10. Betz, J., Zheng, H., Liniger, A., Rosolia, U., Karle, P., Behl, M., Krovi, V., & Mangharam, R. (2022). Autonomous Vehicles on the Edge: A Survey on Autonomous Vehicle Racing. IEEE Open Journal of Intelligent Transportation Systems, 3, 458–488. https://ieeexplore.ieee.org/abstract/document/9790832
  11. Fayyad, J., Jaradat, M. A., Gruyer, D., & Najjaran, H. (2020). Deep Learning Sensor Fusion for Autonomous Vehicle Perception and Localization: A Review. Sensors, 20(15), 4220. https://www.mdpi.com/1424-8220/20/15/4220
  12. Jeffs, J., & He, M. X. (2023). Autonomous Cars, Robotaxis and Sensors 2024-2044. IDTechEx. https://www.idtechex.com/en/research-report/autonomous-cars-robotaxis-and-sensors-2024-2044/953
  13. Ackerman, E. (2021). What Full Autonomy Means for the Waymo Driver. IEEE Spectrum. https://spectrum.ieee.org/full-autonomy-waymo-driver
  14. Betz, J., et al. (2019). What can we learn from autonomous level-5 motorsport? Springer. https://link.springer.com/content/pdf/10.1007/978-3-658-22050-1_12.pdf
  15. Barrachina, J., et al. (2013). V2X-d: A vehicular density estimation system that combines V2V and V2I communications. IFIP Wireless Days (WD). https://ieeexplore.ieee.org/document/6686518
  16. Dhall, A., Dai, D., & Gool, L. V. (2019). Real-time 3D Traffic Cone Detection for Autonomous Driving. IEEE Intelligent Vehicles Symposium. https://ieeexplore.ieee.org/document/8814089/
  17. Wen, L., & Jo, K. (2022). Deep learning-based perception systems for autonomous driving: A comprehensive survey. Neurocomputing. https://doi.org/10.1016/j.neucom.2021.08.155
  18. Rusu, R. B., & Cousins, S. (2011). 3D is here: Point Cloud Library (PCL). IEEE International Conference on Robotics and Automation. https://ieeexplore.ieee.org/document/5980567
  19. Nguyen, A., & Le, B. (2013). 3D Point Cloud Segmentation: A survey. IEEE Conference on Robotics, Automation and Mechatronics. https://ieeexplore.ieee.org/document/6758588
  20. Liu, W., Sun, J., Li, W., Hu, T., & Wang, P. (2019). Deep Learning on Point Clouds and Its Application: A Survey. Sensors. https://www.mdpi.com/1424-8220/19/19/4188
  21. Zhang, J., Zhao, X., & Lu, Z. (2019). A Review of Deep Learning-Based Semantic Segmentation for Point Cloud. IEEE Access. https://ieeexplore.ieee.org/abstract/document/8930503/
  22. Grilli, E., Menna, F., & Remondino, F. (2017). A Review of Point Clouds Segmentation and Classification Algorithms. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences. https://isprs-archives.copernicus.org/articles/XLII-2-W3/339/2017/
  23. Bhanu, B., Lee, S., Ho, C., & Henderson, T. (1986). Range data processing: Representation of surfaces by edges. Pattern Recognition. https://core.ac.uk/download/pdf/276277383.pdf
  24. Sappa, A. D., & Devy, M. (2001). Fast range image segmentation by an edge detection strategy. Proceedings Third International Conference on 3-D Digital Imaging and Modeling. https://ieeexplore.ieee.org/abstract/document/924460
  25. Jiang, X. Y., Bunke, H., & Meier, U. (1996). Fast range image segmentation. Third IEEE Workshop on Applications of Computer Vision (WACV ’96). https://ieeexplore.ieee.org/document/572006/
  26. Bello, S. A., et al. (2020). Review: Deep Learning on 3D Point Clouds. Remote Sensing, 12(11), 1729. https://www.mdpi.com/2072-4292/12/11/1729
  27. Chehri, A., & Mouftah, H. T. (2019). Autonomous vehicles in the sustainable cities, the beginning of a green adventure. Sustainable Cities and Society, 51, 101751. https://doi.org/10.1016/j.scs.2019.101751
  28. Srivastava, A. (2019). Sense-Plan-Act in Robotic Applications.
  29. Dingus, T. A., et al. (2016). Driver crash risk factors and prevalence evaluation using naturalistic driving data. Proceedings of the National Academy of Sciences, 113(10), 2636–2641. https://www.pnas.org/doi/abs/10.1073/pnas.1513271113
  30. Gosala, N., et al. (2019). Redundant Perception and State Estimation for Reliable Autonomous Racing. 2019 International Conference on Robotics and Automation (ICRA). https://ieeexplore.ieee.org/document/8794155
  31. Valls, M., et al. (2018). Design of an Autonomous Racecar: Perception, State Estimation and System Integration. 2018 IEEE International Conference on Robotics and Automation (ICRA). https://ieeexplore.ieee.org/document/8462829
  32. Hudson, J., Orviska, M., & Hunady, J. (2019). People’s attitudes to autonomous vehicles. Transportation Research Part A: Policy and Practice, 121, 164–176. https://doi.org/10.1016/j.tra.2018.08.018
  33. Betz, J., et al. (2023). TUM autonomous motorsport: An autonomous racing software for the Indy Autonomous Challenge. Journal of Field Robotics, 40(4), 783–809. https://doi.org/10.1002/rob.22153
  34. Vödisch, N., Dodel, D., & Schötz, M. (2022). FSOCO: The Formula Student Objects in Context Dataset. SAE International Journal of Connected and Automated Vehicles, 5. https://arxiv.org/abs/2012.07139
  35. Waymo LLC. (2021). On the road to fully self-driving. Waymo Safety Report (pp. 1–48).
  36. O’Kelly, M., Zheng, H., Karthik, D., & Mangharam, R. (2020). F1TENTH: An Open-source Evaluation Environment for Continuous Control and Reinforcement Learning. Proceedings of Machine Learning Research, 123. https://par.nsf.gov/biblio/10221872
  37. Yurtsever, E., Lambert, J., Carballo, A., & Takeda, K. (2020). A survey of autonomous driving: Common practices and emerging technologies. IEEE Access, 8, 58443–58469. https://ieeexplore.ieee.org/abstract/document/9046805
  38. Hulse, L. M., Xie, H., & Galea, E. R. (2018). Perceptions of autonomous vehicles: Relationships with road users, risk, gender and age. Safety Science, 102, 1–13. https://www.sciencedirect.com/science/article/pii/S0925753517306999
  39. Montgomery, W. D., et al. (2018). America’s Workforce and the Self-Driving Future: Realizing Productivity Gains and Spurring Economic Growth. Securing America’s Future Energy. https://avworkforce.secureenergy.org/
  40. Singh, S. (2015). Critical reasons for crashes investigated in the national motor vehicle crash causation survey. National Highway Traffic Safety Administration. http://www-nrd.nhtsa.dot.gov/Pubs/812115.pdf
  41. FSG. (2023). FS Rules 2024. https://www.formulastudent.de/fsg/rules/
  42. FSG. (2023). FS Handbook 2024. https://www.formulastudent.de/fsg/rules/
  43. Arnold, E., Al-Jarrah, O. Y., et al. (2019). A survey on 3d object detection methods for autonomous driving applications. IEEE Transactions on Intelligent Transportation Systems, 20(10), 3782–3795. https://ieeexplore.ieee.org/abstract/document/8621614
  44. Geiger, A., Lenz, P., Stiller, C., & Urtasun, R. (2013). Vision meets robotics: The kitti dataset. The International Journal of Robotics Research, 32(11), 1231–1237. https://journals.sagepub.com/doi/full/10.1177/0278364913491297
  45. Liang, W., Xu, P., Guo, L., Bai, H., Zhou, Y., & Chen, F. (2021). A survey of 3D object detection. Multimedia Tools and Applications, 80(19), 29617–29641. https://link.springer.com/article/10.1007/s11042-021-11137-y
  46. Qian, R., Lai, X., & Li, X. (2022). 3D object detection for autonomous driving: A survey. Pattern Recognition, 130, 108796. https://www.sciencedirect.com/science/article/pii/S0031320322002771
  47. Mao, J., Shi, S., Wang, X., & Li, H. (2023). 3D object detection for autonomous driving: A comprehensive survey. International Journal of Computer Vision, 131(8), 1909–1963. https://link.springer.com/article/10.1007/s11263-023-01790-1
  48. Wang, Y., Mao, Q., et al. (2023). Multi-modal 3d object detection in autonomous driving: a survey. International Journal of Computer Vision, 131(8), 2122–2152. https://link.springer.com/article/10.1007/s11263-023-01784-z
  49. Nagiub, A. S., Fayez, M., Khaled, H., & Ghoniemy, S. (2024). 3D object detection for autonomous driving: a comprehensive review. 2024 6th International Conference on Computing and Informatics (ICCI), 01–11. https://ieeexplore.ieee.org/abstract/document/10485120
  50. Calvo, E. L., Taveira, B., Kahl, F., Gustafsson, N., Larsson, J., & Tonderski, A. (2023). Timepillars: Temporally-recurrent 3d lidar object detection. ArXiv Preprint ArXiv:2312.17260. https://arxiv.org/abs/2312.17260
  51. Xuan, Y., & Qu, Y. (2024). Multimodal Data Fusion for BEV Perception [Master's thesis, Chalmers University of Technology]. https://odr.chalmers.se/items/589548c4-f439-4c12-ac16-6d74884ec41b
  52. Vaswani, A., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
  53. Dosovitskiy, A., et al. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. ArXiv Preprint ArXiv:2010.11929. https://arxiv.org/abs/2010.11929
  54. Chi, C., Wei, F., & Hu, H. (2020). Relationnet++: Bridging visual representations for object detection via transformer decoder. Advances in Neural Information Processing Systems, 33, 13564–13574. https://arxiv.org/abs/2010.15831
  55. Gao, W., & Li, G. (2025). Deep learning for 3D point clouds. Springer. https://link.springer.com/content/pdf/10.1007/978-981-97-9570-3.pdf
  56. Alhardi, A., & Afeef, M. A. (2024). Object Detection Algorithms & Techniques. 4th International Conference on Innovative Academic Studies.
  57. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
  58. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84–90. https://dl.acm.org/doi/abs/10.1145/3065386
  59. Moutinho, A. (2022). Computer Vision Slides. Instituto Superior Técnico.
  60. Li, K., & Cao, L. (2020). A review of object detection techniques. 2020 5th International Conference on Electromechanical Control Technology and Transportation (ICECTT), 385–390. https://ieeexplore.ieee.org/abstract/document/9237557
  61. Ng, A. (2016). What artificial intelligence can and can’t do right now. Harvard Business Review, 9(11).
  62. Turk, G., & Levoy, M. (1994). Zippered polygon meshes from range images. Proceedings of the 21st Annual Conference on Computer Graphics and Interactive Techniques. https://dl.acm.org/doi/abs/10.1145/192161.192241
  63. Lin, H., Wang, L., Qu, X., et al. (2025). A High-Precision Calibration and Evaluation Method Based on Binocular Cameras and LiDAR for Intelligent Vehicles. IEEE Transactions on Vehicular Technology.
  64. Zhang, H., et al. (2025). 3D LiDAR and monocular camera calibration: A Review. IEEE Sensors Journal. https://ieeexplore.ieee.org/abstract/document/10852582
  65. Huch, S. (2025). LiDAR Domain Adaptation for Perception of Autonomous Vehicles [PhD thesis, Technische Universität München]. https://mediatum.ub.tum.de/1748697
  66. Garcia, G. M., et al. (2025). Fine-tuning image-conditional diffusion models is easier than you think. 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 753–762. https://arxiv.org/abs/2409.11355
  67. Yang, L., Kang, B., et al. (2024). Depth anything: Unleashing the power of large-scale unlabeled data. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10371–10381. https://arxiv.org/abs/2401.10891
  68. Yang, L., Kang, B., et al. (2024). Depth anything v2. Advances in Neural Information Processing Systems, 37, 21875–21911. https://arxiv.org/abs/2406.09414
  69. Peris, M., Martull, S., Maki, A., Ohkawa, Y., & Fukui, K. (2012). Towards a simulation driven stereo vision system. Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), 1038–1042. https://ieeexplore.ieee.org/abstract/document/6460313
  70. Li, H., Zhao, Y., Zhong, J., Wang, B., Sun, C., & Sun, F. (2025). Delving into the Secrets of BEV 3D Object Detection in Autonomous Driving: A Comprehensive Survey. Authorea Preprints. https://www.techrxiv.org/doi/full/10.36227/techrxiv.173221675.59410416
  71. Caesar, H., Bankiti, V., et al. (2020). nuScenes: A multimodal dataset for autonomous driving. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11621–11631. https://arxiv.org/abs/1903.11027
  72. Sun, P., Kretzschmar, H., Dotiwalla, X., Chouard, A., et al. (2020). Scalability in perception for autonomous driving: Waymo open dataset. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2446–2454. https://arxiv.org/abs/1912.04838
  73. Zamanakos, G., Tsochatzidis, L., Amanatiadis, A., & Pratikakis, I. (2021). A comprehensive survey of LIDAR-based 3D object detection methods with deep learning for autonomous driving. Computers & Graphics, 99, 153–181. https://www.sciencedirect.com/science/article/pii/S0097849321001321
  74. Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4), 303–314.
  75. Block, H.-D. (1962). The perceptron: A model for brain functioning. I. Reviews of Modern Physics, 34(1), 123–135.
  76. Qi, C. R., Su, H., Mo, K., & Guibas, L. J. (2017). Pointnet: Deep learning on point sets for 3d classification and segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 652–660. https://arxiv.org/abs/1612.00593
  77. Qi, C. R., Yi, L., Su, H., & Guibas, L. J. (2017). Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Advances in Neural Information Processing Systems, 30. https://arxiv.org/abs/1706.02413
  78. Lai-Dang, Q.-V. (2024). A survey of vision transformers in autonomous driving: Current trends and future directions. ArXiv Preprint ArXiv:2403.07542. https://arxiv.org/abs/2403.07542
  79. Chang, M.-F., et al. (2019). Argoverse: 3d tracking and forecasting with rich maps. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8748–8757. https://arxiv.org/abs/1911.02620
  80. Patil, A., Malla, S., Gang, H., & Chen, Y.-T. (2019). The h3d dataset for full-surround 3d multi-object detection and tracking in crowded urban scenes. 2019 International Conference on Robotics and Automation (ICRA), 9552–9557. https://arxiv.org/abs/1903.01568
  81. Houston, J., et al. (2021). One thousand and one hours: Self-driving motion prediction dataset. Conference on Robot Learning, 409–418. https://proceedings.mlr.press/v155/houston21a.html
  82. Wang, P., et al. (2019). The apolloscape open dataset for autonomous driving and its application. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  83. Kuznetsova, A., Rom, H., Alldrin, N., et al. (2020). The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. International Journal of Computer Vision, 128(7), 1956–1981. https://arxiv.org/abs/1811.00982
  84. Russakovsky, O., Deng, J., Su, H., et al. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115, 211–252. https://arxiv.org/abs/1409.0575
  85. Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The pascal visual object classes (voc) challenge. International Journal of Computer Vision, 88, 303–338. https://link.springer.com/article/10.1007/S11263-009-0275-4
  86. Lin, T.-Y., et al. (2014). Microsoft coco: Common objects in context. Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, 740–755. https://arxiv.org/abs/1405.0312
  87. Silberman, N., Hoiem, D., Kohli, P., & Fergus, R. (2012). Indoor segmentation and support inference from rgbd images. Computer Vision–ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part V 12, 746–760. https://link.springer.com/chapter/10.1007/978-3-642-33715-4_54
  88. Pravallika, A., Hashmi, M. F., & Gupta, A. (2024). Deep Learning Frontiers in 3D Object Detection: A Comprehensive Review for Autonomous Driving. IEEE Access. https://ieeexplore.ieee.org/abstract/document/10670385/
  89. Zhu, M., Gong, Y., Tian, C., & Zhu, Z. (2024). A Systematic Survey of Transformer-Based 3D Object Detection for Autonomous Driving: Methods, Challenges and Trends. Drones, 8(8), 412. https://www.mdpi.com/2504-446X/8/8/412
  90. Bhat, S. F., Alhashim, I., & Wonka, P. (2021). Adabins: Depth estimation using adaptive bins. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4009–4018. https://openaccess.thecvf.com/content/CVPR2021/html/Bhat_AdaBins_Depth_Estimation_Using_Adaptive_Bins_CVPR_2021_paper.html
  91. He, X., et al. (2025). Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator. ArXiv Preprint ArXiv:2502.19204. https://arxiv.org/abs/2502.19204
  92. Fu, H., et al. (2018). Deep ordinal regression network for monocular depth estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2002–2011. https://openaccess.thecvf.com/content_cvpr_2018/html/Fu_Deep_Ordinal_Regression_CVPR_2018_paper.html
  93. Ranftl, R., et al. (2020). Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(3), 1623–1637. https://ieeexplore.ieee.org/abstract/document/9178977
  94. Ranftl, R., Bochkovskiy, A., & Koltun, V. (2021). Vision transformers for dense prediction. Proceedings of the IEEE/CVF International Conference on Computer Vision, 12179–12188. https://openaccess.thecvf.com/content/ICCV2021/html/Ranftl_Vision_Transformers_for_Dense_Prediction_ICCV_2021_paper.html
  95. Li, Z., Chen, Z., Liu, X., & Jiang, J. (2023). Depthformer: Exploiting long-range correlation and local information for accurate monocular depth estimation. Machine Intelligence Research, 20(6), 837–854. https://link.springer.com/article/10.1007/s11633-023-1458-0
  96. Yin, W., et al. (2023). Metric3d: Towards zero-shot metric 3d prediction from a single image. Proceedings of the IEEE/CVF International Conference on Computer Vision, 9043–9053. https://openaccess.thecvf.com/content/ICCV2023/html/Yin_Metric3D_Towards_Zero-shot_Metric_3D_Prediction_from_A_Single_Image_ICCV_2023_paper.html
  97. Bhat, S. F., et al. (2023). Zoedepth: Zero-shot transfer by combining relative and metric depth. ArXiv Preprint ArXiv:2302.12288.
  98. Ke, B., Obukhov, A., et al. (2024). Repurposing diffusion-based image generators for monocular depth estimation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9492–9502. https://openaccess.thecvf.com/content/CVPR2024/html/Ke_Repurposing_Diffusion-Based_Image_Generators_for_Monocular_Depth_Estimation_CVPR_2024_paper.html
  99. Piccinelli, L., Yang, Y.-H., et al. (2024). UniDepth: Universal monocular metric depth estimation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10106–10116.
  100. Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 91–110.
  101. Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., & LeCun, Y. (2013). Overfeat: Integrated recognition, localization and detection using convolutional networks. ArXiv Preprint ArXiv:1312.6229. https://arxiv.org/abs/1312.6229
  102. Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 779–788. https://arxiv.org/abs/1506.02640
  103. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., & Berg, A. C. (2016). Ssd: Single shot multibox detector. Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, 21–37. https://arxiv.org/abs/1512.02325
  104. Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 580–587. https://arxiv.org/abs/1311.2524
  105. Purkait, P., Zhao, C., & Zach, C. (2017). SPP-Net: Deep absolute pose regression with synthetic views. ArXiv Preprint ArXiv:1712.03452. https://arxiv.org/abs/1712.03452
  106. Girshick, R. (2015). Fast r-cnn. Proceedings of the IEEE International Conference on Computer Vision, 1440–1448. https://arxiv.org/abs/1504.08083
  107. Ren, S., He, K., Girshick, R., & Sun, J. (2016). Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), 1137–1149. https://arxiv.org/abs/1506.01497
  108. Wu, X., Sahoo, D., & Hoi, S. C. H. (2020). Recent advances in deep learning for object detection. Neurocomputing, 396, 39–64. https://www.sciencedirect.com/science/article/pii/S0925231220301430
  109. Pagire, V., Chavali, M., & Kale, A. (2025). A comprehensive review of object detection with traditional and deep learning methods. Signal Processing, 237, 110075. https://www.sciencedirect.com/science/article/pii/S0165168425001896
  110. Zou, Z., Chen, K., Shi, Z., Guo, Y., & Ye, J. (2023). Object detection in 20 years: A survey. Proceedings of the IEEE, 111(3), 257–276. https://ieeexplore.ieee.org/abstract/document/10028728
  111. Sun, Y., Sun, Z., & Chen, W. (2024). The evolution of object detection methods. Engineering Applications of Artificial Intelligence, 133, 108458. https://www.sciencedirect.com/science/article/pii/S095219762400616X
  112. Chen, W., Li, Y., Tian, Z., & Zhang, F. (2023). 2D and 3D object detection algorithms from images: A Survey. Array, 19, 100305. https://www.sciencedirect.com/science/article/pii/S2590005623000309
  113. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., & Zagoruyko, S. (2020). End-to-end object detection with transformers. European Conference on Computer Vision, 213–229. https://arxiv.org/abs/2005.12872
  114. Zong, Z., Song, G., & Liu, Y. (2023). Detrs with collaborative hybrid assignments training. Proceedings of the IEEE/CVF International Conference on Computer Vision, 6748–6758. https://arxiv.org/abs/2211.12860
  115. Zhu, X., Su, W., Lu, L., Li, B., Wang, X., & Dai, J. (2020). Deformable detr: Deformable transformers for end-to-end object detection. ArXiv Preprint ArXiv:2010.04159. https://arxiv.org/abs/2010.04159
  116. Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., et al. (2023). Dinov2: Learning robust visual features without supervision. ArXiv Preprint ArXiv:2304.07193. https://arxiv.org/abs/2304.07193
  117. Dai, J., Li, Y., He, K., & Sun, J. (2016). R-fcn: Object detection via region-based fully convolutional networks. Advances in Neural Information Processing Systems, 29. https://arxiv.org/abs/1605.06409
  118. He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, 2961–2969. https://arxiv.org/abs/1703.06870
  119. Cai, Z., & Vasconcelos, N. (2018). Cascade r-cnn: Delving into high quality object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6154–6162. https://arxiv.org/abs/1712.00726
  120. Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, 2980–2988. https://arxiv.org/abs/1708.02002
  121. Law, H., & Deng, J. (2018). Cornernet: Detecting objects as paired keypoints. Proceedings of the European Conference on Computer Vision (ECCV), 734–750. https://arxiv.org/abs/1808.01244
  122. Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF International Conference on Computer Vision, 10012–10022. https://arxiv.org/abs/2103.14030
  123. Tian, Z., Shen, C., Chen, H., & He, T. (2019). Fcos: Fully convolutional one-stage object detection. Proceedings of the IEEE/CVF International Conference on Computer Vision, 9627–9636. https://arxiv.org/abs/1904.01355
  124. Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., & Tian, Q. (2019). Centernet: Keypoint triplets for object detection. Proceedings of the IEEE/CVF International Conference on Computer Vision, 6569–6578. https://arxiv.org/abs/1904.08189
  125. Tan, M., Pang, R., & Le, Q. V. (2020). Efficientdet: Scalable and efficient object detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10781–10790. https://arxiv.org/abs/1911.09070
  126. Wang, W., Dai, J., Chen, Z., Huang, Z., Li, Z., Zhu, X., Hu, X., Lu, T., Lu, L., Li, H., et al. (2023). Internimage: Exploring large-scale vision foundation models with deformable convolutions. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 14408–14419. https://arxiv.org/abs/2211.05778
  127. Li, Y., Mao, H., Girshick, R., & He, K. (2022). Exploring plain vision transformer backbones for object detection. European Conference on Computer Vision, 280–296. https://arxiv.org/abs/2203.16527
  128. Tian, Y., Ye, Q., & Doermann, D. (2025). Yolov12: Attention-centric real-time object detectors. ArXiv Preprint ArXiv:2502.12524. https://arxiv.org/abs/2502.12524
  129. Zhao, Y., Lv, W., Xu, S., Wei, J., Wang, G., Dang, Q., Liu, Y., & Chen, J. (2024). Detrs beat yolos on real-time object detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 16965–16974. https://arxiv.org/abs/2304.08069
  130. Xiang, Y., Choi, W., Lin, Y., & Savarese, S. (2015). Data-driven 3d voxel patterns for object category recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1903–1911. https://ieeexplore.ieee.org/document/7298800
  131. Xiang, Y., Choi, W., Lin, Y., & Savarese, S. (2017). Subcategory-aware convolutional neural networks for object proposals and detection. 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), 924–933. https://arxiv.org/abs/1604.04693
  132. Chen, X., Kundu, K., Zhang, Z., Ma, H., Fidler, S., & Urtasun, R. (2016). Monocular 3d object detection for autonomous driving. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2147–2156. https://ieeexplore.ieee.org/document/7780605
  133. Mousavian, A., Anguelov, D., Flynn, J., & Kosecka, J. (2017). 3d bounding box estimation using deep learning and geometry. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7074–7082. https://arxiv.org/abs/1612.00496
  134. Chabot, F., Chaouch, M., Rabarisoa, J., Teuliere, C., & Chateau, T. (2017). Deep manta: A coarse-to-fine many-task network for joint 2d and 3d vehicle analysis from monocular image. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2040–2049. https://arxiv.org/abs/1703.07570
  135. Kundu, A., Li, Y., & Rehg, J. M. (2018). 3d-rcnn: Instance-level 3d object reconstruction via render-and-compare. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3559–3568. https://ieeexplore.ieee.org/document/8578473
  136. Xu, B., & Chen, Z. (2018). Multi-level fusion based 3d object detection from monocular images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2345–2353. https://ieeexplore.ieee.org/document/8578347/
  137. Qin, Z., Wang, J., & Lu, Y. (2019). Monogrnet: A geometric reasoning network for monocular 3d object localization. Proceedings of the AAAI Conference on Artificial Intelligence, 33. https://arxiv.org/abs/1811.10247
  138. Wang, S., & Zheng, J. (2023). MonoSKD: General distillation framework for monocular 3D object detection via Spearman correlation coefficient. ArXiv Preprint ArXiv:2310.11316. https://arxiv.org/abs/2310.11316
  139. Xu, J., Peng, L., Cheng, H., Li, H., Qian, W., Li, K., Wang, W., & Cai, D. (2023). Mononerd: Nerf-like representations for monocular 3d object detection. Proceedings of the IEEE/CVF International Conference on Computer Vision, 6814–6824. https://arxiv.org/abs/2308.09421
  140. Yan, L., Yan, P., Xiong, S., Xiang, X., & Tan, Y. (2024). Monocd: Monocular 3d object detection with complementary depths. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10248–10257. https://arxiv.org/abs/2404.03181
  141. Chen, X., Kundu, K., Zhu, Y., Berneshawi, A. G., Ma, H., Fidler, S., & Urtasun, R. (2015). 3d object proposals for accurate object class detection. Advances in Neural Information Processing Systems, 28. https://arxiv.org/abs/1608.07711
  142. Wang, Y., Chao, W.-L., & al., E. (2019). Pseudo-lidar from visual depth estimation: Bridging the gap in 3d object detection for autonomous driving. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8445–8453. https://arxiv.org/abs/1812.07179
  143. Li, P., Chen, X., & Shen, S. (2019). Stereo r-cnn based 3d object detection for autonomous driving. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. https://arxiv.org/abs/1902.09738
  144. Chen, Y., Liu, S., Shen, X., & Jia, J. (2020). Dsgn: Deep stereo geometry network for 3d object detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. https://arxiv.org/pdf/2001.03398
  145. Frøysa, T. D. (2018). Perception for an Autonomous Racecar [Master's thesis]. NTNU.
  146. Qie, L., Gong, J., et al. (2020). Cone detection and location for formula student driverless race. 2019 6th International Conference on Dependable Systems and Their Applications (DSA), 440–444.
  147. Gonzalez, R. (2020). Improved cone detection system with NN for a Formula Student car [B.S. thesis]. Universitat Politècnica de Catalunya.
  148. Dhall, A. (2018). Real-time 3D pose estimation with a monocular camera using deep learning and object priors on an autonomous racecar. ArXiv Preprint ArXiv:1809.10548.
  149. Minorello, F. (2025). A Stereo Vision SLAM Front-End for the Formula Student Driverless Competition [Master's thesis]. University of Padova.
  150. Quigley, M., Conley, K., et al. (2009). ROS: an open-source Robot Operating System. ICRA Workshop on Open Source Software, 3, 5. http://lars.mec.ua.pt/public/LAR%20Projects/BinPicking/2016_RodrigoSalgueiro/LIB/ROS/icraoss09-ROS.pdf
  151. Wang, Y., Guizilini, V. C., Zhang, T., Wang, Y., Zhao, H., & Solomon, J. (2022). Detr3d: 3d object detection from multi-view images via 3d-to-2d queries. Conference on Robot Learning, 180–191. https://arxiv.org/abs/2110.06922
  152. Liu, Y., Wang, T., Zhang, X., & Sun, J. (2022). Petr: Position embedding transformation for multi-view 3d object detection. European Conference on Computer Vision, 531–548. https://arxiv.org/abs/2203.05625
  153. Li, Z., Wang, W., et al. (2024). Bevformer: learning bird’s-eye-view representation from lidar-camera via spatiotemporal transformers. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://arxiv.org/abs/2203.17270
  154. Liu, H., Teng, Y., Lu, T., Wang, H., & Wang, L. (2023). Sparsebev: High-performance sparse 3d object detection from multi-camera videos. Proceedings of the IEEE/CVF International Conference on Computer Vision, 18580–18590. https://arxiv.org/abs/2308.09244
  155. Ji, H., Ni, T., Huang, X., Luo, T., Zhan, X., & Chen, J. (2025). RoPETR: Improving Temporal Camera-Only 3D Detection by Integrating Enhanced Rotary Position Embedding. ArXiv Preprint ArXiv:2504.12643. https://arxiv.org/abs/2504.12643
  156. Liu, Z., Ye, X., Tan, X., Ding, E., & Bai, X. (2023). Stereodistill: Pick the cream from lidar for distilling stereo-based 3d object detection. Proceedings of the AAAI Conference on Artificial Intelligence, 37, 1790–1798. https://arxiv.org/pdf/2301.01615
  157. Guo, X., Shi, S., Wang, X., & Li, H. (2021). Liga-stereo: Learning lidar geometry aware representations for stereo-based 3d detector. Proceedings of the IEEE/CVF International Conference on Computer Vision, 3153–3163. https://arxiv.org/abs/2108.08258
  158. Liu, Y., Wang, L., & Liu, M. (2021). Yolostereo3d: A step back to 2d for efficient stereo 3d detection. 2021 IEEE International Conference on Robotics and Automation (ICRA), 13018–13024. https://arxiv.org/abs/2103.09422
  159. Brazil, G., & Liu, X. (2019). M3d-rpn: Monocular 3d region proposal network for object detection. Proceedings of the IEEE/CVF International Conference on Computer Vision, 9287–9296. https://arxiv.org/abs/1907.06038
  160. Liu, Z., Wu, Z., & Tóth, R. (2020). Smoke: Single-stage monocular 3d object detection via keypoint estimation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 996–997. https://arxiv.org/abs/2002.10111
  161. Limaye, A., Mathew, M., et al. (2020). SS3D: Single shot 3D object detector. ArXiv Preprint ArXiv:2004.14674. https://arxiv.org/abs/2004.14674
  162. Zhang, Y., Lu, J., & Zhou, J. (2021). Objects are different: Flexible monocular 3d object detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3289–3298. https://arxiv.org/abs/2104.02323
  163. Chong, Z., Ma, X., et al. (2022). Monodistill: Learning spatial features for monocular 3d object detection. ArXiv Preprint ArXiv:2201.10830. https://arxiv.org/pdf/2201.10830
  164. Simonelli, A., Bulo, S. R., et al. (2019). Disentangling monocular 3d object detection. Proceedings of the IEEE/CVF International Conference on Computer Vision, 1991–1999. https://arxiv.org/abs/1905.12365
  165. Ma, X., Wang, Z., et al. (2019). Accurate monocular 3d object detection via color-embedded 3d reconstruction for autonomous driving. Proceedings of the IEEE/CVF International Conference on Computer Vision, 6851–6860.
  166. Choi, H. M., Kang, H., & Hyun, Y. (2019). Multi-view reprojection architecture for orientation estimation. 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2357–2366. https://ieeexplore.ieee.org/document/9022190
  167. Zhou, Y., & Tuzel, O. (2018). Voxelnet: End-to-end learning for point cloud based 3d object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4490–4499. https://arxiv.org/abs/1711.06396
  169. Simon, M., Milz, S., Amende, K., & Gross, H.-M. (2018). Complex-yolo: Real-time 3d object detection on point clouds. ArXiv Preprint ArXiv:1803.06199. https://arxiv.org/abs/1803.06199
  170. Beltrán, J., Guindel, C., et al. (2018). Birdnet: a 3d object detection framework from lidar information. 2018 21st International Conference on Intelligent Transportation Systems (ITSC), 3517–3523. https://arxiv.org/abs/1805.01195
  171. Yang, B., Luo, W., & Urtasun, R. (2018). PIXOR: Real-time 3D Object Detection from Point Clouds. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7652–7660. https://arxiv.org/abs/1902.06326
  172. Wang, D. Z., & Posner, I. (2015). Voting for voting in online point cloud object detection. Robotics: Science and Systems, 1, 10–15. https://www.roboticsproceedings.org/rss11/p35.pdf
  173. Yan, Y., Mao, Y., & Li, B. (2018). SECOND: Sparsely Embedded Convolutional Detection. Sensors, 18, 3337. https://pdfs.semanticscholar.org/5125/a16039cabc6320c908a4764f32596e018ad3.pdf
  174. Lang, A. H., Vora, S., Caesar, H., Zhou, L., Yang, J., & Beijbom, O. (2019). PointPillars: Fast Encoders for Object Detection from Point Clouds. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 12697–12705. https://arxiv.org/abs/1812.05784
  175. Chen, Q., Sun, L., Wang, Z., Jia, K., & Yuille, A. (2020). Object as hotspots: An anchor-free 3d object detection approach via firing of hotspots. Computer Vision–ECCV 2020: 16th European Conference, 68–84. https://arxiv.org/abs/1912.12791
  176. Deng, J., Shi, S., Li, P., Zhou, W., Zhang, Y., & Li, H. (2021). Voxel r-cnn: Towards high performance voxel-based 3d object detection. Proceedings of the AAAI Conference on Artificial Intelligence, 35, 1201–1209. https://arxiv.org/abs/2012.15712
  177. Mao, J., Xue, Y., Niu, M., Bai, H., Feng, J., Liang, X., Xu, H., & Xu, C. (2021). Voxel transformer for 3d object detection. Proceedings of the IEEE/CVF International Conference on Computer Vision, 3164–3173. https://arxiv.org/abs/2109.02497
  178. Wu, H., et al. (2023). Transformation-equivariant 3d object detection for autonomous driving. Proceedings of the AAAI Conference on Artificial Intelligence, 37, 2795–2802. https://arxiv.org/abs/2211.11962
  179. Yang, Z., Sun, Y., Liu, S., Shen, X., & Jia, J. (2019). Std: Sparse-to-dense 3d object detector for point cloud. Proceedings of the IEEE/CVF International Conference on Computer Vision, 1951–1960. https://arxiv.org/abs/1907.10471
  180. Yang, Z., Sun, Y., Liu, S., & Jia, J. (2020). 3dssd: Point-based 3d single stage object detector. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11040–11048. https://arxiv.org/abs/2002.10187
  181. Pan, X., Xia, Z., Song, S., Li, L. E., & Huang, G. (2021). 3d object detection with pointformer. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7463–7472. https://arxiv.org/abs/2012.11409
  182. Zhang, Y., Hu, Q., Xu, G., Ma, Y., Wan, J., & Guo, Y. (2022). Not all points are equal: Learning highly efficient point-based detectors for 3d lidar point clouds. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 18953–18962. https://arxiv.org/abs/2203.11139
  183. Chen, Y., Liu, S., Shen, X., & Jia, J. (2019). Fast point r-cnn. Proceedings of the IEEE/CVF International Conference on Computer Vision, 9775–9784. https://arxiv.org/abs/1908.02990
  184. Shi, S., Guo, C., Jiang, L., Wang, Z., Shi, J., Wang, X., & Li, H. (2020). Pv-rcnn: Point-voxel feature set abstraction for 3d object detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10529–10538. https://arxiv.org/abs/1912.13192
  185. Hu, J. S. K., Kuai, T., & Waslander, S. L. (2022). Point density-aware voxels for lidar 3d object detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8469–8478. https://arxiv.org/abs/2203.05662
  186. Wang, Z., & Jia, K. (2019). Frustum convnet: Sliding frustums to aggregate local point-wise features for amodal 3d object detection. 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 1742–1749. https://arxiv.org/abs/1903.01864
  187. Qi, C. R., Liu, W., Wu, C., Su, H., & Guibas, L. J. (2018). Frustum pointnets for 3d object detection from rgb-d data. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 918–927. https://arxiv.org/abs/1711.08488
  188. Paigwar, A., Sierra-Gonzalez, D., Erkent, Ö., & Laugier, C. (2021). Frustum-pointpillars: A multi-stage approach for 3d object detection using rgb camera and lidar. Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2926–2933.
  189. Vora, S., Lang, A. H., Helou, B., & Beijbom, O. (2020). Pointpainting: Sequential fusion for 3d object detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4604–4612. https://arxiv.org/abs/1911.10150
  190. Wu, H., Wen, C., Shi, S., Li, X., & Wang, C. (2023). Virtual sparse convolution for multimodal 3d object detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 21653–21662. https://arxiv.org/abs/2303.02314
  191. Pang, S., Morris, D., & Radha, H. (2020). CLOCs: Camera-LiDAR object candidates fusion for 3D object detection. 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 10386–10393. https://arxiv.org/abs/2009.00784
  192. Pang, S., Morris, D., & Radha, H. (2022). Fast-CLOCs: Fast camera-LiDAR object candidates fusion for 3D object detection. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 187–196. https://openaccess.thecvf.com/content/WACV2022/papers/Pang_Fast-CLOCs_Fast_Camera-LiDAR_Object_Candidates_Fusion_for_3D_Object_Detection_WACV_2022_paper.pdf
  193. Chen, X., Ma, H., Wan, J., Li, B., & Xia, T. (2017). Multi-view 3d object detection network for autonomous driving. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1907–1915. https://arxiv.org/abs/1611.07759
  194. Ku, J., Mozifian, M., Lee, J., Harakeh, A., & Waslander, S. L. (2018). Joint 3d proposal generation and object detection from view aggregation. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 1–8. https://arxiv.org/abs/1712.02294
  195. Liang, M., Yang, B., Wang, S., & Urtasun, R. (2018). Deep continuous fusion for multi-sensor 3d object detection. Proceedings of the European Conference on Computer Vision (ECCV), 641–656. https://openaccess.thecvf.com/content_ECCV_2018/papers/Ming_Liang_Deep_Continuous_Fusion_ECCV_2018_paper.pdf
  196. Liang, M., Yang, B., Chen, Y., Hu, R., & Urtasun, R. (2019). Multi-task multi-sensor fusion for 3d object detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7345–7353. https://arxiv.org/abs/2012.12397
  197. Huang, T., Liu, Z., Chen, X., & Bai, X. (2020). Epnet: Enhancing point features with image semantics for 3d object detection. Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XV 16, 35–52. https://arxiv.org/abs/2007.08856
  198. Bai, X., Hu, Z., Zhu, X., Huang, Q., Chen, Y., Fu, H., & Tai, C.-L. (2022). Transfusion: Robust lidar-camera fusion for 3d object detection with transformers. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1090–1099. https://arxiv.org/abs/2203.11496
  199. Chen, X., Zhang, T., Wang, Y., Wang, Y., & Zhao, H. (2023). Futr3d: A unified sensor fusion framework for 3d detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 172–181. https://arxiv.org/abs/2203.10642
  200. Cândido, B., Santos, N. P., Moutinho, A., & Zacchi, J.-V. (2025). Uncrewed Ground Vehicles in Military Operations: Lessons Learned from Experimental Exercises. 2025 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC), 21–26. https://ieeexplore.ieee.org/document/10970121
  201. Wang, Y., Yang, B., Hu, R., Liang, M., & Urtasun, R. (2021). PLUMENet: Efficient 3D object detection from stereo images. 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 3383–3390.
  202. Shi, S., Wang, X., & Li, H. (2019). Pointrcnn: 3d object proposal generation and detection from point cloud. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 770–779. https://arxiv.org/abs/1812.04244
  203. Shi, S., Wang, Z., Shi, J., Wang, X., & Li, H. (2020). From points to parts: 3d object detection from point cloud with part-aware and part-aggregation network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(8), 2647–2664. https://arxiv.org/abs/1907.03670
  204. He, C., Zeng, H., Huang, J., Hua, X.-S., & Zhang, L. (2020). Structure aware single-stage 3d object detection from point cloud. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11873–11882.
  205. Sindagi, V. A., Zhou, Y., & Tuzel, O. (2019). Mvx-net: Multimodal voxelnet for 3d object detection. 2019 International Conference on Robotics and Automation (ICRA), 7276–7282. https://arxiv.org/abs/1904.01649