A Review of Vision-Based Tracking Methods: Temporal and Spatial Features

Bayat, Mohammad Hosein; Tarvirdizadeh, Bahram; Shahbazi, Mohammad

doi:10.61186/jsdp.22.1.83

Volume 22, Issue 1 (5-2025) JSDP 2025, 22(1): 83-112

Bayat M H, Tarvirdizadeh B, Shahbazi M. A Review of Vision-Based Tracking Methods: Temporal and Spatial Features. JSDP 2025; 22 (1) :83-112
URL: http://jsdp.rcisp.ac.ir/article-1-1409-en.html

Mohammad Hosein Bayat, Bahram Tarvirdizadeh, Mohammad Shahbazi*

Assistant Professor, School of Mechanical Engineering, Iran University of Science and Technology, Tehran, Iran

Abstract:

Vision-based object tracking, one of the most challenging fields in machine vision, means following one or more targets through a sequence of image frames in the presence of various challenges. Tracking algorithms can be classified as single-target or multi-target according to the number of objects to be tracked. Trackers rely on two basic kinds of features: appearance and motion. Appearance features are extracted from individual images, whereas motion features are derived from a sequence of frames. According to the evaluations, motion models improve tracking performance and require less processing than appearance features. Our investigations show that, in contrast to single-target algorithms, multi-target algorithms give more weight to motion models and, because of the many targets in the scene, rely less on appearance features.
Despite the wide range of methods and significant progress in machine vision, reliable and flawless performance cannot be expected from tracking algorithms under real-time constraints. The problem is aggravated when challenges arise, such as sudden and fast movements of the target, occlusion by obstacles or other targets in the scene, extreme changes in the appearance and dimensions of the target, and the target entering and exiting the scene, all of which can cause tracking algorithms to fail.
Achieving a good trade-off between accuracy and execution speed is one of the main problems for applied tracking algorithms. Detection algorithms, which detect targets in a single independent image, have shown acceptable accuracy, but they cannot be run on every frame for real-time tracking, because either their high processing load limits execution speed or they can identify only certain classes. The purpose of general tracking, by contrast, is to follow an object through a sequence of images regardless of its type and class, while taking into account the temporal and spatial dependencies among successive frames.
With the development of recurrent neural networks and their strong ability to process sequential data such as text, audio, and video, their use in tracking algorithms is increasing. Their short-term and long-term memory helps retain important features during tracking, which has improved tracking performance. Various methods of integrating convolutional and recurrent neural networks have been presented and have shown great performance in tracking, but the main drawback of most of them is low execution speed. Our studies show that feeding high-dimensional inputs, such as features extracted from images, directly to recurrent networks greatly reduces their processing speed. Therefore, some methods that target real-time execution downsample the input of the recurrent network to a smaller size, at the cost of a slight reduction in accuracy.
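The cost of high-dimensional recurrent inputs can be illustrated with a rough operation count. The sketch below is only an illustration under stated assumptions: it counts the gate matrix multiplications of a textbook LSTM cell (biases and element-wise operations are ignored), and the feature size of 4096, the hidden size of 512, and the use of average pooling with factor 16 are hypothetical choices, not values or methods taken from the reviewed trackers.

```python
def lstm_macs_per_step(input_dim: int, hidden_dim: int) -> int:
    """Multiply-accumulates for one time step of a standard LSTM cell.

    Each of the four gates multiplies the concatenated [input, hidden]
    vector by a (hidden_dim x (input_dim + hidden_dim)) weight matrix.
    Biases and element-wise operations are ignored in this estimate.
    """
    return 4 * hidden_dim * (input_dim + hidden_dim)


def average_pool_1d(features, factor):
    """Downsample a feature vector by averaging non-overlapping windows."""
    if factor <= 0 or len(features) % factor != 0:
        raise ValueError("len(features) must be a positive multiple of factor")
    return [
        sum(features[i:i + factor]) / factor
        for i in range(0, len(features), factor)
    ]


# Hypothetical sizes: a 4096-dimensional appearance feature and 512 hidden units.
hidden = 512
full = [float(i % 7) for i in range(4096)]
small = average_pool_1d(full, 16)  # 4096 -> 256 values

ratio = lstm_macs_per_step(len(full), hidden) / lstm_macs_per_step(len(small), hidden)
print(len(small), ratio)  # prints: 256 6.0
```

Under these assumed sizes the recurrent step becomes about six times cheaper. The hidden-to-hidden term is unchanged by downsampling, which is why the saving is smaller than the 16-fold reduction in input size.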
Our investigations show that motion models are less explored in single-target tracking algorithms than in multi-target methods, even though studies show that these models succeed in improving tracking performance. Before the introduction of convolutional networks and their remarkable success in extracting deep features from images, motion models were the more common choice; recent methods, especially single-target trackers, rely more on appearance features. In single-target algorithms, the presence of only one object in the image and the lower computational load compared with multi-target algorithms allow freer use of appearance features. This is not feasible in multi-target tracking because of the number of targets, so motion models are more useful there. This paper therefore investigates motion models and their effect on tracking performance in more detail. The results show that motion models substantially improve tracking performance while remaining simple and imposing a low processing load.
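To give a sense of how lightweight a motion model can be, the following sketch predicts the next bounding box from recent box centers with a constant-velocity assumption. It is a generic illustration, not the motion model studied in the paper; the `Box` type, the `predict_constant_velocity` function, and the `window` parameter are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box given by its center and size, in pixels."""
    cx: float
    cy: float
    w: float
    h: float


def predict_constant_velocity(history, window=3):
    """Predict the next box by extrapolating the average center velocity.

    The velocity is the mean per-frame displacement over the last
    `window` steps; width and height are copied from the latest box.
    """
    if not history:
        raise ValueError("history must contain at least one box")
    last = history[-1]
    recent = history[-(window + 1):]
    steps = len(recent) - 1
    if steps == 0:
        return last  # a single observation carries no velocity information
    vx = (recent[-1].cx - recent[0].cx) / steps
    vy = (recent[-1].cy - recent[0].cy) / steps
    return Box(last.cx + vx, last.cy + vy, last.w, last.h)


# Hypothetical track: the target moves 2 px right and 1 px down per frame.
track = [Box(10, 20, 40, 80), Box(12, 21, 40, 80), Box(14, 22, 40, 80)]
pred = predict_constant_velocity(track)
print(pred)  # prints: Box(cx=16.0, cy=23.0, w=40, h=80)
```

The prediction needs only a few arithmetic operations per target, so a model of this kind could supply a search region or a fallback estimate when appearance cues are unreliable, for example during occlusion.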
This paper presents a comprehensive review and implementation of different tracking algorithms and introduces methods suitable for practical implementation. Different tracking structures are investigated and categorized according to spatial, temporal, appearance, and motion features. In addition, given the development of deep learning methods and their impact on tracking, deep architectures, training datasets, and standard evaluation methods are studied, and future directions for the field are discussed. Our studies show that temporal and motion features have received less attention despite their favorable impact on tracking performance. With the development of deep memory networks, the use of these features is increasing, and they are taking a larger role in tracking.

Keywords: Vision-Based Object Tracking, Appearance Features, Motion Features, Deep Learning, Machine Vision.

Type of Study: Research | Subject: Paper
Received: 2023/11/24 | Accepted: 2024/12/04 | Published: 2025/06/21 | ePublished: 2025/06/21

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Signal and Data Processing
