Volume 22, Issue 1 (5-2025) | JSDP 2025, 22(1): 83-112




Bayat M H, Tarvirdizadeh B, Shahbazi M. Visual Object Tracking: Temporal, Spatial, Appearance, and Motion Features. JSDP 2025; 22(1):83-112
URL: http://jsdp.rcisp.ac.ir/article-1-1409-en.html
Abstract:
Vision-based object tracking, one of the most challenging fields in machine vision, means following one or more targets through a sequence of image frames in the presence of various challenges. Tracking algorithms can be classified as single-target or multi-target according to the number of objects to be tracked in the frames. Trackers rely on two basic kinds of features: appearance and motion. Appearance features are extracted from individual images, whereas motion features are derived from the sequence of frames. According to the evaluations, motion models improve tracking performance and require less processing than appearance features. Our investigations show that, in contrast to single-target algorithms, multi-target algorithms give more weight to motion models and, because of the multiplicity of targets in the scene, focus less on appearance features.
Despite the wide range of methods and significant progress in machine vision, reliable and flawless performance cannot be expected from tracking algorithms under real-time constraints. The problem is aggravated when any of the common challenges occurs: sudden and fast movements of the target, occlusion by obstacles or by other targets in the scene, extreme changes in the appearance and dimensions of the target, and targets entering and exiting the scene. All of these cause tracking algorithms to fail.
Achieving a good trade-off between accuracy and execution speed is one of the main problems for applied tracking algorithms. Detection algorithms, which detect different targets in an independent image, have shown acceptable accuracy, but they cannot be run on every frame for real-time tracking, either because their high processing volume limits the execution speed of the detector or because they can identify only certain classes. The purpose of general tracking, by contrast, is to follow an object through a sequence of images regardless of its type and class, while taking into account the temporal and spatial dependencies among successive frames.
With the development of recurrent neural networks and their great ability to process sequential data such as text, audio and video, their use in tracking algorithms is increasing. These networks have helped to improve the performance of tracking algorithms because their short-term and long-term memory maintains important features during tracking. Different methods of integrating convolutional and recurrent neural networks have been presented and have shown great performance in tracking, but the main drawback of most of them is low execution speed. Our studies show that directly feeding high-dimensional inputs, such as features extracted from images, to recurrent networks greatly reduces their processing speed. Therefore, in some methods aimed at real-time execution, the input of the recurrent network is downsampled to a smaller size, although the accuracy is also slightly reduced.
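The cost of feeding high-dimensional features to a recurrent network can be illustrated with a rough operation count. The sketch below assumes a standard LSTM cell (four gates, each with one input-to-hidden and one hidden-to-hidden matrix) and counts only the matrix multiply-accumulates per time step. The function name and all sizes are hypothetical values chosen for illustration; they are not taken from the paper.

```python
def lstm_macs_per_step(input_dim: int, hidden_dim: int) -> int:
    """Multiply-accumulate operations for one LSTM time step.

    Assumes the standard LSTM cell: four gates, each with an
    input-to-hidden matrix (hidden_dim x input_dim) and a
    hidden-to-hidden matrix (hidden_dim x hidden_dim). Bias additions
    and element-wise gate operations are ignored.
    """
    return 4 * hidden_dim * (input_dim + hidden_dim)


# Hypothetical sizes, chosen only for illustration.
HIDDEN = 512
FULL_INPUT = 7 * 7 * 512   # a flattened 7x7x512 feature map (25088 values)
REDUCED_INPUT = 1024       # the same features after downsampling

full_cost = lstm_macs_per_step(FULL_INPUT, HIDDEN)
reduced_cost = lstm_macs_per_step(REDUCED_INPUT, HIDDEN)

# Prints: 52428800 3145728 16.7
print(full_cost, reduced_cost, round(full_cost / reduced_cost, 1))
```

Under these assumed sizes the reduced input needs roughly one seventeenth of the per-step work, which is consistent with the speed gain described above. The count says nothing about the accompanying loss in accuracy.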
Our investigations show that motion models have been explored less in single-target tracking algorithms than in multi-target methods, even though studies show that these models improve tracking performance. Before the introduction of convolutional networks and their remarkable success in extracting deep features from images, motion models were the most commonly used; recent methods, especially single-target trackers, rely more on appearance features. In single-target algorithms, the presence of only one object in the image and the lower computational volume compared with multi-target algorithms allow freer use of appearance features. This is not possible in multi-target tracking because of the multiplicity of targets, so motion models are more useful in those algorithms. This paper therefore investigates motion models and their effect on tracking performance in more detail. The results show that motion models have a profound effect on improving tracking performance while remaining simple and imposing a low processing volume.
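To give a sense of how lightweight a motion model can be, the following sketch implements an alpha-beta filter, a simple constant-velocity model, for the center of a tracked target. It is an illustrative stand-in and not the model evaluated in the paper; the class name, the gain values and the test trajectory are assumptions.

```python
from dataclasses import dataclass


@dataclass
class AlphaBetaTracker:
    """Constant-velocity motion model for a 2D target center (illustrative)."""
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0
    alpha: float = 0.85  # position correction gain (assumed value)
    beta: float = 0.5    # velocity correction gain (assumed value)

    def predict(self, dt: float = 1.0) -> tuple[float, float]:
        """Extrapolate the center dt frames ahead without changing state."""
        return self.x + self.vx * dt, self.y + self.vy * dt

    def update(self, zx: float, zy: float, dt: float = 1.0) -> tuple[float, float]:
        """Correct the state with a measured center (zx, zy)."""
        px, py = self.predict(dt)
        rx, ry = zx - px, zy - py          # residual: measurement minus prediction
        self.x = px + self.alpha * rx
        self.y = py + self.alpha * ry
        self.vx += self.beta * rx / dt
        self.vy += self.beta * ry / dt
        return self.x, self.y


# Hypothetical target moving 2 px right and 1 px down per frame.
tracker = AlphaBetaTracker(x=0.0, y=0.0)
for t in range(1, 21):
    tracker.update(2.0 * t, 1.0 * t)

# Prediction for frame 21, usable when the target is briefly occluded.
pred_x, pred_y = tracker.predict()
```

Each update costs a handful of additions and multiplications per target, regardless of image size. After the 20 noise-free measurements in this example, the prediction for frame 21 lies very close to the true position (42, 21).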
This paper presents a comprehensive review and implementation of different tracking algorithms and introduces appropriate methods for practical implementations. Different tracking structures are investigated and categorized based on spatial, temporal, appearance and motion features. In addition, because of the development of deep learning methods and their impact on tracking, deep architectures, training datasets and standard evaluation methods are studied, and the future horizon of the field is discussed. Our studies show that temporal and motion features have received less attention despite their favorable impact on tracking performance. With the development of deep memory networks, the use of these features is increasing and they are taking a greater share in tracking.
Type of Study: Research | Subject: Paper
Received: 2023/11/24 | Accepted: 2024/12/4 | Published: 2025/06/21 | ePublished: 2025/06/21



Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2015 All Rights Reserved | Signal and Data Processing