Title : Visual tracking using deep learning
Abstract: Visual tracking is one of the most fundamental research problems in computer vision. The aim of this paper is to introduce and analyse various feature-based tracking methods. We discuss deep learning based trackers together with the datasets used to evaluate them. Deep learning methods have improved the ability to classify, detect and recognize objects, but they demand considerable computational power because of the iterative nature of their training algorithms.
Keywords: Visual tracking, Recurrent neural network, Convolutional neural network
1. Introduction
Object detection is the process of locating objects in the frames of a video sequence, while object tracking is the process of following the trajectory of an object as it moves around a scene. The important steps during tracking are: (1) detection of interesting moving objects; (2) tracking of related objects from frame to frame; (3) analysis of object tracks to recognise their behaviour. Typical applications of object tracking include traffic monitoring, human-computer interaction and automated surveillance. The general approach of object detection and tracking is shown in figure 1.
[Figure 1 block diagram: an input video passes through Data Acquisition, Pre-Processing, Feature Detection and Segmentation stages to produce the tracking output.]
Figure 1: General block diagram of object detection and tracking
Visual tracking is one of the most active research problems today [1, 2]; it amounts to inferring the motion of an object from image evidence. Object detection identifies and locates all known objects in a scene, whereas object tracking does not need to identify the objects: they can be tracked from motion features alone, without knowing what is actually being tracked. Despite extensive research, visual tracking still suffers from changes in object appearance caused by factors such as deformation, occlusion, fast motion, scale variation, illumination variation and motion blur, and therefore remains a challenging task.
Recently, deep learning based methods have focused on improving the accuracy and speed of visual tracking [3, 4, 5]. A major drawback of these methods is that they often require a large amount of training data to learn the object features.
A visual tracker's performance can change from one image frame to the next. Deep learning methods have shown satisfactory performance, at the cost of computational complexity [4, 5]. In this work we discuss different deep tracking methods and compare them.
The main purpose of visual tracking is to detect and track an arbitrary object in a scene. A visual tracking method generally consists of two important models: (1) a motion model, whose aim is to estimate the next position from a number of past observations (e.g. the Kalman filter [6] and the particle filter [7, 8]); and (2) an observation model, whose aim is to capture the appearance of the tracked object. This classification of models is given in figure 2.
[Figure 2 diagram: object tracking models split into a motion model and an observation model; the observation model is further divided into generative and discriminative models.]
Figure 2: Classification of object tracking models
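To make the motion model concrete, the following is a minimal sketch of a constant-velocity Kalman filter [6] predicting the next target centre from past observations. The state layout and the noise magnitudes are illustrative assumptions, not values taken from any particular tracker.

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal Kalman-filter motion model for a 2-D target centre.

    State x = [px, py, vx, vy]; noise magnitudes are illustrative assumptions.
    """
    def __init__(self, dt=1.0):
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1,  0],
                           [0, 0, 0,  1]], float)  # constant-velocity transition
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], float)   # only the position is observed
        self.Q = np.eye(4) * 1e-2                  # process noise (assumed)
        self.R = np.eye(2) * 1.0                   # measurement noise (assumed)
        self.x = np.zeros(4)                       # initial state
        self.P = np.eye(4)                         # initial state covariance

    def predict(self):
        """Predict the next target centre before the new frame arrives."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Correct the state with the observed centre z = [px, py]."""
        y = z - self.H @ self.x                    # innovation
        S = self.H @ self.P @ self.H.T + self.R    # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

kf = ConstantVelocityKF()
for t in range(5):
    print("predicted centre:", kf.predict())
    kf.update(np.array([2.0 * t, 0.0]))  # target drifting right ~2 px per frame
```

After a few frames the predicted centre anticipates the rightward drift, which is exactly what the motion model contributes before the observation model scores candidate locations.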
Furthermore, based on the observation model, present visual trackers can be categorised into two groups: (1) discriminative methods [13, 14] and (2) generative methods [9, 10, 11, 12]. A generative model describes the appearance of the target and scores a candidate by how well the model explains (reconstructs) it, whereas a discriminative model is trained to separate the target from the background and produces a single classification score for each candidate. Tracking methods built on support vector machines [15], structured learning [16] and correlation filters [17] have limitations that can be addressed by deep learning based methods.
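As a toy illustration of the two observation models, the sketch below scores candidate patches generatively, by reconstruction error against a template, and discriminatively, by a classifier confidence. It is a minimal example on synthetic data; the fixed linear weights merely stand in for a trained classifier.

```python
import numpy as np

rng = np.random.default_rng(0)
template = rng.random((16, 16))            # appearance template of the target
candidates = [template + rng.normal(0, s, (16, 16)) for s in (0.05, 0.3, 0.8)]

def generative_score(patch):
    """Generative observation model: how well the appearance model
    explains (reconstructs) the candidate patch."""
    return -np.mean((patch - template) ** 2)   # negative reconstruction error

# Discriminative observation model: confidence of a binary
# target-vs-background classifier (fixed weights stand in for trained ones).
w = (template - template.mean()).ravel()
def discriminative_score(patch):
    z = w @ (patch.ravel() - patch.mean())
    return 1.0 / (1.0 + np.exp(-z))            # sigmoid confidence

for i, c in enumerate(candidates):
    print(f"candidate {i}: generative={generative_score(c):+.4f} "
          f"discriminative={discriminative_score(c):.4f}")
```

The noisier the candidate, the worse both scores become; the difference is that the generative score measures agreement with the target model alone, while the discriminative score is a single target-versus-background decision.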
In the remainder of this paper, we first discuss the important basic concepts behind visual tracking methods. We then discuss present deep learning based visual tracking methods, compare various visual trackers, and give an analysis of the different methods.
2. Deep learning algorithm
Deep learning has achieved great success in image classification, vehicle detection, pose estimation and related tasks. In this section, we discuss visual tracking methods built on networks originally designed for image classification and object detection. Deep learning methods are further classified as shown in figure 3.
[Figure 3 diagram: deep learning based visual tracking divided into convolutional neural networks, recurrent neural networks and other architectures such as DLT's deep compact image representation.]
Figure 3: Classification of visual tracking methods
Deep convolutional neural networks have performed remarkably on image classification and object detection. Object detection is the more challenging task because, in addition to classifying objects, it requires localising every object instance in the image. A CNN has a powerful ability to extract features from data given a training objective. Through a visualisation technique, Zeiler showed that hand-designed features are similar to the features extracted by a CNN [18]; each convolutional map of a convolutional layer behaves like a filter. A CNN extracts increasingly abstract features through its stacked convolutional and pooling layers: low-level features respond to edges, points and corners, while high-level features are built on top of them to detect larger shapes and objects in the image. Although deep architectures outperform hand-crafted pipelines, especially in challenging environments, it is difficult to train a deep model on large datasets; fortunately, a good set of initial parameters can reduce the difficulty of learning. Region-based convolutional neural networks (R-CNN) and their faster variants detect and identify objects by classifying region proposals and then regressing bounding-box coordinates, whereas YOLO (You Only Look Once) predicts bounding boxes directly; such single-shot methods, however, do not perform as well as R-CNN type methods.
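The low-level versus high-level feature hierarchy can be inspected directly. The sketch below, assuming PyTorch and torchvision are available, taps a VGG-16 feature stack at a shallow and a deep layer; the chosen layer indices (the conv1_2 and conv5_3 activations) are one reasonable choice, not the only one.

```python
import torch
from torchvision import models

# Tap a VGG-16 at a shallow and a deep layer to contrast low-level and
# high-level features. weights=None keeps the demo offline; use
# models.VGG16_Weights.DEFAULT for the pretrained filters.
vgg = models.vgg16(weights=None).features.eval()

x = torch.randn(1, 3, 224, 224)            # stand-in for a preprocessed frame
low_level, high_level = None, None
with torch.no_grad():
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i == 3:                         # after conv1_2 + ReLU: edges, corners
            low_level = x
        if i == 29:                        # after conv5_3 + ReLU: object parts
            high_level = x

print("low-level feature map :", tuple(low_level.shape))   # (1, 64, 224, 224)
print("high-level feature map:", tuple(high_level.shape))  # (1, 512, 14, 14)
```

The shallow map keeps full spatial resolution with few channels (good for precise localisation), while the deep map trades resolution for many semantic channels (good for recognising the object), which is exactly the trade-off trackers exploit.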
2.1 Network algorithm
2.1.1 CNN
Nowadays, researchers focus on convolutional neural networks for object detection, classification and segmentation tasks; the CNN is a central deep learning model and is applied to vehicle detection, classification and more. Krizhevsky, Sutskever and Hinton [19] proposed a deep convolutional neural network (DCNN) that significantly improved classification accuracy, using several convolutional layers. Dalal and Triggs [20] introduced grids of histogram-of-oriented-gradient (HOG) descriptors that gradually outperformed the existing feature sets, and showed a clean separation on their original dataset. Sermanet et al. [21] proposed a network model in which detection shares a common feature-extraction stage with other tasks. For R-CNN [22], the selective-search strategy is more effective than sliding windows for improving accuracy. Girshick proposed two successors: Fast R-CNN [23], which improves training and testing speed as well as detection accuracy, and Faster R-CNN [24], which uses a Region Proposal Network that shares full-image convolutional features with the detector. Such models are trained end to end to generate high-quality regions for object detection, and a high detection rate can be achieved with the VGG-16 model [25]. Galoogahi et al. [26] proposed an approach that avoids the complexity of explicit feature extraction for detection.
Hyeonseob Nam et al. [27] proposed a multi-domain CNN (MDNet) in which each domain corresponds to one training video sequence, as shown in figure 4. During feature extraction, the authors separate domain-independent information, learned in the shared layers, from the domain-specific layers.
Figure 4: The structure of MDNet [27]
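A hedged sketch of the multi-domain idea behind MDNet [27] is given below: shared layers learn domain-independent features while each training sequence (domain) has its own binary target-versus-background head. The layer sizes here are illustrative and do not reproduce the exact MDNet architecture.

```python
import torch
import torch.nn as nn

class MultiDomainNet(nn.Module):
    """Sketch of the multi-domain idea in MDNet [27]: shared layers learn
    domain-independent features; each training video (domain) gets its own
    binary target-vs-background head. Sizes are illustrative."""
    def __init__(self, num_domains):
        super().__init__()
        self.shared = nn.Sequential(             # domain-independent layers
            nn.Conv2d(3, 32, 7, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(3), nn.Flatten(),
            nn.Linear(64 * 9, 256), nn.ReLU(),
        )
        # One domain-specific head per training sequence;
        # 2 outputs = (background, target).
        self.heads = nn.ModuleList([nn.Linear(256, 2) for _ in range(num_domains)])

    def forward(self, patches, domain):
        return self.heads[domain](self.shared(patches))

net = MultiDomainNet(num_domains=5)
patches = torch.randn(8, 3, 107, 107)    # candidate patches from one sequence
logits = net(patches, domain=2)          # train with the head of that sequence
print(logits.shape)                      # torch.Size([8, 2])
```

At test time the shared layers are retained and a fresh head is trained online for the new sequence, which is how the domain-independent information is reused.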
2.1.2 Recurrent Neural Network
In recent years, RNNs have been trained for the visual tracking task. Gan et al. [28] introduced a method in which the target's image patches are fed directly to an RNN that produces an object-specific filter for tracking, and Kahou et al. [29] presented end-to-end training that allows spatio-temporal patterns to be learned without separate training of the feature extractor. Although these works brought good intuitions to RNN-based tracking, the methods do not perform well on modern benchmarks.
Haarnoja et al. [30] presented a method that trains a Kalman filter by back-propagation through an RNN. Krishnan et al. [31] introduced an RNN used as the inference and generative component of a deep Kalman filter. D. Gordon et al. [32] introduced an appearance model that needs only a single forward pass; it achieves high frame rates and handles occlusion effectively.
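In the spirit of these RNN trackers, the following minimal sketch lets an LSTM consume per-frame feature vectors and regress a bounding box for each frame, so the recurrent state carries the temporal motion pattern. The feature dimension and sizes are assumptions, not taken from any cited method.

```python
import torch
import torch.nn as nn

class RNNTracker(nn.Module):
    """Sketch of RNN-based tracking: an LSTM consumes one feature vector per
    frame and regresses the target box (cx, cy, w, h) for each frame."""
    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.box_head = nn.Linear(hidden, 4)   # (cx, cy, w, h) per frame

    def forward(self, feats):
        # feats: (batch, time, feat_dim) sequence of frame features
        h, _ = self.lstm(feats)                # hidden state carries motion
        return self.box_head(h)                # (batch, time, 4)

tracker = RNNTracker()
frame_feats = torch.randn(2, 10, 256)          # 2 clips, 10 frames each
boxes = tracker(frame_feats)
print(boxes.shape)                             # torch.Size([2, 10, 4])
```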
2.1.3 Other
Apart from CNNs and RNNs, some researchers have proposed tracking algorithms using other deep networks. Naiyan Wang et al. [33] proposed a robust discriminative deep learning tracker (DLT) that puts the emphasis on an effective image representation learned automatically. Zhuang et al. [34] presented a combination of a discriminative classifier and a generative model for feature representation using shallow and deep architectures; this method performs better than the DLT algorithm [35].
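DLT [33] obtains its compact representation by pre-training a stacked denoising auto-encoder offline. Below is a hedged single-layer sketch of that building block, with sizes and noise level chosen purely for illustration.

```python
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    """One denoising-autoencoder layer of the kind stacked in DLT [33]:
    reconstruct a clean patch from a corrupted one so the code layer learns
    a compact, noise-robust representation. Sizes are illustrative."""
    def __init__(self, in_dim=1024, code_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, code_dim), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(code_dim, in_dim), nn.Sigmoid())

    def forward(self, x):
        noisy = x + 0.2 * torch.randn_like(x)   # corrupt the input
        return self.decoder(self.encoder(noisy))

ae = DenoisingAE()
patches = torch.rand(16, 1024)                  # 32x32 grey patches, flattened
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
for _ in range(3):                              # a few illustrative steps
    loss = nn.functional.mse_loss(ae(patches), patches)
    opt.zero_grad()
    loss.backward()
    opt.step()
print("reconstruction loss:", loss.item())
```

After pre-training, the encoder is kept as the feature extractor and a classification layer is fine-tuned online for the tracked target.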
3. Demonstrations and analysis
In this section, we discuss the evaluation on the OTB-50 [36], OTB-100 [37] and VOT-2015 [38] datasets, which illustrates the advantages of using deep learning in visual tracking. Finally, we discuss the results and draw conclusions.
3.1 Evaluation benchmarks
This subsection introduces the widely accepted OTB-50, OTB-100 and VOT-2015 datasets along with their attributes. A comparison between these datasets is shown in table 1.
Dataset | No. of Sequences | Bias | Attributes
OTB-50 | 50 | No-reset measure (biased estimator) | Illumination Variation (IV), Out of View (OV), Low Resolution (LR), Deformation (DEF), Occlusion (OCC), Fast Motion (FM), Background Clutter (BC), Motion Blur (MB), In-Plane Rotation (IPR), Scale Variation (SV), Out-of-Plane Rotation (OPR)
OTB-100 | 100 | No-reset measure (biased estimator) | Same 11 attributes as OTB-50
VOT-2015 | 60 | Reduces the bias | —

Table 1: Evaluation datasets
Figure 5 shows the precision and overlap plots of different trackers over the 50 benchmark sequences of the OTB-50 dataset; corresponding plots are obtained over the 100 sequences of OTB-100. The evaluation protocol for these datasets is one-pass evaluation (OPE). The reported precision score is taken at a location-error threshold of 20 pixels, while the success score is the area under the curve (AUC) of the overlap plot.
Figure 5: Precision and success rate over all 50 sequences using one-pass evaluation on the OTB dataset [47]
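Both OPE scores are easy to compute from predicted and ground-truth boxes. The following is a minimal numpy sketch, using the 20-pixel centre-error threshold for precision and the mean success rate over IoU thresholds as the AUC; the data at the end is synthetic, for illustration only.

```python
import numpy as np

def iou(a, b):
    """Overlap (IoU) of box arrays given as rows of (x, y, w, h)."""
    x1, y1 = np.maximum(a[:, 0], b[:, 0]), np.maximum(a[:, 1], b[:, 1])
    x2 = np.minimum(a[:, 0] + a[:, 2], b[:, 0] + b[:, 2])
    y2 = np.minimum(a[:, 1] + a[:, 3], b[:, 1] + b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    return inter / (a[:, 2] * a[:, 3] + b[:, 2] * b[:, 3] - inter)

def precision_at(pred, gt, thresh=20.0):
    """Fraction of frames whose centre-location error is below thresh pixels."""
    cp = pred[:, :2] + pred[:, 2:] / 2
    cg = gt[:, :2] + gt[:, 2:] / 2
    return np.mean(np.linalg.norm(cp - cg, axis=1) <= thresh)

def success_auc(pred, gt):
    """Area under the success curve: mean success rate over IoU thresholds."""
    o = iou(pred, gt)
    thresholds = np.linspace(0, 1, 21)
    return np.mean([(o >= t).mean() for t in thresholds])

# Toy example: 60 frames with slightly perturbed ground-truth boxes.
gt = np.tile([100., 100., 50., 80.], (60, 1))
pred = gt + np.random.randn(60, 4) * 5
print("precision@20px:", precision_at(pred, gt))
print("success AUC   :", success_auc(pred, gt))
```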
Tracking algorithms: We evaluate several modern trackers, including HDT [39], ADNet [40], TSTN [41], MEEM [42], MDNet [43], DSST [44], ECO [45], TCNN [46], RTT [47], FCNT [48], SMT [49], SANet [50] and DNT [51]. All of these methods are evaluated by precision and success rate, and all have achieved good results on the above-mentioned datasets.
Figure 6: Robustness-accuracy plot of different trackers on the VOT-2015 dataset [47]
Discussion: The characteristics of the trackers differ as the network structure changes. Before CNN-based methods were adopted, some trackers used auto-encoders or simply constructed network models, yet achieved results comparable to the best-performing tracking methods. To build an appearance model, most current trackers use a DCNN. Table 2 shows the precision rate of different trackers for the individual attributes of the OTB-100 dataset. It is clearly seen that the SANet tracker performs better than the other trackers under Illumination Variation (IV), Out of View (OV), In-Plane Rotation (IPR), Occlusion (OCC), Out-of-Plane Rotation (OPR) and Deformation (DEF). Table 3 shows the success scores of the trackers for the same attributes. In terms of success rate, MDNet performs well under Out of View (OV), Illumination Variation (IV), Scale Variation (SV), Deformation (DEF), Motion Blur (MB) and In-Plane Rotation (IPR), but it fails under occlusion.

Table 4 gives the baseline evaluation of different trackers on the VOT-2015 dataset. Struck [51] and DFT [52] reach the highest expected overlap, while the deep tracker DeepSRDCF [47] obtains the best accuracy rank; one reason is that deep features have a great ability to represent the target. Hand-crafted features are properties derived by algorithms from the information present in the image itself, whereas deep features require sequence-to-sequence training to be extracted effectively. CNN-based methods have significantly improved object detection and tracking performance, but they fail to provide consistent information over time. Figure 6 shows the leading trackers under the baseline robustness-accuracy evaluation on VOT-2015; the better trackers are located towards the upper-right corner. RNNs are the kind of neural network designed specifically to handle sequential information, and some methods, such as ROLO [53] and RNNT [54], use them to represent temporal relationships, but these methods have not yet achieved satisfactory performance.
Tracker | IV | OV | SV | DEF | OCC | FM | MB | IPR | OPR | BC | LR
HDT [55] | 79.9 | 66.1 | 80.7 | 80.3 | 75.7 | 79.5 | 76.0 | 82.9 | 79.2 | 84.4 | 88.1
ADNet [55] | 89.1 | 79.9 | 86.6 | 84.2 | 82.4 | 82.4 | 82.8 | 85.9 | 86.8 | 91.3 | 92.0
MDNet [55] | 91.8 | 79.9 | 89.2 | 89.4 | 85.1 | 87.9 | 86.8 | 90.4 | 89.9 | 91.7 | 93.7
TSTN [41] | 83.2 | 72.5 | 78.7 | 78.6 | 77.9 | 78.9 | 80.9 | 82.5 | 80.7 | 79.3 | 84.8
CNN-SVM [41] | 79.5 | 65.0 | 77.4 | 79.3 | 73.0 | 74.7 | 75.1 | 81.3 | 79.8 | 77.6 | 92.5
MEEM [41] | 74.0 | 68.5 | 73.6 | 75.4 | 74.1 | 75.2 | 73.1 | 79.4 | 79.4 | 74.6 | 80.6
ECO [40] | 91.4 | 91.3 | 88.1 | 85.9 | 90.8 | 86.5 | 90.4 | 89.2 | 90.7 | 94.2 | 88.8
SANet [40] | 92.6 | 79.0 | 89.1 | 89.9 | 86.6 | 85.3 | 85.8 | 90.3 | 90.6 | 93.1 | 88.2
TCNN [40] | 92.0 | 77.2 | 87.0 | 84.8 | 83.1 | 84.3 | 86.9 | 89.5 | 88.0 | 87.8 | 89.0

Table 2: Precision rate (%) per attribute on the OTB-100 dataset
Tracker | IV | OV | SV | DEF | OCC | FM | MB | IPR | OPR | BC | LR
HDT [55] | 52.8 | 47.7 | 48.8 | 53.8 | 82.3 | 56.2 | 56.5 | 55.1 | 53.1 | 58.5 | 39.9
ADNet [55] | 66.1 | 60.7 | 63.8 | 62.2 | 61.6 | 64.1 | 66.3 | 62.5 | 63.5 | 66.2 | 56.7
MDNet [55] | 69.5 | 61.7 | 66.3 | 65.6 | 65.0 | 67.5 | 68.2 | 66.2 | 66.6 | 67.6 | 64.3
TSTN [41] | 57.9 | 53.5 | 49.6 | 53.3 | 54.2 | 57.0 | 60.2 | 56.9 | 56.3 | 56.0 | 40.6
CNN-SVM [41] | 53.7 | 48.8 | 49.0 | 54.7 | 51.5 | 54.6 | 57.8 | 54.8 | 54.8 | 54.8 | 37.9
MEEM [41] | 51.7 | 48.8 | 46.5 | 48.9 | 50.4 | 54.2 | 55.6 | 52.9 | 53.1 | 51.9 | 38.2
ECO [40] | 71.3 | 66.0 | 66.9 | 63.3 | 68.0 | 67.8 | 71.8 | 65.5 | 67.3 | 70.0 | 61.7
SANet [40] | 67.7 | 60.0 | 64.0 | 63.0 | 63.5 | 64.2 | 66.3 | 63.9 | 64.9 | 66.9 | 59.2
TCNN [40] | 67.8 | 58.3 | 64.1 | 61.5 | 62.1 | 64.8 | 68.1 | 64.5 | 64.0 | 65.2 | 61.0

Table 3: Success rate (%) per attribute on the OTB-100 dataset
Tracking speed is also an important aspect of the tracking process. Many computer vision tasks use deep features, which are very effective at extracting semantic information and classifying objects into categories; however, a deeper network does not necessarily yield better accuracy and speed. Table 4 gives the baseline evaluation of the leading trackers on the VOT-2015 dataset.
Trackers can be evaluated by accuracy and robustness rank, as shown in table 4. Accuracy indicates how well the bounding box predicted by the tracker overlaps with the ground-truth bounding box, while robustness indicates how many times the tracker loses the target during tracking.
Tracker | Acc. Rank | Rob. Rank | Expected Overlap
Struck [47] | 8.70 | 8.60 | 0.40
DFT [56] | 9.91 | 8.79 | 0.39
DeepSRDCF [47] | 2.73 | 4.23 | 0.31
EBT [47] | 7.35 | 3.80 | 0.30
MEEM [56] | 6.79 | 5.61 | 0.22
KCF [56] | 3.16 | 3.89 | 0.19

Table 4: Baseline evaluation on VOT-2015
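Following the accuracy and robustness definitions above, the sketch below computes simplified VOT-style measures from a sequence of per-frame overlaps. The full VOT protocol additionally re-initialises the tracker a few frames after each failure and ranks trackers across sequences, which this toy version omits.

```python
import numpy as np

def vot_accuracy_robustness(overlaps, fail_thresh=0.0):
    """Simplified VOT-style measures from per-frame IoU values:
    accuracy  = mean IoU over frames where tracking succeeded,
    robustness = number of failures (IoU falling to fail_thresh)."""
    overlaps = np.asarray(overlaps, float)
    failures = int(np.sum(overlaps <= fail_thresh))
    tracked = overlaps[overlaps > fail_thresh]
    accuracy = float(tracked.mean()) if tracked.size else 0.0
    return accuracy, failures

# Toy per-frame overlap curve containing two failures (IoU drops to 0).
overlaps = [0.7, 0.65, 0.0, 0.6, 0.55, 0.0, 0.5]
acc, rob = vot_accuracy_robustness(overlaps)
print(f"accuracy={acc:.3f}, failures={rob}")
```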
4. Summary of neural network based tracking
No. | Parameter | Summary
1 | Speed | A maximum of 100 frames per second (FPS) has been achieved for tracking generic objects [57]. Updating the correlation filter while extracting features is computationally expensive, so its real-time performance is poor [58].
2 | Occlusion | Deep learning methods outperform particle filtering under occlusion [59]. The ROLO tracker successfully tracks the object while it is under occlusion [60].
3 | Scale Variation | Scale estimation does not provide the location of a target object that undergoes drastic scale variation [61].
5. Conclusion and Future scope
In this paper, we discussed different tracking methods based on deep learning and compared them with traditional baseline algorithms. The paper has three main parts: first, we discussed deep learning based visual trackers; second, we compared various deep trackers on popular benchmarks; third, we analysed the trackers' results and drew conclusions. Deep learning is widely used for image classification, detection and visual tracking, and gives effective results compared with traditional trackers. Most of the trackers use the VGG-M model, where M denotes the number of convolutional layers. Based on our observations we conclude that: (1) deep neural networks do not require hand-crafted features; (2) deep learning has a powerful capacity for feature expression and extracts features automatically over large datasets; (3) deep neural networks obtain higher-level features by composing lower-level ones. Although deep learning started late in this field and developed slowly, it is eye-catching in terms of accuracy and overall performance; yet it is still imperfect, with problems such as drifting and a lack of real-time speed. We conclude that improving tracking accuracy and reducing the amount of training data required will be promising future directions.
References
[1] X. Wang, Z. Hou, W. Yu, L. Pu, Z Jin, X. Qin, “Robust occlusion aware part based visual tracking with object scale adaption”, Pattern Recognition, Elsevier, 2018.
[2] H. Hu, B. Ma, J. Shen, L. Shao, “Manifold regularized correlation object tracking”, IEEE transaction on neural network and learning system, 2017.
[3] Z. Zhao, P. Zheng, S. Xu, X. Wu, “Object detection with deep learning : A review”, Vol. 14, No. 8, IEEE, 2017.
[4] M. Leung, A. Delong, B. Alipanahi, B. Frey, “Machine learning in genomic medicine: A review of computational problems and datasets”, Vol. 104, pages 176 – 197, IEEE, 2016.
[5] D. Wang, J. Chen, “Supervised speech separation based on deep learning : An overview”, Vol. 26, pages 1702 – 1726, IEEE transaction on audio, speech and language processing, 2018.
[6] S. Yang, M. Baum, “Extended kalman filter for extended object tracking”, pages 4386 – 4390, IEEE, 2017.
[7] B. Maras, N. Arica, A. Ertuzun, “Object tracking by combining tracking by detection and marginal particle filter”, pages 1029 – 1032, IEEE, 2016.
[8] N. Wang, W. Zhou, H. Li, “Robust object tracking via part - based correlation particle filter”, Pages 1 -6, IEEE, 2018.
[9] X. Xing, F. Qiu, X. Xu, C. Qing, “A new real time robust object tracking method”, pages 1126 – 1129, IEEE, 2015.
[10] L. Wancun, T. Wenyan, Z. Liguo, Z. Xiaolin, L. Jiafu, “Multi – scale behavior learning for multi object”, pages 1 – 5, IEEE, 2017.
[11] D. Riahi, G. Bilodeau, “Multiple object tracking based on sparse generative appearance modelling”, pages 4017 – 4021, IEEE, 2015.
[12] R. Hess, A. Fern, “Discriminatively trained particle filters for complex multi object tracking”, pages 240 – 247, IEEE, 2015.
[13] Q. Yu, T. Dinh, G. Medioni, “Online tracking and reacquisition using co-trained generative and discriminative trackers”, European conference on computer vision, Springer, 2008.
[14] Q. Wang, Q. Shi, X. Tian, “Tracking non-rigid object using discriminative features”, pages 260 – 263, IEEE, 2014.
[15] X. Wang, S. Lu, “Improved fuzzy multicategory support vector machines classifier”, pages 3585 – 3589, IEEE, 2006.
[16] B. Ma, H. Hu, J. Shen, Y. Zhang, L. Shao, F. Porikli, “Robust object tracking by non linear learning”, Vol. 29, Issue : 10, pages 4769 – 4781, IEEE transaction on neural networks and learning systems, 2018.
[17] Y. Li, J Zhu, “A scale adaptive kernel correlation filter tracker with feature integration”, European conference on computer vision, springer, 2014.
[18] J. Masci, U. Meier, D. Ciresan, J. Schmidhuber, “Stacked convolutional auto-encoders for hierarchical feature extraction”, ICANN, pages 52 – 59, Springer, 2011.
[19] A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet classification with deep convolutional neural networks”, Advances in Neural Information Processing Systems (NIPS), pages 1097 – 1105, 2012.
[20] N. Dalal, B. Triggs, "Histograms of oriented gradients for human detection.”, Vol. 1, pages 886-893, IEEE, 2005.
[21] P Sermanet., D Eigen., X Zhang., M. Mathieu, R. Fergus and Y Lecun., "Overfeat: Integrated recognition, localization and detection using convolutional networks." arXiv : 1312.6229, 2013.
[22] R. Girshick, J. Donahue, T. Darrell, J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation”, CVPR, IEEE, 2014.
[23] R. Girshick, “Fast R-CNN”, IEEE International Conference on Computer Vision (ICCV), pages 1440 – 1448, 2015.
[24] S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks.", ANIPS, pages 91-99, 2015.
[25] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large scale image recognition.", pages 1409 – 1556, arXiv preprint arXiv, 2014.
[26] H. Galoogahi, A Fagg, S. Lucey, “Learning background aware correlation filters for visual tracking”, pages 1-9, 2017.
[27] H. Nam and B. Han, “Learning multi-domain convolutional neural networks for visual tracking”, arXiv preprint, 2015.
[28] Q. Gan, Q. Guo, Z. Zhang, and K. Cho, “First step toward model-free, anonymous object tracking with recurrent neural networks”, 2015, arXiv preprint arXiv.
[29] S. Kahou, V. Michalski, and R. Memisevic, “RATM: Recurrent attentive tracking model”, arXiv preprint, 2015.
[30] T. Haarnoja, A. Ajay, S. Levine, P. Abbeel, “Backprop KF: Learning discriminative deterministic state estimators”.NIPS, 2016.
[31] R. Krishnan, U. Shalit, D. Sontag, “Deep kalman filters”. arXiv - 1511.05121, 2015.
[32] D. Held, S. Thrun, S. Savarese, “ Learning to track at 100fps with deep regression networks”, 978-3-319-46448-0, pages 749-765, Springer, 2016
[33] N. Wang, D. Yeung, “Learning a deep compact image representation for visual tracking”, pages 809-817, Advances in neural information processing systems, 2013.
[34] B. Zhuang , L. Wang , H. Lu , “Visual tracking via shallow and deep collaborative model”, pages 61–71, IEEE, 2016.
[35] N. Wang , D. Yeung , “Learning a deep compact image representation for visual tracking”, pages 809–817, IEEE, 2013.
[36] Y. Wu, J. Lim, M. Yang, “Online object tracking: A benchmark”, pages 2411 – 2418, CVPR, IEEE, 2013.
[37] Y. Wu, J. Lim, M. Yang, “Object tracking benchmark”, TPAMI, Vol. 37, No. 9, pages 1834 – 1848, IEEE, 2015.
[38] M. Kristan, J. Matas, A. Leonardis, M. Felsberg, L. Cehovin, G. Fernandez, T. Vojir, G. Hager, G. Nebehay, and R. Pflugfelder, “The visual object tracking VOT2015 challenge results”, ECCV, 2015.
[39] Z. Zhou, Z. Chen, “Hybrid decision tree”, Vol. 15, No. 8, pages 515 – 528, Elsevier, 2002.
[40] P. Li, D. Wang, L. Wang, H. Lu, “Deep visual tracking : Review and experimental comparison”, Pattern recognition, Elsevier, 2018.
[41] G. Xu, S. Khan, H. Zhu, L. Han, H. Yan, “Discriminative tracking via supervised tensor learning”, Neurocomputing, Elsevier, 2018.
[42] Y. Qi, S. Zhang, L. Qin, H. Yao, Q. Huang, J. Lim, M. Yang, “ Hedged deep tracking”, pages 4303 – 4311, CVPR, IEEE, 2016.
[43] L. Bertinetto, J. Valmadre, J. Henriques, “Fully convolutional Siamese networks for object tracking”, ECCV, pages 850 – 865, Springer, 2016.
[44] L. Bertinetto, J. Henriques, J. Valmadre, “Learning feed-forward one shot learners”, pages 1 – 9, NIPS, 2016.
[45] J. Xie, R. Girshick, A. Farhadi, “Unsupervised deep embedding for clustering analysis”, International conference on machine learning, Vol. 48, 2016.
[46] H. Zheng, L. Fang, M. Ji, M. Strese, Y. Ozer, E. Steinbach, “Deep learning for surface material classification using haptic and visual information”, IEEE, 2016.
[47] Z. Chi, H. Li, H. Lu, M. Yang, “Dual deep network for visual tracking”, IEEE transaction on image processing, 2016.
[48] M. Zhai, M. Roshtkhari, G. Mori, “Deep learning of appearance models for online object tracking”, arXiv : 1607.02568, 2016.
[49] J. Li, W. Monroe, A. Ritter, M. Galley, “Deep reinforcement learning for dialogue generation”, arXiv : 1606.01541, 2016.
[50] H. Fan, H. Ling, “SA-Net : structure aware network for visual tracking”, CVPRW, ISSN : 2160-7516, pages : 2217 – 2224, 2014.
[51] D. Yi, Z. Lei, S. Liao, S. Li, “Learning face representation from scratch”, arXiv:1411.7923, 2014.
[52] L. Deng, D. Yu, “Deep learning : Methods and applications”, Vol. 7, No. 3 – 4, 2014.
[53] Y. Zou, J. Li, X. Chen, R. Lan, “ Learning Siamese networks for laser vision seam tracking”, Vol. 35, Issue 11, pages 1805 – 1813, 2018.
[54] W Gan, M Lee, C Wu, “Online object tracking via motion guided convolutional neural network”, Journal of visual communication and image representation, 2018.
[55] F. Zhao, T Zhang, Y. Wu, J. Wang, M. Tang, “Domain adaption tracker with global and local searching”, Vol. 4, IEEE, 2016.
[56] M. Danelljan, G. Hager, “Convolutional features for correlation filter based visual tracking”, Computer vision laboratory, pages 58-66, IEEE, 2016.
[57] D. Held, S. Thrun, S. Savarese, “Learning to track at 100fps with deep regression networks”, 978-3-319-46448-0, pages 749-765, Springer, 2016.
[58] H. Galoogahi, A. Fagg, S. Lucey, “Learning background aware correlation filters for visual tracking”, pages 1-9, 2017.
[59] C. Ozer, F. Gurkan, “Object tracking by deep object detectors and particle filtering”, ISBN 978-1-5386-1501-0, IEEE, 2018.
[60] G. Ning, C. Huang, X. Ren, C. Cai, Z. He, “ Spatially supervised recurrent convolutional neural networks for visual object tracking”, ISSN : 2379 – 447x, IEEE, 2017.
[61] C. Sun, D. Wang, H. Lu, M. Yang, “Learning spatial aware regressions for visual tracking”, pages 8962 – 8970, IEEE, 2018.