Citation: Hu WM, Wang Q, Gao J et al. DCFNet: Discriminant correlation filters network for visual tracking. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 39(3): 691–714, May 2024. DOI: 10.1007/s11390-023-3788-3.
Real-time trackers based on convolutional neural networks (CNNs) usually do not update the network online, in order to maintain a high tracking speed. This inevitably limits their adaptability to changes in object appearance. Correlation filter based trackers, in contrast, can update their model parameters online in real time. In this paper, we present an end-to-end lightweight network architecture, the Discriminant Correlation Filter Network (DCFNet). A differentiable discriminant correlation filter (DCF) layer is incorporated into a Siamese network architecture so that the convolutional features and the correlation filter are learned simultaneously. The correlation filter can be updated online efficiently. In previous work, we introduced a joint scale-position space into the DCFNet, forming a scale DCFNet that predicts object scale and position simultaneously. Here, we combine the scale DCFNet with a convolutional-deconvolutional network, learning both high-level embedding space representations and low-level fine-grained representations of images. The adaptability of the fine-grained correlation analysis and the generalization capability of the semantic embedding are complementary for visual tracking. Back-propagation is derived in the Fourier frequency domain throughout the work, preserving the efficiency of the DCF. Extensive evaluations on the OTB (Object Tracking Benchmark) and VOT (Visual Object Tracking Challenge) datasets demonstrate that the proposed trackers run at high speed while maintaining tracking accuracy.
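To make the idea of a correlation filter that is solved and updated in the Fourier domain concrete, the following is a minimal sketch of a generic single-channel, one-dimensional ridge-regression correlation filter. It is not the authors' network or DCF layer: there are no learned convolutional features, no Siamese branches, and no back-propagation. The names `ToyDCF`, `lam`, `lr`, and `circular_shift` are hypothetical, the linear-interpolation update is an assumption borrowed from common correlation filter practice, and a naive DFT stands in for a real FFT.

```python
import cmath
import math
import random


def dft(signal, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, value in enumerate(signal):
            acc += value * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gaussian_label(n, sigma=1.0):
    """Desired response: a circular Gaussian peaked at index 0."""
    return [math.exp(-0.5 * (min(t, n - t) / sigma) ** 2) for t in range(n)]


def circular_shift(signal, s):
    """Return z with z[t] = signal[(t - s) mod N]."""
    n = len(signal)
    return [signal[(t - s) % n] for t in range(n)]


class ToyDCF:
    """Hypothetical toy filter: w_hat = conj(x_hat) * y_hat / (|x_hat|^2 + lam)."""

    def __init__(self, n, lam=1e-3, lr=0.1):
        self.lam = lam    # ridge regularization weight (assumed value)
        self.lr = lr      # online learning rate (assumed value)
        self.y_hat = dft(gaussian_label(n))
        self.num = None   # running numerator:   conj(x_hat) * y_hat
        self.den = None   # running denominator: |x_hat|^2

    def update(self, x):
        """Blend a new training patch into the filter, per frequency bin."""
        x_hat = dft(x)
        num = [xk.conjugate() * yk for xk, yk in zip(x_hat, self.y_hat)]
        den = [abs(xk) ** 2 for xk in x_hat]
        if self.num is None:
            self.num, self.den = num, den
        else:
            a = self.lr
            self.num = [(1 - a) * o + a * v for o, v in zip(self.num, num)]
            self.den = [(1 - a) * o + a * v for o, v in zip(self.den, den)]

    def respond(self, z):
        """Correlation response of the filter on a search patch z."""
        z_hat = dft(z)
        r_hat = [zk * nk / (dk + self.lam)
                 for zk, nk, dk in zip(z_hat, self.num, self.den)]
        return [v.real for v in dft(r_hat, inverse=True)]

    def locate(self, z):
        """Index of the response peak, i.e. the estimated circular shift."""
        r = self.respond(z)
        return max(range(len(r)), key=r.__getitem__)


rng = random.Random(0)
patch = [rng.uniform(-1.0, 1.0) for _ in range(32)]
tracker = ToyDCF(len(patch))
tracker.update(patch)
moved = circular_shift(patch, 5)
print(tracker.locate(moved))
```

Because both training and the update act element-wise on frequency bins, the cost per frame in this sketch is dominated by the transforms. This is the property that makes correlation filters inexpensive to update online.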