Abstract:
Visual tracking frameworks employing Convolutional Neural Networks (CNNs) have shown state-of-the-art performance due to their hierarchical feature representation. While classification- and update-based deep neural network trackers have shown good accuracy, their tracking speed is poor. On the other hand, recent matching-based techniques using CNNs track faster than real time, but they achieve this speed at a considerably lower accuracy. To manage the trade-off between accuracy and speed, we propose a novel CNN architecture for visual tracking. We balance this trade-off by processing consecutive similar frames with a similarity matching technique and dissimilar frames with a classification approach within the same CNN architecture. Tracking speed is improved by measuring the similarity between adjacent frames and skipping unnecessary model updates, while accuracy is maintained by switching to a classification approach with deeper-level features when needed. Extensive evaluation on a publicly available benchmark dataset demonstrates that our proposed tracker achieves competitive performance while maintaining near real-time speed.
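The frame-level switching described above can be illustrated with a minimal sketch. The abstract does not state how similarity between adjacent frames is measured or what cut-off is used. This sketch therefore assumes cosine similarity between per-frame feature vectors and a hypothetical threshold. The function names `cosine_similarity` and `choose_branches` are illustrative only and are not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros).

    Assumption: the paper's actual similarity measure is not given in the abstract.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def choose_branches(frame_features: List[Vector], threshold: float = 0.9) -> List[str]:
    """Pick a processing branch for every frame after the first.

    'match'    -> frame is similar to its predecessor: use the cheap matching
                  path and skip the model update.
    'classify' -> frame differs: use the slower classification path with
                  deeper features.
    The default threshold of 0.9 is a hypothetical hyperparameter.
    """
    branches = []
    for prev, curr in zip(frame_features, frame_features[1:]):
        if cosine_similarity(prev, curr) >= threshold:
            branches.append("match")
        else:
            branches.append("classify")
    return branches


if __name__ == "__main__":
    # Toy features: frame 2 is nearly identical to frame 1, frame 3 is not.
    features = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
    print(choose_branches(features))  # ['match', 'classify']
```

In this sketch, speed comes from taking the "match" branch as often as possible. Raising the threshold sends more frames through the slower "classify" branch, which trades speed for accuracy.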