SINT++: Robust Visual Tracking via Adversarial Positive Instance Generation

Posted on 21/04/2019, in Paper.
  • Overview: This paper proposes a sophisticated way to augment video tracking data by a) generating positive samples with a VAE and b) occluding them adversarially via RL, which improves the performance of SINT.
  • Dense sampling: Dense sampling is a common data augmentation approach. In each frame, candidate boxes are sampled around the target, and those with a high IoU against the ground-truth bounding box are kept as positive samples. However, these samples lack diversity: occlusions and deformations follow a long-tailed distribution, so such hard cases are rarely covered.
  • SINT: The Siamese INstance search Tracker (SINT) simply matches the target from the first frame against the candidates in each new frame and returns the most similar candidate according to a learned matching function.
  • PSGN: The positive sample generation network (PSGN) generates diverse positive samples that resemble the target via a VAE. The VAE is trained separately for each video.
  • HPTN: The hard positive transformation network (HPTN) is an RL agent with 8 movement actions and a terminating action that are applied to an occlusion mask. The state is the VGG descriptor of the current mask, and the reward depends on whether the score from a pretrained SINT decreases. I am not sure from the paper how the initial mask is extracted; Figure 1 suggests it is a separate process.
  • Result: The methods help SINT improve on the OTB and VOT datasets. They do not bring it to state of the art, but the ablation study shows that both PSGN and HPTN are helpful. As the bottom line, the authors claim this is a more effective data augmentation method.
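To make the dense sampling point concrete, here is a minimal sketch of IoU-based positive extraction. It is not the paper's code: the box format (x1, y1, x2, y2), the jitter range `max_shift`, the threshold `thresh=0.7` and the helper names are all hypothetical choices for illustration.

```python
import random

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def dense_sample_positives(gt, n=200, max_shift=0.3, thresh=0.7, seed=0):
    """Jitter the ground-truth box and keep samples with IoU >= thresh.

    max_shift and thresh are hypothetical values, not taken from the paper.
    """
    rng = random.Random(seed)
    w, h = gt[2] - gt[0], gt[3] - gt[1]
    positives = []
    for _ in range(n):
        # Translate the box by a random fraction of its width and height.
        dx = rng.uniform(-max_shift, max_shift) * w
        dy = rng.uniform(-max_shift, max_shift) * h
        box = (gt[0] + dx, gt[1] + dy, gt[2] + dx, gt[3] + dy)
        if iou(box, gt) >= thresh:
            positives.append(box)
    return positives
```

Every positive returned here is just a shifted copy of the same box with the same appearance, which illustrates why such samples rarely contain the occlusions and deformations from the tail of the distribution.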
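The SINT matching step can be sketched as an argmax over similarity scores. This assumes the candidate embeddings have already been produced by the learned Siamese network (not shown) and uses cosine similarity as the matching score; both the scoring choice and the function names are assumptions for illustration, not SINT's released code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (nu * nv)

def sint_match(target_feat, candidate_feats):
    """Return (index, score) of the candidate most similar to the first-frame target.

    target_feat / candidate_feats stand in for embeddings from a learned
    Siamese network; cosine similarity is an assumed matching function.
    """
    scores = [cosine(target_feat, c) for c in candidate_feats]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

For example, `sint_match([1.0, 0.0], [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])` picks index 1. Note that the tracker has no notion of "hard" candidates at this stage; it only ranks them, which is why the quality of its training positives matters.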
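For the PSGN bullet, the core VAE mechanics can be sketched without any deep learning framework: the reparameterization trick, the KL regularizer toward N(0, I), and generating new positives by decoding latent samples drawn around an encoded target. This is a generic VAE sketch, not the paper's architecture; the `decoder` callable and the encoder outputs `mu`/`logvar` are hypothetical stand-ins for networks that, per the notes above, would be trained per video.

```python
import math
import random

def reparameterize(mu, logvar, rng):
    """z = mu + sigma * eps with eps ~ N(0, I) (VAE reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), the VAE regularization term."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

def generate_positives(mu, logvar, decoder, n=5, seed=0):
    """Decode n latent samples drawn around one encoded target appearance.

    decoder is a hypothetical stand-in for a trained per-video VAE decoder.
    """
    rng = random.Random(seed)
    return [decoder(reparameterize(mu, logvar, rng)) for _ in range(n)]
```

Usage with a toy decoder: `generate_positives([0.5, -0.5], [-1.0, -1.0], lambda z: [2 * v for v in z], n=3)` returns three slightly different "appearances". Larger variances (`logvar`) give more diverse but less faithful samples, which is the trade-off a positive generator has to manage.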
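The HPTN loop can be sketched as a small environment: an occlusion rectangle moved by 8 movement actions plus a terminate action, rewarded when a tracker score drops. Several details here are assumptions, not the paper's: the 8 actions are taken to be compass directions, the reward is taken as +1/-1, the policy sees the raw mask instead of a VGG descriptor, and `score_fn` is a hypothetical stand-in for the pretrained SINT score.

```python
# Eight hypothetical movement directions (dx, dy); action index 8 terminates.
MOVES = [(-1, -1), (0, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (0, 1), (1, 1)]
TERMINATE = len(MOVES)

def apply_action(mask, action, step, width, height):
    """Move the occlusion rectangle (x, y, w, h), clamped inside the patch."""
    if action == TERMINATE:
        return mask, True
    dx, dy = MOVES[action]
    x, y, w, h = mask
    x = min(max(x + dx * step, 0), width - w)
    y = min(max(y + dy * step, 0), height - h)
    return (x, y, w, h), False

def reward(prev_score, new_score):
    """Assumed reward: +1 if the tracker's matching score dropped, else -1."""
    return 1.0 if new_score < prev_score else -1.0

def rollout(policy, score_fn, mask, width, height, step=4, max_steps=20):
    """Run one episode and return (final_mask, total_reward).

    policy(mask) -> action index; score_fn(mask) stands in for a pretrained
    SINT score on the occluded patch (both hypothetical).
    """
    score = score_fn(mask)
    total = 0.0
    for _ in range(max_steps):
        mask, done = apply_action(mask, policy(mask), step, width, height)
        if done:
            break
        new_score = score_fn(mask)
        total += reward(score, new_score)
        score = new_score
    return mask, total
```

As a toy check, on a 64x64 patch with a score that falls as the mask approaches the centre, a policy that moves diagonally (action 7) until `x >= 24` and then terminates collects six +1 rewards and stops with the mask covering the centre, i.e. the "hardest" occlusion for that score.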

Website: SINT++

  • X. Wang, A. Shrivastava, and A. Gupta. A-Fast-RCNN: Hard positive generation via adversary for object detection. 2017.
  • R. Tao, E. Gavves, and A. W. Smeulders. Siamese instance search for tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1420–1429, 2016.