Person Search via A Mask-Guided Two-Stream CNN Model

Posted on 23/04/2019, in Paper.
  • Overview: This paper has 3 contributions: a) separating pedestrian detection from ReID in the person search task; b) building a model that leverages both the foreground and the background; c) quantifying the contributions of the foreground and the background.
  • Person search: Person search is basically pedestrian detection plus re-identification. However, the two goals contradict each other: pedestrian detection tends to mine the commonness shared by different pedestrians, while ReID needs to exploit the features unique to each individual.
  • F/ONet: Once we separate the task into detection + ReID, a natural engineering question is which part we should send to the ReID module. Should we include the background, or just the foreground (the person)? The authors created the foreground layer by applying a pre-trained segmentation model and a majority vote. Two neural nets, FNet and ONet, process the foreground and the original image separately. Their outputs are combined using a SEBlock (a channel-wise self-attention module).
  • Result: There are three takeaways from the experiments: a) separating detection and ReID is better than jointly training the person search model; b) both the background and the foreground are useful for the task; c) the best background region to include is about 130% of the person bbox.
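
To make the cropping step concrete, here is a minimal sketch of expanding a detected bbox and masking out the background before the two streams see it. It is not the authors' code: the function names `expand_bbox` and `apply_foreground_mask` are hypothetical, and it assumes "130%" means each side of the box is scaled by 1.3 about its centre (rather than the area), with the result clipped to the image.

```python
def expand_bbox(box, ratio, img_w, img_h):
    """Scale an (x1, y1, x2, y2) box about its centre by `ratio`, clipped to the image.

    Assumption: `ratio` applies to the side lengths, so 1.3 means 130% width and height.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * ratio / 2.0
    half_h = (y2 - y1) * ratio / 2.0
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(float(img_w), cx + half_w),
        min(float(img_h), cy + half_h),
    )


def apply_foreground_mask(patch, mask):
    """Zero out background pixels; `patch` and `mask` are same-sized 2D lists.

    The masked patch would feed the foreground stream (FNet), while the
    unmasked, expanded patch would feed the original-image stream (ONet).
    """
    return [
        [pixel if keep else 0 for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(patch, mask)
    ]
```

For example, `expand_bbox((100, 100, 200, 300), 1.3, 640, 480)` widens a 100 x 200 box to roughly 130 x 260 around the same centre.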

It would be interesting to generate a distribution-like mask from the pedestrian detection and use that for ReID. Some models to check out later:

  • SEBlock: Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. arXiv preprint arXiv:1709.01507 (2017)
  • FCIS model: Li, Y., Qi, H., Dai, J., Ji, X., Wei, Y.: Fully convolutional instance-aware semantic segmentation. In: CVPR. (2017)
  • Online Instance Matching (OIM) loss: Xiao, T., Li, S., Wang, B., Lin, L., Wang, X.: Joint detection and identification feature learning for person search. In: CVPR. (2017)
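
As a reminder of what the SEBlock from the first reference does, below is a rough pure-Python sketch of squeeze-and-excitation channel reweighting. It is an illustration only, not the paper's implementation: the function name `se_reweight` and the weight matrices `w1` and `w2` are hypothetical, biases are omitted, and feature maps are represented as flat lists for simplicity.

```python
import math


def se_reweight(channels, w1, w2):
    """Squeeze-and-excitation over `channels`, a list of C flat feature maps.

    Assumed shapes: w1 is (C // r) x C and w2 is C x (C // r), where r is the
    reduction ratio. Biases are left out to keep the sketch short.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [sum(ch) / len(ch) for ch in channels]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: fully connected layer plus sigmoid gives gates in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    return [[g * v for v in ch] for g, ch in zip(gates, channels)]
```

In the two-stream setting, `channels` would be the concatenation of the FNet and ONet feature maps, so the learned gates decide how much weight the foreground and original-image channels each receive.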