Multi-Agent Tensor Fusion for Contextual Trajectory Prediction
Posted on 14/05/2019, in Paper.
- Overview: This paper improves trajectory prediction by modelling the scene and the agents' past trajectories jointly (i.e. the encoded tensors are concatenated in the middle of the network). Compared to attention-based mechanisms, this reduces how the complexity depends on the number of agents.
- Agent-centric methods: To incorporate multi-agent interaction, NN-based methods usually need some aggregating function on top of the encoded vectors of each agent. This introduces a permutation problem, i.e. the order of the agents matters in the prediction. One way to get around it is to use an attention mechanism, but the complexity then goes up to $O(n^2)$ with respect to the number of agents.
- Tensor fusion: The trajectories of the agents are encoded using one LSTM while the scene is encoded by a CNN. The novelty of this paper is the way it fuses the information from both sides: the encoded tensor of each agent's trajectory is placed on top of the scene tensor at the agent's location at the last timestamp. In this way, the static location distribution and the scene information are picked up by the U-Net encoder at the same time in the next stage.
- Adversarial Loss: An adversarial loss is used to further improve the result. A single LSTM is used as the discriminator, while the generator is a conditional GAN conditioned on the encoded vector plus white noise.
- Result: The method gets pretty good results on Stanford Drone, but the training details are not disclosed in the paper.
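The tensor fusion step above can be sketched in a few lines. This is a minimal pure-Python illustration, not the paper's implementation: the function name `fuse_agents_into_scene`, the nested-list layout, and the assumption that each agent's last position has already been mapped to a grid cell are all mine. I also assume that agents landing in the same cell are merged with an element-wise max, which keeps the result independent of the agent order.

```python
def fuse_agents_into_scene(scene, agent_vecs, agent_cells):
    """Concatenate agent encodings onto a scene feature map, channel-wise.

    scene:       H x W x C_scene nested lists (stand-in for the CNN output)
    agent_vecs:  list of length-C_agent lists (stand-in for the LSTM outputs)
    agent_cells: list of (row, col) cells of each agent's last observed position
    All three names are hypothetical and only used for this sketch.
    """
    height, width = len(scene), len(scene[0])
    c_agent = len(agent_vecs[0])

    # Cells without any agent keep an all-zero agent encoding.
    agent_map = [[[0.0] * c_agent for _ in range(width)] for _ in range(height)]

    occupied = set()
    for vec, (row, col) in zip(agent_vecs, agent_cells):
        if (row, col) in occupied:
            # Assumption: agents sharing a cell are merged by element-wise max,
            # which does not depend on the order the agents are listed in.
            agent_map[row][col] = [
                max(a, b) for a, b in zip(agent_map[row][col], vec)
            ]
        else:
            agent_map[row][col] = list(vec)
            occupied.add((row, col))

    # Each cell becomes [scene features | agent features].
    return [
        [scene[r][c] + agent_map[r][c] for c in range(width)]
        for r in range(height)
    ]


# Toy example: a 2 x 3 grid with 2 scene channels and 2 agents with 3 channels.
scene = [[[0.5, 0.5] for _ in range(3)] for _ in range(2)]
agents = [[1.0, -2.0, 3.0], [0.0, 4.0, -1.0]]
cells = [(0, 1), (1, 2)]
fused = fuse_agents_into_scene(scene, agents, cells)
```

The loop visits each agent once, so the cost of the fusion itself grows linearly with the number of agents, and passing `agents` and `cells` in a different (but matching) order gives the same fused tensor.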
One thing that concerns me a bit about this paper is the variation of the scenes: as far as I know, most of the datasets do not have much scene variation. For example, the Stanford Drone dataset only has 8 scenes. Hence I am not sure how they trained the scene encoder and how much it helps in the final result.
For the field of trajectory prediction, it would be helpful to take a look at the following literature:
- Social LSTM: Human trajectory prediction in crowded spaces
- Social GAN: Socially acceptable trajectories with generative adversarial networks
- Social attention: Modeling attention in human crowds
- SoPhie: An attentive GAN for predicting paths compliant to social and physical constraints